Approximation Algorithms (or โ€œMy Problem is NP-Complete โ€” Now What?โ€)

Many combinatorial problems that arise in practice are NP-complete, meaning that, under widely believed assumptions, they cannot be solved exactly in polynomial time. Nevertheless, these problems are ubiquitous in real-world applications, and we often need practical solutions.

One approach, particularly for optimization problems, is to compute approximate solutions efficiently, that is, solutions that are close to optimal, though not necessarily exactly optimal. Approximation algorithms provide such solutions in polynomial time. While the solution may not be perfect, it is often sufficiently good for practical purposes.

Load Balancing

Consider the following classical optimization problem:

Ti=โˆ‘jโˆˆJitj,

where Ji is the set of jobs assigned to machine i.

The objective is to minimize the maximum load among all machines:

T:=maxi=1,โ€ฆ,mTi.

Here, tj and m are given, and the Ti values depend on the assignment of jobs to machines.

Important Constraints

  1. Each job must be assigned completely to a single machine; jobs cannot be split across multiple machines. These are known as unsplittable jobs.

  2. The makespan, T, represents the total time until all jobs are completed if each machine executes its assigned jobs in some order starting from time 0. Minimizing the makespan ensures the earliest possible completion time.

Informally, good load balancing also ensures fairness if the machines are operated by human workers rather than automated systems.

Complexity

Unfortunately, Load Balancing is NP-complete, even when m=2 machines. The problem closely resembles the Subset Sum problem, and NP-completeness can be formally shown via a polynomial-time reduction from Subset Sum.

Thus, finding the exact optimal assignment is computationally infeasible for large instances. Instead, we turn to approximation algorithms that run in polynomial time and produce solutions that are provably close to optimal.

Approximation Notation

We denote the optimal makespan for a given instance as Tโˆ—. This โ€œstarโ€ notation is standard in optimization problems. In general, our algorithms will produce a makespan T such that

Tโ‰ˆTโˆ—

Greedy Algorithm for Load Balancing

A natural idea for an algorithm is greedy assignment:

Pass through the jobs one by one and assign each job to the machine that currently has the smallest load.

Due to the NP-completeness of the problem, this algorithm cannot always produce the optimal solution, and in fact, there exist small counterexamples.

Example

Consider m=2 machines and n=5 jobs with processing times (3,3,2,2,2).

Despite this, the greedy algorithm may perform reasonably well in practice. But how can we analyze its performance rigorously?

From Intuition to Worst-Case Analysis

To evaluate the quality of the greedy algorithm, we need a precise notion of performance. A natural question is:

How far can the greedy solution be from an optimal solution in the worst case, considering all possible instances?

Worst-case analysis provides reliability guarantees, just as worst-case time complexity provides guarantees on the running time of an algorithm.

However, we still need a measure of distance between solutions.

A simple difference, Tโˆ’Tโˆ—, is not a good measure. Why? Consider scaling an instance by a factor c (multiplying all job times by c). Then the makespan is also scaled by c, but the problem itself is unchanged. This implies that

maxall instances(Tโˆ’Tโˆ—)=โˆž,

making the difference unbounded and unsuitable for analysis.

Approximation Ratio

A more robust measure is the relative error, or approximation ratio:

Approximation ratio:=TTโˆ—.

Instead, we aim to find upper bounds on T/Tโˆ—.

Upper and Lower Bounds

Even though Tโˆ— is generally unknown, we can still bound T/Tโˆ— by:

  1. Finding a lower bound on Tโˆ— (based on properties of the problem, independent of any algorithm).

  2. Finding an upper bound on T (based on the output of the specific algorithm).

Then we can assert:

TTโˆ—โ‰คupper bound onย Tlower bound onย Tโˆ—.

โš ๏ธ Important: The situation is not symmetric. The lower bound relates to optimal solutions (a property of the problem), while the upper bound relates to the algorithm's output (a property of the algorithm).

Lower Bound Analysis

We start by analyzing the optimal makespan Tโˆ—. This is the easier part, as it depends only on properties of the problem, not on the algorithm.

Two obvious lower bounds follow directly from the problem:

  1. The largest job length:
Tโˆ—โ‰ฅmaxj=1,โ€ฆ,ntj
  1. The average load per machine:
Tโˆ—โ‰ฅ1mโˆ‘j=1ntj

While these bounds are correct, the averaging bound can be too optimistic. For example, if there are many small jobs and one very large job, dividing the total load by m underestimates the minimal possible makespan.

Upper Bound Analysis

Any upper bound on T must account for the behavior of the algorithm. A natural question is:

What specifically causes the output to be โ€œbadโ€?

To answer this, we examine the last job that finishes:

โš ๏ธ Note: Job j is not necessarily the last job in the list of jobs, because jobs may be scheduled on other machines afterward without increasing the makespan.

A schematic representation:

i |----|--|---|---|-----j-------|
Why was job j assigned to machine i?

Job j was assigned to machine i because machine i had the smallest load at that moment. Let the load of i before assigning j be Tโˆ’tj.

By the greedy rule, all other machines k must satisfy:

Tkโ‰ฅTโˆ’tjfor allย k=1,โ€ฆ,m

If we sum the loads of all machines, we get:

โˆ‘k=1mTkโ‰ฅm(Tโˆ’tj)

Why? Because each machine k contributes at least Tโˆ’tj, and there are m machines.

Dividing both sides by m and using the averaging lower bound gives:

Tโˆ’tjโ‰ค1mโˆ‘k=1mTkโ‰คTโˆ—

Since also Tโˆ—โ‰ฅtj, we finally obtain:

Tโ‰คTโˆ—+tjโ‰ค2Tโˆ—

Thus, the approximation ratio of the greedy algorithm satisfies:

TTโˆ—โ‰ค2

This is a remarkably simple result: using only elementary reasoning, we have proven that the greedy algorithm never produces a makespan more than twice the optimal.

Tightness of the Bound

The factor 2 may seem disappointing, the solution could be twice as long as optimal. Can the algorithm be improved? Or is it actually better than this analysis suggests?

It turns out the analysis is essentially tight. There exist instances where T is arbitrarily close to 2Tโˆ—. A typical โ€œnastyโ€ instance consists of:

With the appropriate number and lengths of short jobs, the approximation ratio can approach 2. The precise construction is skipped here, but the key idea is simple: the last, long job dominates the makespan.


โ€œIโ€™m so proud: I have finished my puzzle in only 2 years.
On the package it says 3โ€“4 years.โ€