Approximation Algorithms (or โMy Problem is NP-Complete โ Now What?โ)
Many combinatorial problems that arise in practice are NP-complete, meaning that, under widely believed assumptions, they cannot be solved exactly in polynomial time. Nevertheless, these problems are ubiquitous in real-world applications, and we often need practical solutions.
One approach, particularly for optimization problems, is to compute approximate solutions efficiently, that is, solutions that are close to optimal, though not necessarily exactly optimal. Approximation algorithms provide such solutions in polynomial time. While the solution may not be perfect, it is often sufficiently good for practical purposes.
Load Balancing
Consider the following classical optimization problem:
-
We have n jobs, each with a processing time
for . -
These jobs must be executed on m machines.
-
Let
denote the load of machine , i.e., the sum of the processing times of the jobs assigned to it:
where
The objective is to minimize the maximum load among all machines:
Here,
Important Constraints
-
Each job must be assigned completely to a single machine; jobs cannot be split across multiple machines. These are known as unsplittable jobs.
-
The makespan,
, represents the total time until all jobs are completed if each machine executes its assigned jobs in some order starting from time . Minimizing the makespan ensures the earliest possible completion time.
Informally, good load balancing also ensures fairness if the machines are operated by human workers rather than automated systems.
Complexity
Unfortunately, Load Balancing is NP-complete, even when
Thus, finding the exact optimal assignment is computationally infeasible for large instances. Instead, we turn to approximation algorithms that run in polynomial time and produce solutions that are provably close to optimal.
Approximation Notation
We denote the optimal makespan for a given instance as
Greedy Algorithm for Load Balancing
A natural idea for an algorithm is greedy assignment:
Pass through the jobs one by one and assign each job to the machine that currently has the smallest load.
Due to the NP-completeness of the problem, this algorithm cannot always produce the optimal solution, and in fact, there exist small counterexamples.
Example
Consider
-
The optimal solution has makespan
, obtained by assigning jobs and to one machine, and to the other. -
In general, it may not be possible to achieve a perfectly balanced load of
on every machine.
Despite this, the greedy algorithm may perform reasonably well in practice. But how can we analyze its performance rigorously?
From Intuition to Worst-Case Analysis
To evaluate the quality of the greedy algorithm, we need a precise notion of performance. A natural question is:
How far can the greedy solution be from an optimal solution in the worst case, considering all possible instances?
Worst-case analysis provides reliability guarantees, just as worst-case time complexity provides guarantees on the running time of an algorithm.
However, we still need a measure of distance between solutions.
A simple difference,
making the difference unbounded and unsuitable for analysis.
Approximation Ratio
A more robust measure is the relative error, or approximation ratio:
-
Ideally, we want this ratio to be as close to
as possible. -
Analyzing the exact ratio
for every instance is usually too complex, as it depends on many small details of each instance.
Instead, we aim to find upper bounds on
Upper and Lower Bounds
Even though
-
Finding a lower bound on
(based on properties of the problem, independent of any algorithm). -
Finding an upper bound on
(based on the output of the specific algorithm).
Then we can assert:
โ ๏ธ Important: The situation is not symmetric. The lower bound relates to optimal solutions (a property of the problem), while the upper bound relates to the algorithm's output (a property of the algorithm).
Lower Bound Analysis
We start by analyzing the optimal makespan
Two obvious lower bounds follow directly from the problem:
- The largest job length:
- The average load per machine:
While these bounds are correct, the averaging bound can be too optimistic. For example, if there are many small jobs and one very large job, dividing the total load by
Upper Bound Analysis
Any upper bound on
What specifically causes the output to be โbadโ?
To answer this, we examine the last job that finishes:
-
Let
be the machine that finishes last, so . -
Let
be the last job on machine .
โ ๏ธ Note: Job
is not necessarily the last job in the list of jobs, because jobs may be scheduled on other machines afterward without increasing the makespan.
A schematic representation:
i |----|--|---|---|-----j-------|
Why was job assigned to machine ?
Job
By the greedy rule, all other machines
If we sum the loads of all machines, we get:
Why? Because each machine
Dividing both sides by
Since also
Thus, the approximation ratio of the greedy algorithm satisfies:
This is a remarkably simple result: using only elementary reasoning, we have proven that the greedy algorithm never produces a makespan more than twice the optimal.
Tightness of the Bound
The factor
It turns out the analysis is essentially tight. There exist instances where
-
Many short jobs, which the greedy algorithm assigns in a balanced way.
-
Followed by one very long job, which must be assigned to a single machine, destroying the balance.
With the appropriate number and lengths of short jobs, the approximation ratio can approach
โIโm so proud: I have finished my puzzle in only 2 years.
On the package it says 3โ4 years.โ