We can observe that the Load Balancing problem with unsplittable jobs (which we know is NP-complete) is a strictly special case of this extended problem.
For example, we can consider an instance of the extended problem where the set of splittable jobs is empty. In this scenario, the input consists solely of unsplittable jobs, that is the original Load Balancing problem. Any algorithm that can solve the extended problem must also be able to solve this specific instance. Since finding the optimal makespan for an instance with only unsplittable jobs is NP-complete, the general problem containing both types must be at least as hard. Therefore, extended Load Balancing is NP-hard.
Let's prove that the problem also belongs to the class NP. To show this, we frame it as a decision problem: given a time threshold , does a schedule exist such that ? A proposed solution (certificate) is the full assignment of unsplittable jobs and the "flow" allocation for splittable jobs.
We can also verify the makespan for each machine by computing its total load as the sum of unsplittable processing times and the allocated fractions of splittable jobs. Since all calculations (sums and checking ) can be performed in polynomial time relative to the number of jobs and machines , the extended Load Balancing problem is in NP.
Since the problem is both NP-hard and in NP, the extended Load Balancing problem is NP-complete.
Exercise 1.2
We can extend the smarter greedy algorithm that first sorts and re-indexes the jobs such that , and then applies the original greedy algorithm, to handle instances that may also contain splittable jobs.
First, we separate the jobs into unsplittable and splittable sets. Let the set of unsplittable jobs be with processing times and the set of splittable jobs be with processing times . We then order the unsplittable jobs in descending order () and assign each job greedily to the machine with minimum current load , exactly as in the original algorithm.
Let denote the load of machine after all unsplittable jobs, and let be the intermediate makespan.
Next, for the splittable jobs, we calculate the total splittable work . We then compute the total average load for the entire instance: . The splittable work is then used to "fill" any machine whose load is less than , raising its load up to . If a machine already has , it receives no splittable work.
The final makespan is therefore the maximum of the original highest unsplittable load and this average load.
Now we have to prove that .
First, we know the guarantee for the unsplittable part: , where is the optimal makespan for an instance with only the jobs in .
(since the optimum for the full problem cannot be better than the optimum for a sub-problem) and (since the optimum cannot be better than the perfect average).
We can simply check the two cases for our final makespan :
If the ratio holds, since the makespan is determined by an unsplittable load. We have .
If , the makespan is determined by the average load. We have . In this case, our solution is optimal.
In both cases the final makespan , so the approximation ratio is maintained.
Exercise 2.
Exercise 2.1
We can observe that the function is separable. We can rewrite the function by grouping the and terms independently:
Let denote the -dependent term and denote the -dependent term. To minimize the total function , we can find the coordinate that minimizes and the coordinate that minimizes completely separately. This reduces the 2D problem to two independent 1D problems, that are basically the weighted median problem.
We now describe the algorithm to find the optimal ; the algorithm for is identical. First, we calculate the total weight . We then create a list of pairs and sort this list in non-decreasing order based on the coordinates.
Next, we iterate through this sorted list, maintaining a cumulative weight sum , initialized to zero. For each point in the sorted list, we add its weight . The first coordinate for which this cumulative sum reaches or exceeds half the total weight (i.e., ) is the weighted median. We set and terminate the 1D search. The final optimal point is .
Correctness and time bound:
The correctness of separating the problem follows from the additive nature of the Manhattan distance.
The correctness of the algorithm relies on being a convex function. Let's study the derivative to find the minimum.
We begin by expressing the derivative of the sum as the sum of the derivatives, noting that :
For any point not equal to any , the function is differentiable. We divide the sum into two groups:
Group 1 (): then .
Group 2 (): then .
Substituting the signs:
The function is minimized at the point where the difference is zero, meaning the total weight "pulling" to the left equals the total weight "pulling" to the right. This transition occurs precisely at the weighted median because, by definition, is the first point where the cumulative weight reaches .
For the time bound, the algorithm is dominated by the sorting step. Let be the number of points. Calculating takes time. Sorting the pairs takes time. The final pass to find the median takes time. Since this procedure is run twice, the total time complexity is .
Actually, this bound can be improved to using a Weighted Quickselect algorithm. To guarantee this linear performance and avoid the worst case, a non-unbalanced pivot is chosen in linear time using the Median of Medians algorithm as a pivot selection strategy. This reduces the total time complexity for the 2D problem to , which is asymptotically optimal, although the two algorithms are not so easy to implement.
Exercise 2.2
Let:
be any point in the plane
be the -th site
the true Euclidean cost function
be the Manhattan cost function
be the point that truly minimizes the Euclidean cost
be the point found by our algorithm, which minimizes the Manhattan cost .
Our goal is to prove that the Euclidean cost of our algorithm's solution is at most times the Euclidean cost of the true optimal solution, that is, .
We must use two fundamental geometric inequalities that relate the Euclidean distance () and Manhattan distance () between any two points:
, or
, or
By the definition of , we have . Applying inequality 1 () to every term in the sum, this is less than or equal to , which is precisely the definition of . This gives us .
Next, we can use the key property of our algorithm, that is is the point that minimizes . Therefore, must be less than or equal to the Manhattan cost of any other point, including the true Euclidean optimal point . This is .
Now, we apply inequality 2 () to every term in the sum . This yields . By factoring out the constant, we get , which is by definition equal to .