UoL CS Notes

Greedy Algorithm Applications - Knapsack, Scheduling, Clustering

COMP202 Lectures

The greedy method solves optimisation problems by going through a sequence of feasible choices.

The sequence starts from a well-understood configuration and then iteratively makes the decision that seems best from all those that are currently possible.

Problems that have a greedy solution possess the greedy-choice property.

Fractional Knapsack Problem

This is a variation of the knapsack problem where we are able to take arbitrary fractions of each item. The problem is formally defined like so:

Let $S$ be a set of $n$ items, where each item $i$ has a positive benefit $b_i$ and a positive weight $w_i$. Additionally, we are allowed to take arbitrary fractions $x_i$ of each item such that $0\leq x_i\leq w_i$.

Goal - Find the amounts $x_i$ that maximise the total benefit without exceeding the total weight limit $W$.

The total benefit of the items taken is determined by the sum:

\[\sum_{i\in S}b_i\left(\frac{x_i}{w_i}\right)\]
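For instance (hypothetical numbers, not from the lecture): taking $x_i = 5$ units of an item with benefit $b_i = 60$ and weight $w_i = 10$ contributes $60\left(\frac{5}{10}\right) = 30$ to the total benefit.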

Fractional Knapsack Method

The general method takes the following form:

  1. Compute the value index for each item $i$ defined by:

    \[v_i=\frac{b_i}{w_i}\]
  2. Select the items to include in the knapsack, starting with the highest value index.

We can implement this in the following pseudo-code:

fractionalKnapsack(S, W)

for i = 1 to |S| do
	xi = 0
	vi = bi / wi
	insert(vi, i) into heap H	// max value at root

w = 0
while w < W and H is not empty do
	remove the item i with the greatest value index vi from H
	a = min(wi, W - w)
	xi = a
	w = w + a
  • Input - Set of $S$ items, item $i$ has weight $w_i$ and benefit $b_i$; maximum total weight $W$.
  • Output - Amount $x_i$ of each item to maximise the total benefit.
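As a concrete illustration, here is a minimal Python sketch of the method above (the function name and the example items are hypothetical, not from the lecture). The standard `heapq` module provides a min-heap, so the value indices are negated to simulate the max-heap:

```python
import heapq

def fractional_knapsack(items, W):
    """items: list of (benefit, weight) pairs; W: maximum total weight.
    Returns the amount x[i] of each item to take."""
    # Build a max-heap on the value index v_i = b_i / w_i
    # (heapq is a min-heap, so the key is negated).
    heap = [(-b / w, i) for i, (b, w) in enumerate(items)]
    heapq.heapify(heap)

    x = [0.0] * len(items)
    w_total = 0.0
    while w_total < W and heap:
        _, i = heapq.heappop(heap)   # item with the greatest remaining value index
        b_i, w_i = items[i]
        a = min(w_i, W - w_total)    # take as much of item i as still fits
        x[i] = a
        w_total += a
    return x

# Hypothetical items (benefit, weight) with capacity 50:
# takes all of items 0 and 1, then 20 units of item 2.
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))
```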

Time Complexity of Fractional Knapsack Problem

  1. Using a heap, with the maximum at the root, we can compute the value indices and build the heap in $O(n\log n)$ time.
  2. Then, each greedy choice, which removes the item with the greatest remaining value index, requires $O(\log n)$ time. Since at most $n$ greedy choices are made, the total running time is $O(n\log n)$.

The {0, 1}-Knapsack Problem

This is the same knapsack problem that was discussed in COMP208 - Greedy Algorithm - 1 - Knapsack Problem, where you cannot take fractions of items.

As discussed in that lecture, the greedy method will not find an optimal solution to this problem. To solve this we can use a dynamic programming approach instead.

Interval Scheduling

Consider that we have a set of tasks to execute on a machine. We want to select a subset of the tasks that maximises the number of tasks executed on the machine.

Given a set $T$ of $n$ tasks, each task $i$ has a start time $s_i$ and a finish time $f_i$. Two tasks $i$ and $j$ are non-conflicting if:

\[f_i\leq s_j \text{ or } f_j\leq s_i\]
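For example (hypothetical values): tasks with $(s_i, f_i) = (1, 3)$ and $(s_j, f_j) = (3, 5)$ are non-conflicting, since $f_i = 3 \leq s_j = 3$; tasks that merely share an endpoint do not conflict.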

A suitable method to find such a schedule would be to select non-conflicting tasks in order of their finish times.

This allows us to free up the machine as soon as possible for further activities.

We can do this with the following pseudo-code:

intervalSchedule(T)

A = []
while T != []
	do
		Remove the task t with the earliest completion time from T
		Add t to A
		Delete all tasks from T that conflict with t
Return the set A as the set of scheduled tasks
  • Input - A set $T$ of tasks with start times and end times.
  • Output - A maximum-size subset $A$ of non-conflicting tasks.
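Here is a minimal Python sketch of this earliest-finish-time rule, under the assumption that tasks are given as (start, finish) pairs (the example tasks are hypothetical). Sorting once by finish time replaces the repeated removal steps of the pseudo-code:

```python
def interval_schedule(tasks):
    """Return a maximum-size subset of pairwise non-conflicting tasks."""
    scheduled = []
    last_finish = float('-inf')
    # Sorting by finish time is the O(n log n) bottleneck.
    for s, f in sorted(tasks, key=lambda t: t[1]):
        if s >= last_finish:          # no conflict with the last chosen task
            scheduled.append((s, f))
            last_finish = f
    return scheduled

# Hypothetical tasks: the greedy choice schedules 3 of the 6.
print(interval_schedule([(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (8, 9)]))
```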

This algorithm runs in:

\[O(n\log n)\]

time, where the dominant cost is sorting the tasks by their completion times.

Multiple Machine Task Scheduling

Consider that instead of one machine, we have a set of machines. We want to execute the tasks using as few machines as possible. We can use the following greedy algorithm to compute the schedules:

Consider the tasks in order of their start times. For each task $i$, if some existing machine has no conflict with task $i$, then we schedule the task on that machine. Otherwise, we allocate a new machine.

We can do this with the following pseudo-code:

taskSchedule(T)

m = 0
while T != []
	do
		i = removeMin(T)	// task with the earliest start time
		if there is a machine j with no conflicts with task i
			then schedule(i, j)
		else
			m = m + 1
			schedule(i, m)
  • Input - A set $T$ of tasks with start times and end times.
  • Output - A non-conflicting schedule of the $T$ tasks.
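Below is a minimal Python sketch of this greedy (names and example tasks are hypothetical). One implementation choice not fixed by the pseudo-code: each machine is represented only by the finish time of its last task, held in a min-heap, so the machine that frees up earliest is checked first:

```python
import heapq

def task_schedule(tasks):
    """Return a list per machine of the (start, finish) tasks assigned to it."""
    machines = []                 # machines[j] = tasks scheduled on machine j
    free_at = []                  # min-heap of (finish_time, machine_index)
    for s, f in sorted(tasks):    # consider tasks by start time
        if free_at and free_at[0][0] <= s:
            _, j = heapq.heappop(free_at)   # machine j has no conflict: reuse it
        else:
            j = len(machines)               # otherwise allocate a new machine
            machines.append([])
        machines[j].append((s, f))
        heapq.heappush(free_at, (f, j))
    return machines

# Hypothetical tasks: three mutually overlapping tasks force three machines;
# the fourth task reuses machine 0 once it is free.
print(task_schedule([(1, 4), (2, 5), (3, 6), (5, 8)]))
```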

This algorithm also runs in:

\[O(n\log n)\]

time, as each of the $n$ removeMin operations on the priority queue of tasks takes $O(\log n)$ time.

There is a proof of this algorithm starting from slide 28 of the lecture.

Clustering

This is the process of grouping objects into classes based on the distances between them.

Consider that we want to partition $n$ objects into $k$ (non-empty) clusters $C_1, \ldots, C_k$. We want the clusters to be maximally spaced apart, i.e. we maximise the minimum distance between two objects in different clusters.

In order to do this, we can consider our objects as points to be joined into connected components. We connect pairs of objects, in increasing order of their distance, until exactly $k$ components remain.

We should connect objects in such a way that there are no cycles, similar to Kruskal’s algorithm.
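A minimal Python sketch of this Kruskal-style procedure (the points and helper names are illustrative assumptions), using union-find to detect and skip edges that would create cycles:

```python
from itertools import combinations

def k_clusters(points, k):
    """Merge closest pairs, avoiding cycles, until exactly k clusters remain."""
    parent = list(range(len(points)))

    def find(i):                  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def dist2(p, q):              # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Sorting the O(n^2) pairwise distances is the O(n^2 log n) bottleneck.
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda e: dist2(points[e[0]], points[e[1]]))

    clusters = len(points)
    for i, j in pairs:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:              # joining distinct components avoids cycles
            parent[ri] = rj
            clusters -= 1

    # Group point indices by their cluster representative.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Hypothetical 2D points forming two tight groups, with k = 2.
print(k_clusters([(0, 0), (0, 1), (5, 5), (5, 6), (6, 5)], 2))
```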

This algorithm takes:

\[O(n^2\log n)\]

where the main bottleneck is sorting the $O(n^2)$ pairwise distances.

There is a proof of this procedure starting from slide 40 of the lecture.