Sunday, 23 June 2013

Local Improvement

Examining the geometric interpretation of integer programming reveals that a problem’s constraints form a polytope. Optimum solutions for the relaxation of the problem to linear programming can be found on the convex hull of this polytope. But unfortunately, optimum integer solutions are often found inside the polytope rather than on the convex hull. This is why linear programming does not solve integer programming problems.
Consider the two dimensional optimization problem shown in figure 1. The feasible solution space is the darker area and we wish to maximize the sum of the two variables. Integer solutions to the problem occur at the intersections of the grid edges.

Figure 1 - Local Search Path Through Neighborhoods
Suppose that we were able somehow to find a feasible solution, perhaps through the use of a greedy method. Suppose that the solution indicated by the dot labeled a in figure 1 is such a solution. Let us search the area directly around this solution (which we call the neighborhood of the solution) to see if there is a better solution which is not too different than the one we have. If we find a better solution, then we continue the search for better solutions. This process is illustrated in figure 1. The dots are feasible solutions and the large circles are neighborhoods. As we find better and better solutions, we progress upwards and to the right on the path:
a ® b ® c ® d ® e,
until an optimal solution (in this case found at point e) is encountered.
This method is entitled local search and calls for searching a small area around a solution and adopting a better solution if found. The process halts when no better solutions occur. This algorithm is illustrated in figure 2.
Figure 2 - A Basic Local Search Algorithm
We shall begin with Closed City Tours as out first example. If t is a tour of n cities, then a 1-change neighborhood is defined as:
N1c(t) = {s | s = t with one city’s position changed}
After forming the neighborhood, it is searched for a better tour. In figure 3, a city is moved in going from (a) to (b), resulting in a better tour.
Figure 3 - A Local Change in a Closed Tour
An even better neighborhood definition for closed tours is a class called k-optimal neighborhoods. These are the result of removing k edges from a tour and reconnecting the tour. Local search methods using 3-optimal neighborhoods have proven very effective.
One of the most famous local search algorithms is Kernighan and Lin's min-cut algorithm for the graph partition problem. The problem’s formal definition appears below.
Graph Partition. Given a weighted graph G = (V, W). Find disjoint sets of vertices A and B such that AÈ B = V and the sum of the weights of the edges between A and B is minimized.
Let us take a weighted graph G = (V, W) where W[i, j] provides the weight of the edge between vi and vj. In figure 4 we find an example with the vertex set V = {r, s, t, u, v, w}.
Figure 4 - Weighted Graph
First we partition V into two subsets of the same size which we shall name A and B. Then we call the sum of the weights of all of the edges between A and B the cost of the partition. This is denoted Cost(A, B). In our example, we partition the graph into A = {r, s, t} and B = {u, v, w}. After adding up the weights of all the edges passing between vertices in A and B we find that Cost(A, B) = 20.
As noted above, the Min-Cut algorithm strategy begins with an arbitrary partition of the vertex set V into sets A and B. Then we attempt to form better partitions by swapping vertices between A and B until the cost seems to be the best that can be achieved.
To do this we must examine neighborhoods formed by exchanging pairs of vertices from A and B. If the partition P is the pair <A, B>, then for two vertices a Î A and b Î B the partition Pab is:
Pab = <A È {b} - {a}, B È {a} - {b}>.
It is formed by swapping the vertices a and b between partitions. This makes the neighborhood of P = <A, B> the collection of all such partitions. That is:
N(P) = { Pab | for all a Î A and b Î B}.
Now we need to formulate the change in cost of swapping induces.
Definition. The external cost or gain E(a) of moving vertex a out of the set A is the sum of the weights of all the edges leaving A from a.
Definition. The internal cost or loss I(a) of moving vertex a out of the set A is the sum of the weights of the edges between a and other vertices in A.
Definition. The total cost or difference incurred by moving vertex a out of the set A is: D(a) = E(a) - I(a).
An easy technical lemma follows directly from these definitions.
Lemma. The gain incurred from swapping vertices a Î A and b Î B is: g(a, b) = D(a) + D(b) - 2*W[a, b].
Let’s look at the values of these items using our last graph and the partition of A = {r, s, t} and B = {u, v, w}.
Now that we know the external and internal costs incurred by swapping vertices across the partition, here are the costs when pairs are swapped.
From these numbers we can conclude that swapping r and u seems to be a good idea. So is swapping t and w. We should probably not wish to swap s and v however. Or at least not at this time.
Let us apply the local search algorithm. We take the graph G = (V, W) and divide the vertex set V arbitrarily into sets A = {r, s, t} and B = {u, v, w}. Then we examine the nine neighborhoods that result from exchanging pairs of vertices from A and B. Figure 5a contains difference values for all of the vertices in the table on the left while the table on the right indicates all of the gains involved in swapping vertices between A and B.
a) Swap r and u, total gain = 6
b) Best swap: s and v, total gain = 6
Figure 5 - Tables used during Min-Cut Local Search
Swapping either r and u or t and w both provide a gain of 6. We elect to exchange r and u. Then we recalculate the differences and gains for all vertices. These new values appear in figure 5b. At this point it seems that the only swap which is not destructive involves s and v. The local search terminates at this point because there is no gain.
It is intriguing to wonder if swapping s and v leads to a neighborhood where better partitions reside. As we see in figure 6, this is not to be.
Best new swap: t and r, total gain = -1
Figure 6 - More Min-Cut Local Search Tables
The idea is worth following and Kernighan and Lin did so. They argued that partitions such as that above are often local minima and continuing to swap vertices, even though there is negative gain might lead to neighborhoods with better partitions. Consider the graph in figure 7 that illustrates how total gain might possibly change over time.
Figure 7 - Total Gain over Time
After swaps one, three, and eight, local maxima occur in the total gain function. The local search algorithm would halt at any of these. What we would like to do is continue to swap and possibly reach the global maximum at step eight. Kernighan and Lin’s variable depth method of local search provides this capability.
We shall now examine sequences of swaps. The process begins as before by searching the neighborhood of the partition for the best pair of vertices to exchange. These are swapped even if this reduces the total gain. In our example the sequence looks like this:
To prevent cycling, we only allow swapping a vertex once in a sequence of exchanges. Thus, after r and u are swapped in the above sequence, they become unavailable for exchanging. Vertex swapping is continued until A and B have been totally interchanged. We then retain all swaps up to the point of maximum gain in the sequence. In our example, this means retaining only the first swap. At this point we begin again with another sequence of swaps and continue swapping sequences until no gain takes place. The second swapping sequence is:
Since no gain took place, we halt and present {u, s, t} and {r, v, w} as our partition. Figure 8 provides a description of the entire algorithm.
Figure 8 - Graph Partitioning Algorithm
Several implementation features are of value since they make the algorithm run a bit faster. For example, the recalculation of D[u] for all u Î V that are still available just requires a small update, not an entire one. If ai and bk have been swapped and u Î A then:
D[u] := D[u] + 2* (W[u, ai] - W[u, bk])
Finding the pair to swap is much quicker also if the values of D[a] and D[b] have been sorted. Also, we need not do n/2 swaps since the last one merely completes the entire interchange of A and B and changes the total gain to zero.
Now let us turn our attention to the analysis of this algorithm. Three things must be examined: correctness, complexity, and possibility of reaching an optimum solution. We shall take the latter first.
Theorem 1. Using the basic operations of the MinCut algorithm it is possible to reach an optimal solution.
Proof. The MinCut algorithm swaps vertices. Thus we need to show that by swapping vertices it is possible to go from any partition to an optimal one, that is, one with the minimum cut.
Suppose we are given an arbitrary partition P and an optimal one POPT. A brief examination of P reveals that it either is the same as POPT or it is not. If not, then there is a vertex in each half of P that does not belong. Swap them and continue until the desired partition appears.
For this problem, correctness is rather simple. As there are no guarantees, we only need show that a feasible solution has been found. Thus we may state the following with very little proof.
Theorem 2. The MinCut algorithm is correct.
Proof. We begin with a feasible solution. Swapping two vertices provides another feasible solution. Thus at every stage of the algorithm we have a feasible solution.
With approximation algorithms, correctness is not as much of an issue as with optimum algorithms. However, the complexity of the procedure is of great importance to us and must be examined.
Computing the differences D[u] initially is an O(n2) operation since all of the graph edges need to be examined. Setting the vertices as available requires only linear time.
Inside the for loop, selection of the pair to be swapped can be O(n2). Recalculating each D[u] takes a constant amount of time for each available u, so this too is O(n). Empirical results indicate that the repeat loop will be repeated less than four times, even for very large values of n. These results indicate a time complexity of O(n3) for the algorithm.