Dynamic Programming

Overview

Dynamic Programming is a powerful technique that allows one to solve many different types of problems in time O(n²) or O(n³) for which a naive approach would take exponential time. In this lecture, we discuss this technique, and present a few key examples. Topics in this lecture include:

• The basic idea of Dynamic Programming.
• Example: Longest Common Subsequence.
• Example: Knapsack.
• Example: Matrix-chain multiplication.

The Basic Idea

Dynamic Programming is a powerful technique that can be used to solve many problems in time O(n²) or O(n³) for which a naive approach would take exponential time. (Usually to get running time below that, if it is possible at all, one would need to add other ideas as well.) Dynamic Programming is a general approach to solving problems, much like “divide-and-conquer” is a general method, except that unlike divide-and-conquer, the subproblems will typically overlap. This lecture we will present two ways of thinking about Dynamic Programming as well as a few examples.

There are several ways of thinking about the basic idea.

Basic Idea (version 1): What we want to do is take our problem and somehow break it down into a reasonable number of subproblems (where “reasonable” might be something like n²) in such a way that we can use optimal solutions to the smaller subproblems to give us optimal solutions to the larger ones. Unlike divide-and-conquer (as in mergesort or quicksort) it is OK if our subproblems overlap, so long as there are not too many of them.

Example #1: Longest Common Subsequence

Definition 1 The Longest Common Subsequence (LCS) problem is as follows. We are given two strings: string S of length n, and string T of length m. Our goal is to produce their longest common subsequence: the longest sequence of characters that appear left-to-right (but not necessarily in a contiguous block) in both strings.

For example, consider:

S = ABAZDC

T = BACBAD

In this case, the LCS has length 4 and is the string ABAD. Another way to look at it is that we are finding a 1-1 matching between some of the letters in S and some of the letters in T such that none of the edges in the matching cross each other.

For instance, this type of problem comes up all the time in genomics: given two DNA fragments, the LCS gives information about what they have in common and the best way to line them up.

Let’s now solve the LCS problem using Dynamic Programming. As subproblems we will look at the LCS of a prefix of S and a prefix of T, running over all pairs of prefixes. For simplicity, let’s worry first about finding the length of the LCS and then we can modify the algorithm to produce the actual sequence itself.

So, here is the question: say LCS[i,j] is the length of the LCS of S[1..i] with T[1..j]. How can we solve for LCS[i,j] in terms of the LCS’s of the smaller problems?

Case 1: what if S[i] ≠ T[j]? Then, the desired subsequence has to ignore one of S[i] or T[j], so we have:

LCS[i,j] = max(LCS[i − 1,j], LCS[i,j − 1]).

Case 2: what if S[i] = T[j]? Then the LCS of S[1..i] and T[1..j] might as well match them up. For instance, if I gave you a common subsequence that matched S[i] to an earlier location in T, you could always match it to T[j] instead. So, in this case we have:

LCS[i,j] = 1 + LCS[i−1,j−1].

So, we can just do two loops (over values of i and j), filling in the LCS using these rules. Here’s what it looks like pictorially for the example above, with S along the leftmost column and T along the top row.

        B  A  C  B  A  D
    A   0  1  1  1  1  1
    B   1  1  1  2  2  2
    A   1  2  2  2  3  3
    Z   1  2  2  2  3  3
    D   1  2  2  2  3  4
    C   1  2  3  3  3  4

We just fill out this matrix row by row, doing a constant amount of work per entry, so this takes O(mn) time overall. The final answer (the length of the LCS of S and T) is in the lower-right corner.
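For concreteness, here is a minimal sketch of this double loop in C (a sketch only, not code from the lecture): the strings are 0-indexed, row 0 and column 0 of the table play the role of the empty-prefix base cases, and MAXN is an assumed bound on the string lengths.

#define MAXN 1000                          // assumed bound on string lengths

int LCS[MAXN+1][MAXN+1];                   // LCS[i][j] = length of LCS of S[1..i], T[1..j]

int lcs_length(const char *S, int n, const char *T, int m)
{
    for (int i = 0; i <= n; i++) LCS[i][0] = 0;    // base case: empty prefix of T
    for (int j = 0; j <= m; j++) LCS[0][j] = 0;    // base case: empty prefix of S
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            if (S[i-1] == T[j-1])                  // Case 2: match them up
                LCS[i][j] = 1 + LCS[i-1][j-1];
            else                                   // Case 1: ignore one character
                LCS[i][j] = (LCS[i-1][j] > LCS[i][j-1]) ? LCS[i-1][j] : LCS[i][j-1];
    return LCS[n][m];
}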

How can we now find the sequence? To find the sequence, we just walk backwards through the matrix starting from the lower-right corner. If either the cell directly above or the cell directly to the left contains a value equal to the value in the current cell, then move to that cell (if both do, then choose either one). If both of those cells have values strictly less than the value in the current cell, then move diagonally up-left (this corresponds to applying Case 2), and output the associated character. This will output the characters in the LCS in reverse order. For instance, running on the matrix above, this outputs DABA.
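Here is a sketch of that backward walk in C, using the table LCS filled in by lcs_length above. It writes the characters into the output buffer from the back, so the result comes out in left-to-right order rather than reversed.

void lcs_recover(const char *S, int n, int m, char *out)
{
    int i = n, j = m, k = LCS[n][m];       // invariant: k == LCS[i][j]
    out[k] = '\0';
    while (i > 0 && j > 0) {
        if (LCS[i-1][j] == LCS[i][j])      // same value above: move up
            i--;
        else if (LCS[i][j-1] == LCS[i][j]) // same value to the left: move left
            j--;
        else {                             // both strictly smaller: Case 2 applied here
            out[--k] = S[i-1];             // output the matched character
            i--; j--;
        }
    }
}

On the example above, out ends up holding ABAD.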

More on The Basic Idea, and Example #1 Revisited

We have been looking at what is called “bottom-up Dynamic Programming”. Here is another way of thinking about Dynamic Programming, that also leads to basically the same algorithm, but viewed from the other direction. Sometimes this is called “top-down Dynamic Programming”.

Basic Idea (version 2): Suppose you have a recursive algorithm for some problem that gives you a really bad recurrence like T(n) = 2T(n − 1) + n. However, suppose that many of the subproblems you reach as you go down the recursion tree are the same. Then you can hope to get a big savings if you store your computations so that you only compute each different subproblem once. You can store these solutions in an array or hash table. This view of Dynamic Programming is often called memoizing.

For example, for the LCS problem, using our analysis above we might have produced the following exponential-time recursive program (arrays start at 1):

LCS(S,n,T,m)
{
    if (n==0 || m==0)
        return 0;
    if (S[n] == T[m])
        result = 1 + LCS(S,n-1,T,m-1);  // no harm in matching up
    else
        result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) );
    return result;
}

This algorithm runs in exponential time. In fact, if S and T use completely disjoint sets of characters (so that we never have S[n] == T[m]) then the number of times that LCS(S,1,T,1) is recursively called equals the binomial coefficient (n+m−2 choose m−1). In the memoized version, we store results in a matrix so that any given set of arguments to LCS only produces new work (new recursive calls) once. The memoized version begins by initializing arr[i][j] to unknown for all i, j, and then proceeds as follows:

LCS(S,n,T,m)
{
    if (n==0 || m==0)
        return 0;
    if (arr[n][m] != unknown)
        return arr[n][m];               // <- added this line (*)
    if (S[n] == T[m])
        result = 1 + LCS(S,n-1,T,m-1);
    else
        result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) );
    arr[n][m] = result;                 // <- and this line (**)
    return result;
}

All we have done is save our work in line (**) and make sure that we only embark on new recursive calls if we haven’t already computed the answer (the check in line (*)).

In this memoized version, our running time is now just O(mn). One easy way to see this is as follows. First, notice that we reach line (**) at most mn times (at most once for any given value of the parameters). This means we make at most 2mn recursive calls total (at most two calls for each time we reach that line). Any given call of LCS involves only O(1) work (performing some equality checks and taking a max or adding 1), so overall the total running time is O(mn).

Comparing bottom-up and top-down dynamic programming, both do almost the same work. The top-down (memoized) version pays a penalty in recursion overhead, but can potentially be faster than the bottom-up version in situations where some of the subproblems never get examined at all. These differences, however, are minor: you should use whichever version is easiest and most intuitive for you for the given problem at hand.

More about LCS: Discussion and Extensions. An equivalent problem to LCS is the “minimum edit distance” problem, where the legal operations are insert and delete. (E.g., the unix “diff” command, where S and T are files, and the elements of S and T are lines of text). The minimum edit distance to transform S into T is achieved by doing |S|−LCS(S,T) deletes and |T|−LCS(S,T) inserts.
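As a quick sketch of that correspondence, using lcs_length from above:

// Minimum number of inserts plus deletes needed to turn S into T.
int edit_distance(const char *S, int n, const char *T, int m)
{
    int len = lcs_length(S, n, T, m);
    return (n - len) + (m - len);   // deletes from S, then inserts from T
}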

In computational biology applications, often one has a more general notion of sequence alignment. Many of these different problems all allow for basically the same kind of Dynamic Programming solution.

Example #2: The Knapsack Problem

Imagine you have a homework assignment with different parts labeled A through G. Each part has a “value” (in points) and a “size” (time in hours to complete). For example, say the values and times for our assignment are:

            A   B   C   D   E   F   G
    value   7   9   5  12  14   6  12
    time    3   4   2   6   7   3   5
Say you have a total of 15 hours: which parts should you do? If there were partial credit proportional to the amount of work done (e.g., one hour spent on problem C earns you 2.5 points) then the best approach would be to work on problems in order of points/hour (a greedy strategy). But, what if there is no partial credit? In that case, which parts should you do, and what is the best total value possible?
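To see the difference on this table: the points-per-hour rates are C: 2.5, G: 2.4, A: 7/3 ≈ 2.33, B: 2.25, and 2.0 each for D, E, and F. With partial credit, greedy spends 2 + 5 + 3 + 4 = 14 hours finishing C, G, A, and B for 33 points, then puts the last hour into any remaining part for 2 more points, 35 in total. Without partial credit, those four parts leave an unusable spare hour and earn only 33 points, and one can check (or confirm with the Dynamic Programming below) that the best subset is a different one: {A, B, F, G}, worth 7 + 9 + 6 + 12 = 34 points in exactly 15 hours.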

The above is an instance of the knapsack problem, formally defined as follows:

Definition 2 In the knapsack problem we are given a set of n items, where each item i is specified by a size s_i and a value v_i. We are also given a size bound S (the size of our knapsack). The goal is to find the subset of items of maximum total value such that the sum of their sizes is at most S (they all fit into the knapsack).

We can solve the knapsack problem in exponential time by trying all possible subsets. With Dynamic Programming, we can reduce this to time O(nS) (assuming the sizes are integers, so they can be used to index an array).

Let’s do this top down by starting with a simple recursive solution and then trying to memoize it. Let’s start by just computing the best possible total value, and afterwards we can see how to actually extract the items needed.

// Recursive algorithm: either we use the last element or we don’t.
Value(n,S)    // S = space left, n = # items still to choose from
{
    if (n == 0)
        return 0;
    if (s_n > S)
        result = Value(n-1,S);          // can’t use nth item
    else
        result = max( v_n + Value(n-1, S-s_n), Value(n-1, S) );
    return result;
}

Right now, this takes exponential time. But, notice that there are only O(nS) different pairs of values the arguments can possibly take on, so this is perfect for memoizing. As with the LCS problem, let us initialize a 2-d array arr[i][j] to “unknown” for all i, j.

Value(n,S)
{
    if (n == 0)
        return 0;
    if (arr[n][S] != unknown)
        return arr[n][S];               // <- added this
    if (s_n > S)
        result = Value(n-1,S);
    else
        result = max( v_n + Value(n-1, S-s_n), Value(n-1, S) );
    arr[n][S] = result;                 // <- and this
    return result;
}

Since any given pair of arguments to Value can pass through the array check only once, and in doing so produces at most two recursive calls, we have at most 2n(S + 1) recursive calls total, and the total time is O(nS).

So far we have only discussed computing the value of the optimal solution. How can we get the items? As usual for Dynamic Programming, we can do this by just working backwards: if arr[n][S] = arr[n-1][S] then we didn’t use the nth item so we just recursively work backwards from arr[n-1][S]. Otherwise, we did use that item, so we just output the nth item and recursively work backwards from arr[n-1][S-s_n]. One can also do bottom-up Dynamic Programming.
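Here is a sketch of that backward walk in C, assuming the table arr and the item sizes are available as arrays and that arr has been completely filled in (say, by the bottom-up version; in the memoized version some entries may never have been computed). The helper treats row 0 as the zero base case, since Value(0,S) returns 0 without storing anything.

#include <stdio.h>

// arr[][] and s[] as in the code above (assumed filled in / given).
int best(int n, int S) { return (n == 0) ? 0 : arr[n][S]; }

void print_items(int n, int S)
{
    while (n > 0) {
        if (best(n, S) == best(n-1, S)) {
            n--;                             // didn’t need item n: continue from arr[n-1][S]
        } else {
            printf("take item %d\n", n);     // used item n
            S -= s[n];                       // continue from arr[n-1][S - s_n]
            n--;
        }
    }
}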

Example #3: Matrix product parenthesization

Our final example for Dynamic Programming is the matrix product parenthesization problem.

Say we want to multiply three matrices X, Y, and Z. We could do it like (XY)Z or like X(YZ). Which way we do the multiplication doesn’t affect the final outcome but it can affect the running time to compute it. For example, say X is 100 × 20, Y is 20 × 100, and Z is 100 × 20. So, the end result will be a 100 × 20 matrix. If we multiply using the usual algorithm, then to multiply an l × m matrix by an m × n matrix takes time O(lmn). So in this case, which is better, doing (XY)Z or X(YZ)?

Answer: X(YZ) is better because computing YZ takes 20 × 100 × 20 = 40,000 steps, producing a 20 × 20 matrix, and then multiplying X by this takes another 100 × 20 × 20 = 40,000 steps, for a total of 80,000. Doing it the other way takes 100 × 20 × 100 = 200,000 steps to compute XY (a 100 × 100 matrix), and then multiplying this with Z takes another 100 × 100 × 20 = 200,000 steps, so overall this way takes 5 times longer. More generally, what if we want to multiply a series of n matrices?

Definition 3 The Matrix Product Parenthesization problem is as follows. Suppose we need to multiply a series of matrices: A1 × A2 × A3 × . . . × An. Given the dimensions of these matrices, what is the best way to parenthesize them, assuming for simplicity that standard matrix multiplication is to be used (e.g., not Strassen)?

There are an exponential number of different possible parenthesizations, in fact (2(n−1) choose n−1)/n (the (n−1)st Catalan number), so we don’t want to search through all of them. Dynamic Programming gives us a better way.

As before, let’s first think: how might we do this recursively? One way is that for each possible split for the final multiplication, recursively solve for the optimal parenthesization of the left and right sides, and calculate the total cost (the sum of the costs returned by the two recursive calls plus the lmn cost of the final multiplication, where “m” depends on the location of that split). Then take the overall best top-level split.

For Dynamic Programming, the key question is now: in the above procedure, as you go through the recursion, what do the subproblems look like and how many are there? Answer: each subproblem looks like “what is the best way to multiply some sub-interval of the matrices Ai × . . . × Aj?” So, there are only O(n²) different subproblems.

The second question is now: how long does it take to solve a given subproblem assuming you’ve already solved all the smaller subproblems (i.e., how much time is spent inside any given recursive call)? Answer: to figure out how to best multiply Ai × . . . × Aj, we just consider all possible middle points k and select the one that minimizes:

optimal cost to multiply Ai . . . Ak ← already computed

+ optimal cost to multiply Ak+1 . . . Aj ← already computed

+ cost to multiply the results. ← get this from the dimensions

This just takes O(1) work for any given k, and there are at most n different values of k to consider, so overall we just spend O(n) time per subproblem. So, if we use Dynamic Programming to save our results in a lookup table, then since there are only O(n²) subproblems we will spend only O(n³) time overall.

If you want to do this using bottom-up Dynamic Programming, you would first solve for all subproblems with j − i = 1, then solve for all with j − i = 2, and so on, storing your results in an n by n matrix. The main difference between this problem and the two previous ones we have seen is that any given subproblem takes time O(n) to solve rather than O(1), which is why we get O(n³) total running time. It turns out that by being very clever you can actually reduce this to O(1) amortized time per subproblem, producing an O(n²)-time algorithm, but we won’t get into that here.
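Here is a sketch of the bottom-up O(n³) version in C, under the usual convention (an assumption, not something fixed above) that the dimensions are given as an array p[0..n], with matrix Ai having dimensions p[i-1] × p[i].

#include <limits.h>

#define MAXN 100                         // assumed bound on the number of matrices

long long cost[MAXN+1][MAXN+1];          // cost[i][j] = cheapest way to multiply Ai...Aj

void matrix_chain(int n, const int p[])
{
    for (int i = 1; i <= n; i++)
        cost[i][i] = 0;                              // a single matrix costs nothing
    for (int len = 2; len <= n; len++)               // subproblems with j - i = len - 1
        for (int i = 1; i + len - 1 <= n; i++) {
            int j = i + len - 1;
            cost[i][j] = LLONG_MAX;
            for (int k = i; k < j; k++) {            // split as (Ai...Ak)(Ak+1...Aj)
                long long c = cost[i][k] + cost[k+1][j]
                              + (long long)p[i-1] * p[k] * p[j];
                if (c < cost[i][j]) cost[i][j] = c;
            }
        }
}

Recording the best k for each pair (i, j) would let one recover the actual parenthesization by working backwards, just as in the earlier examples.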

High-level Discussion of Dynamic Programming

What kinds of problems can be solved using Dynamic Programming? One property these problems have is that if the optimal solution involves solving a subproblem, then it uses the optimal solution to that subproblem. For instance, say we want to find the shortest path from A to B in a graph, and say this shortest path goes through C. Then it must be using the shortest path from C to B. Or, in the knapsack example, if the optimal solution does not use item n, then it is the optimal solution for the problem in which item n does not exist. The other key property is that there should be only a polynomial number of different subproblems. These two properties together allow us to build the optimal solution to the final problem from optimal solutions to subproblems.

In the top-down view of dynamic programming, the first property above corresponds to being able to write down a recursive procedure for the problem we want to solve. The second property corresponds to making sure that this recursive procedure makes only a polynomial number of different recursive calls. In particular, one can often notice this second property by examining the arguments to the recursive procedure: e.g., if there are only two integer arguments that range between 1 and n, then there can be at most n² different recursive calls.

Sometimes you need to do a little work on the problem to get the optimal-subproblem-solution property. For instance, suppose we are trying to find paths between locations in a city, and some intersections have no-left-turn rules (this is particularly bad in San Francisco). Then, just because the fastest way from A to B goes through intersection C, it doesn’t necessarily use the fastest way to C because you might need to be coming into C in the correct direction. In fact, the right way to model that problem as a graph is not to have one node per intersection, but rather to have one node per ⟨intersection, direction⟩ pair. That way you recover the property you need.

The Correctness of Greedy Algorithms

Overview

Greedy algorithms can be some of the simplest algorithms to implement, but they’re often among the hardest algorithms to design and analyze. You can often stumble on the right algorithm but not recognize that you’ve found it, or might find an algorithm you’re sure is correct and hit a brick wall trying to formally prove its correctness.

This handout discusses how to structure the two major proof techniques we’ve covered for greedy algorithms (“greedy stays ahead” and exchange arguments) and gives some intuition for when one might be appropriate over the other. We recommend referring to the lecture slides for examples of formal, worked examples of these proofs in practice.

Format for Correctness Proofs

Greedy algorithms are often used to solve optimization problems: you want to maximize or minimize some quantity subject to a set of constraints. For example:

• Maximize the number of events you can attend, but do not attend any overlapping events.
• Minimize the number of jumps required to cross the pond, but do not fall into the water.
• Minimize the cost of all edges chosen, but do not disconnect the graph.

When you are trying to write a proof that shows that a greedy algorithm is correct, you often need to show two different results. First, you need to show that your algorithm produces a feasible solution, a solution to the problem that obeys the constraints. For example, when discussing the frog jumping problem, we needed to prove that the series of jumps the greedy algorithm found actually gave a legal path across the pond. Next, you need to show that your algorithm produces an optimal solution, a solution that maximizes or minimizes the appropriate quantity.

It’s usually much easier to prove feasibility than to prove optimality, and in lecture we’ve routinely hand-waved our way through this. When writing up a formal proof of correctness, though, you shouldn’t skip this step. Typically, these proofs work by induction, showing that at each step, the greedy choice does not violate the constraints and that the algorithm terminates with a correct solution.

As an example, here is a formal proof of feasibility for Prim’s algorithm. In lecture, we saw that the general idea behind the proof was to show that the set T of edges added to the MST so far forms a spanning tree of the set S. The proof uses the fact that a tree is a connected graph where |E| = |V| − 1 to show that after we add an edge crossing the cut (S, V − S), the set T must be a tree because it has the right number of edges and is still connected.

Here is how we might formalize this:

Theorem (Feasibility): Prim’s algorithm returns a spanning tree.
Proof: We prove by induction that after k edges have been added to T, T forms a spanning tree of S. As a base case, after 0 edges have been added, T is empty and S is the single node {v}. Also, the set S is connected by the edges in T because v is connected to itself by any set of edges. Therefore, T connects S and satisfies |T| = |S| − 1, so T is a spanning tree of S.

For the inductive step, assume the claim is true after k edges have been added. If at this point S = V, the algorithm terminates, and since T is a spanning tree of S, T is a spanning tree of V, as required. Otherwise, S ≠ V, so the algorithm proceeds for another iteration. Prim’s algorithm selects an edge (u, v) crossing the cut (S, V − S) and then sets S to S ∪ {v} and T to T ∪ {(u, v)}. Since at the start of the iteration T was a spanning tree for S, it connected all nodes in S. Therefore, all nodes in S are still connected to one another, and v is now connected to all nodes in S ∪ {v}, since it is connected to u and u is connected to all nodes in S. Finally, note that |T ∪ {(u, v)}| = |S| − 1 + 1 = |S| = |S ∪ {v}| − 1, so T ∪ {(u, v)} is a spanning tree for S ∪ {v}, and the claim holds after k + 1 edges have been added.
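For reference, here is a minimal sketch of the algorithm the proof is describing, written in C with an adjacency matrix; the representation and names are illustrative assumptions, not part of the handout, and the sketch assumes the graph is connected.

#include <stdbool.h>
#include <stdio.h>

#define V 6                       // assumed number of nodes
int w[V][V];                      // w[u][v] = weight of edge (u, v), 0 if absent

void prim(int start)
{
    bool inS[V] = { false };      // the set S from the proof
    inS[start] = true;            // base case: S = {v}, T = {}
    for (int added = 0; added < V - 1; added++) {    // invariant: |T| = |S| - 1
        int bu = -1, bv = -1;                        // find a minimum-weight edge
        for (int u = 0; u < V; u++)                  // crossing the cut (S, V - S)
            if (inS[u])
                for (int v = 0; v < V; v++)
                    if (!inS[v] && w[u][v] != 0 &&
                        (bu == -1 || w[u][v] < w[bu][bv])) {
                        bu = u; bv = v;
                    }
        inS[bv] = true;                              // S = S ∪ {v}
        printf("add edge (%d, %d)\n", bu, bv);       // T = T ∪ {(u, v)}
    }
}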

“Greedy Stays Ahead” Arguments

One of the simplest methods for showing that a greedy algorithm is correct is to use a “greedy stays ahead” argument. This style of proof works by showing that, according to some measure, the greedy algorithm always is at least as far ahead as the optimal solution during each iteration of the algorithm. Once you have established this, you can then use this fact to show that the greedy algorithm must be optimal.

Typically, you would structure a “greedy stays ahead” argument in four steps:

• Define Your Solution. Your algorithm will produce some object X and you will probably compare it against some optimal solution X*. Introduce some variables denoting your algorithm’s solution and the optimal solution.

• Define Your Measure. Your goal is to find a series of measurements you can make of your solution and the optimal solution. Define some series of measures m₁(X), m₂(X), …, mₙ(X) such that m₁(X*), m₂(X*), …, mₖ(X*) is also defined for some choices of n and k. Note that there might be a different number of measures for X and X*, since you can’t assume at this point that X is optimal.

• Prove Greedy Stays Ahead. Prove that mᵢ(X) ≥ mᵢ(X*) or that mᵢ(X) ≤ mᵢ(X*), whichever is appropriate, for all reasonable values of i. This argument is usually done inductively.

• Prove Optimality. Using the fact that greedy stays ahead, prove that the greedy algorithm must produce an optimal solution. This argument is often done by contradiction by assuming the greedy solution isn’t optimal and using the fact that greedy stays ahead to derive a contradiction.

When writing up a proof of this form, you don’t need to explicitly enumerate these steps (we didn’t do this in lecture, if you’ll recall). However, these steps likely need to be here. If you don’t define your solution and the optimal solution, the notation won’t make sense later on. If you don’t specify your measurements, you can’t prove anything about them. If you forget to prove that greedy stays ahead, the rest of your proof can’t assume it. Finally, if you don’t prove how the fact that greedy stays ahead implies optimality, you haven’t actually proven what you need to prove (namely, that you get an optimal solution).

The main challenge with this style of argument is finding the right measurements to make. If you pick the wrong measurements, you will either get stuck proving that greedy stays ahead (often because it doesn’t always stay ahead!) or proving that because greedy stays ahead, the algorithm must be optimal (often because you’re using the wrong measurement). As mentioned in lecture, it’s often useful to try showing that if greedy stays ahead according to your measurements, then the algorithm is optimal before you try actually showing that greedy stays ahead. That way, you won’t end up trying to prove that greedy stays ahead in a measurement with no bearing on the result.

A few other common pitfalls to watch out for:

• When defining your measurements, make sure that you define them in a way that lets you measure the optimal solution you’re comparing against, especially if that solution wasn’t generated by the greedy algorithm. It’s typically easy to find measurements of the greedy solution because it’s generated one step at a time; the challenge is figuring out what specifically those measurements are made on. For example, in the interval scheduling problem (sketched in code after this list), the measurements made corresponded to the end times of the events as they were added to the greedy solution. To make those measurements applicable to the arbitrarily-chosen optimal schedule S*, we had to define those measurements on an absolute scale: our measurements measured the finishing time of the i-th event to finish, sorted by order of finishing time.

• Be wary of the case where the greedy solution and the optimal solution don’t necessarily have the same size as one another. In some problems, such as the MST problem, all legal spanning trees have the same size, though they have different costs. In others, such as the frog jumping problem, different solutions might have different numbers of jumps in them. If you do use a “greedy stays ahead” argument, you should be sure that you don’t try showing that your greedy algorithm is better by some measure mₙ if the optimal solution can only be measured by k < n measurements.

• Although the name of the argument is “greedy stays ahead,” you are usually more properly showing that “greedy never falls behind.” That is, you want to show that the greedy solution is at least as good as the optimal solution, not strictly better than the optimal solution.
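As promised above, here is a minimal sketch of the interval scheduling greedy that those measurements are made on (illustrative C, not from the lecture): sort the events by finish time and repeatedly take the first one that doesn’t overlap anything already chosen.

#include <limits.h>
#include <stdlib.h>

typedef struct { int start, finish; } Event;

static int by_finish(const void *a, const void *b)
{
    const Event *x = a, *y = b;
    return (x->finish > y->finish) - (x->finish < y->finish);
}

int schedule(Event ev[], int n)    // returns how many events we can attend
{
    qsort(ev, n, sizeof(Event), by_finish);
    int count = 0, last_finish = INT_MIN;    // m_i(X) = finish time of the i-th pick
    for (int i = 0; i < n; i++)
        if (ev[i].start >= last_finish) {    // compatible with everything chosen so far
            last_finish = ev[i].finish;
            count++;
        }
    return count;
}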

It takes some practice to know when a “greedy stays ahead” proof will be appropriate. Often, these arguments are useful when optimal solutions might have different sizes, since you can use the fact that greedy stays ahead to explain why your solution must be no bigger/no smaller than the optimal solution: if the greedy algorithm did/didn’t terminate at some step, then the optimal solution at that point must be too big/too small and therefore is incorrect. However, there’s no hard-and-fast rule about where to apply this proof technique, and you’ll build experience working with it as you work through the problem set on greedy algorithms.

Exchange Arguments

Exchange arguments are a powerful and versatile technique for proving optimality of greedy algorithms. They work by showing that you can iteratively transform any optimal solution into the solution produced by the greedy algorithm without changing the cost of the optimal solution, thereby proving that the greedy solution is optimal.

Typically, exchange arguments are set up as follows:

• Define Your Solutions. You will be comparing your greedy solution X to an optimal solution X*, so it’s best to define these variables explicitly.

• Compare Solutions. Next, show that if X ≠ X*, then they must differ in some way. This could mean that there’s a piece of X that’s not in X*, or that two elements of X appear in a different order in X*, etc. You might want to give those pieces names.

• Exchange Pieces. Show how to transform X* by exchanging some piece of X* for some piece of X. You’ll typically use the piece you described in the previous step. Then, prove that by doing so, you did not increase/decrease the cost of X* and therefore have a different optimal solution. (You might instead be able to conclude that the exchange strictly improves X*, contradicting the optimality of X*, in which case you’re done. This is uncommon and usually only works if there’s just one optimal solution.)

• Iterate. Argue that you have decreased the number of differences between X and X* by performing the exchange, and that by iterating this process you can turn X* into X without impacting the quality of the solution. Therefore, X must be optimal. (To be very rigorous here, you would probably proceed by induction to show this, but for the purposes of this class it’s fine to hand-wave and explain why you could iteratively transform the solution.)

In some cases, exchange arguments can be significantly easier than “greedy stays ahead” arguments, since often the reason you were greedily optimizing over some quantity is because any solution that doesn’t greedily optimize that quantity is either suboptimal or could be locally modified to optimize it without changing its cost.

When writing up exchange arguments, watch out for the following:

• Your solution X could be optimal even if X ≠ X*, since there can be many optimal solutions to the same problem. Approaching these proofs by contradiction can get you stuck, because you won’t actually get a contradiction by making a local modification to the optimal solution. Typically, you will only get a contradiction if there is only one optimal solution.

• You may need to iteratively transform X* into X. Making one modification brings X* and X closer to one another, but it doesn’t guarantee that X and X* are now equal. This is why we suggest ending your proof with an iteration argument explaining why the procedure can be repeated until X* is transformed into X.

• Be sure your proof accounts for why the piece of X you’re exchanging for a piece of X* actually must exist in the first place. This shouldn’t be too hard, since it typically follows from the fact that X ≠ X*, but you should offer some justification.

As with “greedy stays ahead,” there is no hard-and-fast rule explaining when this proof technique will work. Exchange arguments work particularly well when all optimal solutions have the same size and differ only in their cost, since that way you can justify how you are going to remove some part of the optimal solution and snap another one in its place. Determining when to use exchange arguments is a skill you’ll pick up as you practice on the problem set.