Max/min sum subarray

Problem: Given an array of integers, find the subarray with the maximal/minimal sum. Solution: Kadane’s algorithm. The following solution is for maximal sum subarray. The idea is the same for minimal sum.

The first key insight to understand Kadane’s algorithm is that, as we proceed to go from the start of the array to the end, the maximal sum cannot decrease (and vice versa for minimal sum). This naturally suggests an algorithm which continually compares the highest subarray sum seen thus far with a “new” subarray sum, which is unveiled at each step as we go from the start to the end of the array.

The question therefore is, what is the “new” sum that we need to compare the highest seen sum so far with?

To answer this question requires another key insight. Consider the question: Which new subarrays are introduced into the array at each step?

Firstly, notice that x and y are related by the condition 0 <= x <= y. This means that for every value of y there are y x values that can be paired with it. Therefore, the subarrays can be ordered by the y value of the pair, so starting from y=0 we have:

y = 0, x = 0 i.e (0,0)

y = 1, x = 0, 1 i.e (0,1), (1,1)

y = 2, x = 0, 1, 2 i.e (0,2), (1,2), (2,2)

y = i, x = 0, 1, …, i

Therefore that when the index is increased by 1, by the same ordering (ordering pairs by the y value), we have:

y = 0, x = 0

y = i, x = 0, 1, …, i

y = i+1, x = 0, 1, …, i+1

The key insight here is that the values of x which are paired to any particular y is dependent ONLY on the y value, since it’s just the values which are <= y, which means for any y the number of x values paired to it is FIXED, i.e does not change with regards to anything else, such as the value of i changing. Since the pairs for values of y less than i+1 stay the same, then when i increments by 1, the only new pairs added are the pairs with y=i+1, i.e all the subarrays in the array which end at i+1. This is seen in the ordering, where only one new y value is added with its corresponding x values. Intuitively this is the observation that extending the index by 1 does not add any new subarrays that end before the index.

At each iteration we have the maximum subarray sum seen so far. When we move the index one position forward to i+1, we now have i+1 new subarrays to compare to the current maximum subarray sum. This means a quadratic number of comparisons in total, but we want a linear time algorithm. Can we reduce the number of comparisons per iteration? If we can reduce the number of comparisons per iteration to one, then we can do it. This can be done if we can obtain the maximum subarray sum of the subarrays which end at the new current index.

Is it possible to find, in a single step, the subarray with the maximum sum, out of all the subarrays which end at the current index? Sure, if we keep track of the maximum subarray sum that end at the current index at each iteration. How does that work?

At each index i, the subarrays which end at that index are {A[j:i] for all j <= i} :

A[0:i] denote sum as A0

A[1:i] denote sum as A1

A[i:i] denote sum as Ai

At index i+1, the subarrays which end at that index are:

A[0:i+1] denote sum as A’0 = A0 + A[i+1]

A[1:i+1] denote sum as A’1 = A1 + A[i+1]

A[i:i+1] denote sum as A’i = Ai + A[i+1]

A[i+1:i+1] denote sum as A'{i+1} = A[i+1]

Now, notice that to get from the sums of the subarrays which end at index i to the sums of the subarrays which end at index i+1, you simply add A[i+1] to the sum. Also, A[i+1] is itself a new subarray sum.

Therefore, if we keep track of max{sums of subarrays which end at i}. Recall that adding the same number to both sides of an inequality preserves it. The maximum subarray sum is a number s such that s >= every other subarray sum. Adding A[i+1] to both sides of the inequality doesn’t affect this inequality, which means that the subarray with the maximum sum of all subarrays ending at index i will still be the subarray with the maximum sum of those subarrays with A[i+1] added.

Notice that the list of all subarrays which end at index i+1 and the list of all subarrays which end at index i with the element A[i+1] added, differ only in a single subarray: the element A[i+1] itself. This means that in order to find the maximum subarray in the list of all subarrays which end at index i+1, we simply take the maximum of the maximum subarray of all subarrays which end at index i with the element A[i+1] added, and the subarray consisting only of the element A[i+1].

Now that we can compute both the maximum of the subarrays which end at the current index, as well as the maximum of all the subarrays in the array up to the current index, at each iteration of the loop going through the array, we have solved the problem in linear time. The overall algorithm is thus simply:

  1. Set cur and result to 0 or A[0] depending on if you want to allow zero-length subarrays.
  2. For each index i in the array from indices 0 or 1 to n (0 if you want to allow zero-length subarrays, 1 if you don’t), do:
    1. cur = max(cur+A[i], A[i])
    2. result = max(result, cur)
  3. return result

My implementation in Python

Implementations in Why3 with postconditions and loop invariants (use Z3 4.4.1 to prove)

Relevant Why3 lecture slides

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s