Sliding Window: Minimal Window of Array

Sliding Window

Sliding window is a common seen technique in solving algorithm, it has a few elements:

  • Window start and end
  • Status of elements in the window
  • Window extends to the right according to the status
  • Window shrinks to the right according to the status
  • Window stops moving at certain condition.

sliding-window-page-1

Before we discuss sliding window algorithm, we must understand:

  • Any problem can be solved by sliding window can be solved by enumerating all possible combinations
  • Sliding window skipped some of the combinations to speed up

So the key to sliding window application is to prove:

No need to check the skipped combinations.

Let’s take a look at two questions and their solution and then come back to the discussion of sliding window.

Minimum Size Subarray Sum problem

Given an array of n positive integers and a positive integer s, find the minimal length of a contiguous subarray of which the sum ≥ s. If there isn’t one, return 0 instead.

For example, given the array [2,3,1,2,4,3] and s = 7,
the subarray [4,3] has the minimal length under the problem constraint.

Observations

From the description we have the following observations:

  • The elments are positive: sum of window increase/decrease as we expand/shrink window
  • The condition is sum >= s: when sum >= s we should expand, then shrink
  • Target at minimum subarray size: then have to shrink then window as much as possible

Looks like sliding window algorithm can be applied to this problem, but we have to answer the following questions:

What combinations are skipped?

We may simply the example above: [2,3,1,2] and s = 7
The following are combinations that will be visited:

  • [2]
  • [2, 3]
  • [2, 3, 1]
  • [2, 3, 1, 2]
  • [3, 1, 2]

The combinations that are skipped are:

  • [3, 1]
  • [1]
  • [1, 2]
  • [2]

And the [3, 1] was skipped when we shrink the window at [2, 3, 1, 2].

Is it correct to skip these combinations?

Consider about the target: sum >= s.

  • If [3, 1] >= s, then when the window grows from [2, 3] to [2, 3, 1] it will stop there because sum([2, 3, 1]) > sum([3, 1]) > 2
  • If it stopped, [3, 1] will be visited
  • But since it didn’t stop, [3, 1] is < s, no need to visit.

To make this prove more generic:

  • Assume the window stopped at [i, j]
  • Then j is the first element from i to n that makes sum(i, j) >= s
  • In another way, any sub window [i, k] where k < i will make sum(i, k) < s, so these sub window can be skipped.

Algorithm

Based on the discussion above, we come to the following algorithm:

while right < n:
    sum += num[right ++]
    while (sum > s):
        minLen = Math.min(minLen, right - left)
        sum -= num[left++]

Notice how the important principle we discussed in the beginning are applied in this algorithm:

  • Window: Use left and right boundary to represent a window
  • Stop condition: right boundary stops at the end
  • Status update:update status of window every time the window moves
  • Move condition: expand the window when the status did not mets requirement
  • Move condition: Shrink the window while the status meets requirements

Implementation

The following is the implementation of the algorithm:

public int minSubArrayLen(int s, int[] nums) {
    if (nums == null || nums.length == 0) {
        return 0;
    }
    int windowSum = 0;
    int minLen = nums.length + 1;
    for (int left = 0, right = 0; right < nums.length; right ++) {
        windowSum += nums[right];
        while (windowSum >= s) {
            minLen = Math.min(minLen, right - left + 1);
            windowSum -= nums[left ++];
        }
    }
    return minLen == nums.length + 1 ? 0 : minLen;
}

Minimum Window Substring problem

Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

For example,
S = "ADOBECODEBANC"
T = "ABC"
Minimum window is "BANC".

Note:
If there is no such window in S that covers all characters in T, return the empty string “”.

If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.

Observations

We have the following observation about the problem:

  • Target at finding the minimal window that contains all characters in T
  • Sliding window characters can be applied to it: expand the window will increase the occurrence of certain character.

What combinations are skipped?

Let’s also simply the given examble to ADBEC, the following are combinations visited:

  • [A]
  • [A, D]
  • [A, D, B]
  • [A, D, B, E]
  • [A, D, B, E, C]
  • [D, B, E, C]

The combinations that are skipped are:

  • [D, B]
  • [D, B, E]
  • [B]
  • [B]
  • [B, E]
  • [B, E, C]

[D, B, E] are skipped when we shrink from [A, D, B, E, C] to [D, B, E, C].

Is it correct to skip these combinations?

The correctness can be proved similar to the process in the above question:

Let’s assume at window [i, j] it meets the targeting requirement, then the implication is that it does not meet the requirement at any sub window [i, k] where k < j. When any combination of these sub windows can be safely skipped.

Algorithm

The key is to find a way to:

  • Record the status of the window

Use a char map to record the occurrence of each character in the window.

  • Indication of the window meets requirement

This indication must be able to grow or decrease incrementally. A special count works

  • The count increase only when stats[ch] < target[ch], decrease when stats[ch] < target[ch]

where stats[ch] is the occurrence of ch in the window and target[ch] is the target occurrence of ch.

Implementation

The algorithm is very similar to the question above, the implementation is as below:

public String minWindow(String s, String t) {
    char[] tCh = new char[128];
    char[] sCh = new char[128];

    for (char ch : t.toCharArray()) {
        tCh[ch] ++;
    }

    String ret = "";
    int count = 0;
    int minWindowSize = Integer.MAX_VALUE;

    for (int left = 0, right = 0; right < s.length();) {
        char chRight = s.charAt(right ++);
        if (sCh[chRight]++ <= tCh[chRight]) {
            count ++;
        }

        while (count == t.length()) {
            if (right - left < minWindowSize) {
                minWindowSize = right - left;
                ret = s.substring(left, right);
            }

            char chLeft = s.charAt(left++);
            if (sCh[chLeft]-- < tCh[chLeft]) {
                count --;
            }
        }
    }

    return minWindowSize == Integer.MAX_VALUE ? "" : ret;
}

Sliding window application principle

Sliding window characteristics:

  • Sliding window has a start and end
  • Sliding window has status that indicate the status of the window
  • Sliding window grows/shrink toward one direction

Sliding window application principles:

  • Sliding window skipped some combinations to improve efficiency, to use sliding window needs to prove these combinations can be skipped.

In terms of implementation, here is the key ideas:

  • Use nested loop to move window: outer loop to expand the window, inner loop to shrink the window
  • The outer loop stop condition is the right boundary meets array end
  • The inner loop stop condition is the status of the window meets requirement, as long as it still meets requirement, shrink the window as possible.

Questions and follow ups:

  • If we change the question to Maximum Size subarray, does the sliding window algorithm still work?

Maximum size subarray does not need to apply sliding window: find out the sum of the entire array, if it meets requirements, it is the max window, otherwise no sub window works.

Maximum Size Subarray Sum Equals K

Maximum Size subarray problem

Given an array nums and a target value k, find the maximum length of a subarray that sums to k. If there isn’t one, return 0 instead.

Note:
The sum of the entire nums array is guaranteed to fit within the 32-bit signed integer range.

Example 1:
Given nums = [1, -1, 5, -2, 3], k = 3,
return 4. (because the subarray [1, -1, 5, -2] sums to 3 and is the longest)

Example 2:
Given nums = [-2, -1, 2, 1], k = 1,
return 2. (because the subarray [-1, 2] sums to 1 and is the longest)

Algorithm

The brute force solution is straightforward, try all possible combinations:

for index i ~ (0, n):
    for index j ~ (i, n):
         if sum(i, j) == k
            minLen = min(minLen, j - i + 1);

This algorithm takes O(n ^ 2) time, which is too slow.

The second solution I thought of was a sliding window algorithm.

while (right < boundary):
    sum += nums[right++]
    while (sum == target):
        minLen = min(minLen, right - left)
        sum -= nums[left++]

Sliding window does not work for this problem because:

  • Skipped part of the window may also contribute to a maxSize subarray.
  • Move right does not make the sum bigger, move left does not make the sum smaller. because number can be positive or negative.

A better solution is based on the following idea:

  • sum(i, j) = sum(0, j) – sum(i)
  • sum(0, i), sum(0, j) can be computed incrementally

The following diagram better illustrate the idea:
minimal-size-subarray-page-1So we have the following algorithm:

for i ~ (0, n):
    sum[i] = sum[i - 1] + num[i]
    gap = sum[i] - target
    if gap in map:
        maxWindow = Math.max(maxWindow, i - map(gap))

    if sum[i] not in map:
        map.put(sum[i], i)

Implementation

The following code implemented the algorithm above:

public int maxSubArrayLen(int[] nums, int k) {
    if (nums == null || nums.length == 0) {
        return 0;
    }

    int sum = 0, maxLen = 0;
    Map<Integer, Integer> sumIndex = new HashMap<Integer, Integer>();
    for (int i = 0; i < nums.length; i++) {
        sum += nums[i];
        if (sum == k) {
            maxLen = i + 1;
        } else if (sumIndex.containsKey(sum - k)) {
            maxLen = Math.max(maxLen, i - sumIndex.get(sum - k));
        }
        if (!sumIndex.containsKey(sum)) {
            sumIndex.put(sum, i);
        } // not update index because we want max sum
    }
    return maxLen;
}
  • Runtime: O(n)

Maximum Product Subarray

Maximum Product Subarray Problem

Find the contiguous subarray within an array (containing at least one number) which has the largest product.

For example, given the array [2,3,-2,4],
the contiguous subarray [2,3] has the largest product = 6.

Reduction Transition Function

Product subarray is different from sum subarray in that:

  • A negative number * a negative number produce a big positive number
  • A negative number * a positive number produce a small negative number
  • Thus to get the localMax[i + 1], we may need to know the localMin[i]
  • So we want to record the locaMax and localMin at each step.

Thus we have the following RTF:

  • if nums[i + 1] >= 0:
    • localMax[i + 1] = Math.max(localMax[i] * nums[i + 1], nums[i]);
    • localMin[i + 1] = Math.min(localMin[i] * nums[i + 1], nums[i]);
  • if nums[i + 1] < 0:
    • localMax[i + 1] = Math.max(localMin[i] * nums[i + 1], nums[i]);
    • localMin[i + 1] = Math.min(localMax[i] * nums[i + 1], nums[i]);

The following diagram illustrates the algorithm:

Max product subarray - Page 1.png

Implementation

The following is the implementation of the algorithm:

public int maxProduct(int[] nums) {
    int localMax = nums[0];
    int localMin = nums[0];
    int max = nums[0];
    for (int i = 1; i < nums.length; i++) {
        if (nums[i] >= 0) {
            localMax = Math.max(nums[i], nums[i] * localMax);
            localMin = Math.min(nums[i], nums[i] * localMin);
        } else if (nums[i] < 0) {
            int localMaxTemp = Math.max(nums[i], nums[i] * localMin);
            localMin = Math.min(nums[i], nums[i] * localMax);
            localMax = localMaxTemp;
        }

        max = Math.max(localMax, max);
    }
    return max;
}
  • Runtime: O(n)

Maximum Subarray

Maximum Subarray Problem

Find the contiguous subarray within an array (containing at least one number) which has the largest sum.

For example, given the array [-2,1,-3,4,-1,2,1,-5,4],
the contiguous subarray [4,-1,2,1] has the largest sum = 6.

Reduction Transition Function

Consider the question in the following way:

  • Assume max subarray ending at nums[i] is nums[k, i]
  • What about max subarray ending at nums[i + 1]

There are two options:

  • Add nums[i + 1] into the max subarray ending at nums[i], which is nums[k, i]
  • Begin anew, max subarray ending at nums[i + 1] is nums[i + 1]

We don’t need to consider adding part of nums[k, i] with nums[i + 1]. Why? Because:

  • If sum(nums[k, i]) > 0, we want to add all of it
  • If sum(nums[k, i]) < 0, we don’t want to add any of the element.

Thus we have the following RTF:

localMax[i + 1] = localMax[i] > 0 ? local_max[i] + nums[i + 1] : nums[i + 1]

The following diagram illustrates whether to combine nums[i + 1] with nums[k, i] or to start anew:

Blank Diagram - Page 1-8.png

Then how does these localMax enable us to find the global max subarray? The following diagram illustrates the reason:

blank-diagram-page-1-9

LocalMax either extend or begin anew. Thus the array is segmented by all possible localMax, and the max of them is global max, which can be proved easily.

Implementation

The following is the implementation:

public int maxSubArray(int[] nums) {
    int max = Integer.MIN_VALUE, subMax = 0;
    for (int num : nums) {
        subMax = subMax > 0 ? subMax + num: num;
        max = Math.max(max, subMax);
    }
    return max;
}

Largest Rectangle in Histogram

Largest Rectangle in Histogram Problem

Given n non-negative integers representing the histogram’s bar height where the width of each bar is 1, find the area of largest rectangle in the histogram.

Below is a histogram where width of each bar is 1, given height = [6, 5,8,6,2].largest-rectangle-in-histogram-page-1-copy

The largest rectangle is painted as blue, which has are = 20 unit.

Max rectangle in histogram

Very similar to what we’ve discussed on Dynamic Programming: Maximal Rectangle, the area of a rectangle is determined by three different dimensions:

  • Height
  • Left boundary/right boundary.

As shown on the image below:
largest-rectangle-in-histogram-page-1-3

  • Bar i can form a rectangle with the height of heights[i]
  • Area of this rectangle is determined by left boundary and right boundary
  • Max rectangle can be found from one of the rectangles

Left boundary and right boundary

The key question is:

How to find left/right boundary?

By definition:

  • left_boundary[i] = j, when height[j, i] >= height[i] and height[j – 1] < height[i]
  • right_boundary[i] = k, when height[i, k] >= height[i] and height[k + 1] < height[i]

Which means, left boundary is the left most index where the bars between it and the current bar are no lower than current bar, while the one left to the left boundary is shorter than current bar.

One simple way to find left/right boundary is to search from index i until you find the first bar lower than height[i], but this solution takes O(n ^ 2), which is too expensive.

To find left/right boundary efficiently, consider the following observation:

  • If heights[i] > heights[i – 1], then left(i) = i
  • If heights[i] <= heights[i – 1], then left(i) = left(i – 1)
  • If heights[i] > heights[i + 1], then right(i) = i
  • If heights[i] <= heights[i + 1], then right(i) = right(i + 1)

Combine the latter two with the first two, we got the following algorithm of computing the leftBoundary and rightBoundary:

  • Create empty stack
  • For each bar 0 ~ n – 1:
    • If stack.isEmpty() or heights[i] >= height[stack.peek]
      • stack.push(i)
    • If !stack.isEmpty() and heights[i] < heights[stack.peek]
      • left(stack.peek) = stack.peek, right(stack.peek) = i
    • repeat until stack empty or heights[i] >= heights[stack.peek]

The following example illustrated how the above algorithm works:

largest-rectangle-in-histogram-page-1

Implementation

The following code implements the algorithm above.

public int largestRectangleArea(int[] heights) {
    if (heights == null || heights.length == 0) {
        return 0;
    }

    int maxArea = 0;
    int len = heights.length;
    Stack stack = new Stack ();
    for (int i = 0; i <= heights.length; i++) {         while (!stack.isEmpty() && (i == len || heights[stack.peek()] > heights[i])) {
            int height = heights[stack.pop()];
            int width = stack.isEmpty() ? i : i - stack.peek() - 1;
            maxArea = Math.max(maxArea, height * width);
        }
        if (i != len) {
            stack.push(i);
        }
    }
    return maxArea;
}
  • run time: O(n)

A Wrong DP Solution

I once came to a wrong DP solution for this question:

  • for bar 1 ~ n:
    • left[i] = height[i] > height[i – 1] ? i : left[i – 1]
  • for bar n – 1: 0:
    • right[i] = height[i] > height[i + 1] ? i : right[i + 1]

The following diagram illustrated why it is incorrect:Largest Rectangle in Histogram - Page 1-3 copy.png

The thing I learned was: always be careful about DP solutions : )

Dynamic Programming: Maximal Rectangle

Maximal Rectangle problem

Given a 2D binary matrix filled with 0’s and 1’s, find the largest rectangle containing only 1’s and return its area.

For example, given the following matrix:


1 0 1 0 0
1 0 1 1 1
1 1 1 1 1
1 0 0 1 0

Return 6.

Height, left boundary and right boundary

A rectangle within a matrix can be decided by three dimensions:

  • Height of the rectangle: counts the number of successive ‘1’s above (plus the current one)
  • left/right boundary: the boundaries of the rectangle which contains the current point with a height of value height

If you look at the following square, you will find that we can find three different square:
maximal-rectangle-page-1

  • rectangle 1: height 2, left boundary index 1, right boundary index 2
  • rectangle 2: height 3, left boundary index 2, right boundary index 2
  • rectangle 3: height 1, left boundary index 1, right boundary index 3

Correctness

The question is:

How does this algorithm guarantees find maximal rectangle ending at bottom row.

If we only take a look at the example, it is based on the following truth:

  • Rectangle 1 is the largest rectangle with height of 2
  • Rectangle 2 is the largest rectangle with height of 3
  • rectangle 3 is the largest rectangle with height of 1
    Then the rectangle with the largest area must be in one of them.

Reduction Transition Function

The next question is:

what will happen if we add another row?

In the following diagram, one more row is added.
maximal-rectangle-page-1-2

Three new square if formed when this new row added, the height/left boundary/right boundary have to be recalculated.

Height RTF:

  • height[i][j] = matrix[i][j] == ‘1’ ? height[i – 1][j] + 1 : 0

Left boundary RTF:

  • currleft = matrix[i][j] == ‘1’ ? currleft : j + 1
  • left[i][j] = matrix[i][j] == ‘1’ ? max(left[i – 1][j], curr_left) : 0

The following diagram illustrate how this transition works:
maximal-rectangle-page-1-3

  • curr_left = 0
  • matrix[3][1] = ‘0’ => curr_left = j + 1 = 2
  • matrix[3][2] = ‘1’ => left[1] = max(left[3], curr_left) = 2

We need to update curr_left because when previous cell is ‘0’, previous column could not be counted, hast to restart.

Let’s change matrix[3][1] to ‘1’:
maximal-rectangle-page-1-4

  • curr_left = 0
  • matrix[3][1] = ‘1’, left[2] = max(0, left[2]) = 1
  • matrix[3][2] = ‘1’, left[3] = max(1, left[3]) = 1

When previous cell is ‘1’, previous column could sill be counted, no need to restart.

Right boundary RTF:

  • currright = matrix[i][j] == ‘1’ ? currright : j
  • right[i][j] = matrix[i][j] == ‘1’ ? min(right[i – 1][j], curr_right) : n;

This is the same as we construct left boundary.

Implementation

We can use a rolling array to replace the matrix. The following is the implementation of the matrix.

public int maximalRectangle(char[][] matrix) {
    if (matrix == null || matrix.length == 0 || matrix[0].length == 0) {
        return 0;
    }

    int nrow = matrix.length;
    int ncol = matrix[0].length;

    int[] height = new int[ncol];
    int[] left = new int[ncol];
    int[] right = new int[ncol];
    Arrays.fill(right, ncol);
    int maxArea = 0;

    for (int i = 0; i < nrow; i++) {
        int currLeft = 0, currRight = ncol;
        // calculate height
        for (int j = 0; j < ncol; j++) {
            height[j] = matrix[i][j] == '1' ? height[j] + 1 : 0;
        }

        // calculate left boundary: left to right
        for (int j = 0; j = 0; j--) {
            if (matrix[i][j] == '1') {
                right[j] = Math.min(right[j], currRight);
            } else {
                right[j] = ncol;
                currRight = j;
            }
        }

        // calculate max area: right bounday does not include valid column
        for (int j = 0; j < ncol; j++) {
            maxArea = Math.max((right[j] - left[j]) * height[j], maxArea);
        }
    }

    return maxArea;
}

Dynamic Programming: Regular Expression Matching

Regular Expression Matching Problem

Implement regular expression matching with support for ‘.’ and ‘*’.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

Reduction Transition Function

B[j] is a regular character

match[i + 1][j + 1] = A[i] == B[j] && match[i][j]

dynamic-programming-regular-expression-matching-page-1

B[j] is ‘.’

match[i + 1][j + 1] = A[i] == B[j] && match[i][j]

dynamic-programming-regular-expression-matching-page-1-copy

B[j] is ‘*’

There are three different cases:

dynamic-programming-regular-expression-matching-page-1-1

B[j] matches zero preceding element

match[i + 1][j + 1] = match[i + 1][j – 1]

B[j] matches one preceding element

match[i + 1][j + 1] = (B[j – 1] == ‘.’ || B[j – 1] == A[j]) && match[i + 1][j]

The preceding element has to be ‘.’ or the same as B[j].

B[j] matches multiple preceding element

match[i + 1][j + 1] = (B[j – 1] == ‘.’ || B[j – 1] == A[j]) && match[i][j + 1]

When matching with multiple elements, preceding element has to be ‘.’ or the same as B[j].

Implementation

The following diagram illustrate how this algorithm works with an matrix:

dynamic-programming-regular-expression-scan-matrix-page-1

Initiate first row

How to get the correct value for the first row? match[0, i] means: whether B[0, i] matches an empty string. Since only * can match with zero string, this question is another DP solution, the RTF is:

match[0][i + 1] = (B[i] == ‘*’ && match[0][i – 1])

Where match[0][i – 1] indicates whether B[0, i – 2] matches empty string.

The following is the implementation of the algorithm:

public boolean isMatch(String s, String p) {
        int len1 = s.length();
        int len2 = p.length();

        boolean[][] match = new boolean[len1 + 1][len2 + 1];
        match[0][0] = true;
        for (int i = 0; i < len2; i++) {
            if (p.charAt(i) == '*' && match[0][i - 1]) {
                match[0][i + 1] = true;
            }
        }

        for (int i = 0; i < len1; i++) {
            for (int j = 0; j < len2; j++) {
                // p[j] is '.'.
                if (p.charAt(j) == '.') {
                    match[i + 1][j + 1] = match[i][j];
                }

                  // P[j] is a regular char.
                if (p.charAt(j) == s.charAt(i)) {
                    match[i + 1][j + 1] = match[i][j];
                }

                if (p.charAt(j) == '*') {
                    // preceding char does not match, count as empty.
                    if (p.charAt(j - 1) != '.' && p.charAt(j - 1) != s.charAt(i)) {
                        match[i + 1][j + 1] = match[i + 1][j - 1];
                    } else {
                        // match[i + 1][j]: match one char.
                        // match[i][j + 1]: match multiple char.
                        // match[i + 1][j - 1]: match empty char.
                        match[i + 1][j + 1] = match[i + 1][j] || match[i][j + 1] || match[i + 1][j - 1];
                    }
                }
            }
        }
        return match[len1][len2];
    }