Machine Learning: Simple Linear Regression

Let’s first take a look at simple linear regression.

Let’s start with an open question: if you could get all the information you need, how would you predict the selling price of a house?

I believe you might consider the following factors:

  • House size
  • House location
  • Year the house was built
  • Other features

General machine learning steps

If we have this information for many houses, we can predict the selling price of a new house. Simple linear regression does exactly this: it takes the features of existing houses, together with their known prices, as input, and then tries to predict the price of another house.

A prediction task like the one above generally goes through a few steps:

  • Feature engineering: find or create features that make the machine learning algorithm work.
  • Model selection: find the model that best explains the given problem.
  • Training: once you have a model, train it so that it is fitted to your problem.
  • Testing: check whether the model is over-fitted (over-customized) by testing it on data outside the training set.
  • Prediction: use the trained model to make predictions for the problem you care about.

Let’s go back to the house price prediction problem. Assume we only consider one feature: the square footage of the house. We may also assume that the price is a linear function of the size: y = ax + b.
The next question is how to determine a and b.

Cost function

One principle that helps us determine a and b is the following:
If we define the error of a prediction as (yi - f(xi))^2, then over the whole training set, a and b should minimize SUM((yi - f(xi))^2). We call this sum the cost function; it gauges the cost of using a particular model.
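A minimal sketch of this cost in Java, assuming the training data are given as two parallel arrays xs (features such as square feet) and ys (selling prices); the class and method names are hypothetical:

```java
/**
 * Sketch: the cost SUM((yi - f(xi))^2) for the model f(x) = a * x + b.
 * xs and ys are hypothetical names for the training features and prices.
 */
final class SquaredErrorCost {
    static double cost(double a, double b, double[] xs, double[] ys) {
        double total = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double error = ys[i] - (a * xs[i] + b); // yi - f(xi)
            total += error * error;                 // squared error of one example
        }
        return total;
    }
}
```

For example, the points (1, 3), (2, 5), (3, 7) lie exactly on y = 2x + 1, so cost(2, 1, ...) is 0 for them, while any other a and b gives a positive cost.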

This leads to the following questions:

  • What values should a and b be initialized to?
  • How should we change a and b once we know the cost of the current a and b?
  • How do we know whether we have found the optimal a and b?

To answer the first question: it does not matter much which initial values of a and b we choose. This cost function is convex in a and b, so with a suitable step size, gradient descent reaches the same minimum from any starting point.

Gradient descent

For the second question: how should we move toward the optimum?
We need the concept of the gradient: the gradient points in the direction of the greatest rate of increase of a function. This means we can find the min or max of a function by following the gradient direction. For example, take a single-variable convex or concave function: for a concave function, we can find the max through hill climbing (moving along the gradient), while for a convex function, we can find the min through hill descent (moving against the gradient). Convex and concave functions of several variables work the same way, so we can find their min and max as well.

When doing hill climbing or hill descent, we also need to consider the step size: how far the next step goes when we have not yet found the max/min value.

There are two common approaches to choosing the step size:

  • Fixed step size
  • Decreasing step size

For the third question: when should we stop moving? In theory, the gradient is zero at convergence; in practice, we stop when the magnitude of the gradient falls below a threshold.
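To tie these answers together, here is a minimal gradient descent sketch for f(x) = ax + b with a fixed step size, assuming the cost J(a, b) = SUM((yi - (a * xi + b))^2) from above. Its partial derivatives are dJ/da = -2 * SUM(xi * (yi - (a * xi + b))) and dJ/db = -2 * SUM(yi - (a * xi + b)). The class name, the starting point a = b = 0, and the maxIters safety cap are assumptions for illustration:

```java
/**
 * Gradient descent sketch for f(x) = a * x + b with a fixed step size.
 * Minimizes J(a, b) = SUM((yi - (a * xi + b))^2).
 */
final class LinearRegressionGD {
    /** Returns {a, b}. maxIters is a safety cap in case the threshold is never reached. */
    static double[] fit(double[] xs, double[] ys, double step, double threshold, int maxIters) {
        double a = 0.0, b = 0.0; // assumed starting point
        for (int iter = 0; iter < maxIters; iter++) {
            double gradA = 0.0, gradB = 0.0;
            for (int i = 0; i < xs.length; i++) {
                double error = ys[i] - (a * xs[i] + b); // yi - f(xi)
                gradA += -2.0 * xs[i] * error;          // dJ/da
                gradB += -2.0 * error;                  // dJ/db
            }
            // stop when the gradient is (almost) zero
            if (Math.sqrt(gradA * gradA + gradB * gradB) < threshold) {
                break;
            }
            // move against the gradient, i.e. downhill, by a fixed step size
            a -= step * gradA;
            b -= step * gradB;
        }
        return new double[] {a, b};
    }
}
```

A step that is too large makes the updates overshoot and can diverge; for the points (1, 3), (2, 5), (3, 7), (4, 9), a step of 0.01 converges to roughly a = 2, b = 1.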

This then becomes a math question:

  • How do we compute the gradient of a multi-variable function?
  • How do we compute the gradient with respect to a multi-dimensional vector?

We will talk about all of these general steps, mentioned and unmentioned, later, including the details of gradient descent.

Sliding Window: Minimal Window of Array

Sliding Window

Sliding window is a commonly seen technique for solving algorithm problems. It has a few elements:

  • Window start and end
  • Status of the elements in the window
  • The window extends to the right according to the status
  • The window shrinks from the left (its start also moves right) according to the status
  • The window stops moving under a certain condition.


Before we discuss the sliding window algorithm, we must understand that:

  • Any problem that can be solved by sliding window can also be solved by enumerating all possible combinations
  • Sliding window skips some of the combinations to speed things up

So the key to applying sliding window is to prove:

The skipped combinations do not need to be checked.

Let’s take a look at two questions and their solutions, and then come back to the discussion of sliding window.

Minimum Size Subarray Sum problem

Given an array of n positive integers and a positive integer s, find the minimal length of a contiguous subarray of which the sum ≥ s. If there isn’t one, return 0 instead.

For example, given the array [2,3,1,2,4,3] and s = 7,
the subarray [4,3] has the minimal length under the problem constraint.
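Since any sliding window problem can also be solved by enumerating all combinations, a minimal O(n ^ 2) enumeration baseline for this problem (hypothetical class name) is handy as a reference to check a faster solution against:

```java
/** Brute-force baseline: try every start index i and extend to the right, O(n^2). */
final class MinSubArrayBruteForce {
    static int minSubArrayLen(int s, int[] nums) {
        int minLen = Integer.MAX_VALUE;
        for (int i = 0; i < nums.length; i++) {
            int sum = 0;
            for (int j = i; j < nums.length; j++) {
                sum += nums[j]; // sum of nums[i..j]
                if (sum >= s) {
                    minLen = Math.min(minLen, j - i + 1);
                    break; // any later j gives a longer window starting at i
                }
            }
        }
        return minLen == Integer.MAX_VALUE ? 0 : minLen;
    }
}
```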


From the description we have the following observations:

  • The elements are positive: the window sum increases/decreases as we expand/shrink the window
  • The condition is sum >= s: while sum < s we expand; once sum >= s we shrink
  • The target is the minimum subarray size: so we have to shrink the window as much as possible

It looks like the sliding window algorithm can be applied to this problem, but we have to answer the following questions:

What combinations are skipped?

We may simplify the example above to [2,3,1,2] and s = 7.
The following are the combinations that will be visited:

  • [2]
  • [2, 3]
  • [2, 3, 1]
  • [2, 3, 1, 2]
  • [3, 1, 2]

The combinations that are skipped are:

  • [3]
  • [3, 1]
  • [1]
  • [1, 2]
  • [2]

[3] and [3, 1] were skipped when we shrank the window at [2, 3, 1, 2].
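To see exactly which windows are visited, here is a small tracing sketch (hypothetical class and method names) that records the window after every expansion and every shrink, using the same expand-then-shrink loop as the algorithm discussed below:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Records every window visited by the expand-then-shrink loop (sketch). */
final class SlidingWindowTrace {
    static List<String> visitedWindows(int s, int[] nums) {
        List<String> visited = new ArrayList<>();
        int left = 0, sum = 0;
        for (int right = 0; right < nums.length; right++) {
            sum += nums[right]; // expand
            visited.add(Arrays.toString(Arrays.copyOfRange(nums, left, right + 1)));
            while (left <= right && sum >= s) {
                sum -= nums[left++]; // shrink
                if (left <= right) { // do not record the empty window
                    visited.add(Arrays.toString(Arrays.copyOfRange(nums, left, right + 1)));
                }
            }
        }
        return visited;
    }
}
```

For [2, 3, 1, 2] and s = 7, it returns exactly the five visited windows listed above; [3] and [3, 1] never appear.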

Is it correct to skip these combinations?

Consider the target: sum >= s.

  • If sum([3, 1]) >= s, then when the window grew from [2, 3] to [2, 3, 1], it would have stopped expanding there, because sum([2, 3, 1]) > sum([3, 1]) >= s
  • If it had stopped there, shrinking the window would have visited [3, 1]
  • But since it did not stop, sum([3, 1]) < s, so there is no need to visit it.

To make this proof more generic:

  • Assume the window stopped expanding at [i, j]
  • Then j is the first index from i to n - 1 that makes sum(i, j) >= s
  • In other words, every window [i, k] with k < j has sum(i, k) < s; since all elements are positive, every sub-window of [i, k] also has a sum < s, so these windows can be skipped.


Based on the discussion above, we arrive at the following algorithm:

while right < n:
    sum += num[right++]
    while sum >= s:
        minLen = min(minLen, right - left)
        sum -= num[left++]

Notice how the principles we discussed at the beginning are applied in this algorithm:

  • Window: use left and right boundaries to represent a window
  • Stop condition: the right boundary stops at the end of the array
  • Status update: update the status of the window every time the window moves
  • Move condition: expand the window while the status does not meet the requirement
  • Move condition: shrink the window while the status meets the requirement


The following is the implementation of the algorithm:

public int minSubArrayLen(int s, int[] nums) {
    if (nums == null || nums.length == 0) {
        return 0;
    }
    int windowSum = 0;
    int minLen = nums.length + 1; // sentinel: longer than any real window
    for (int left = 0, right = 0; right < nums.length; right++) {
        windowSum += nums[right]; // expand the window to the right
        while (windowSum >= s) {
            // window [left, right] meets the requirement: record it, then shrink
            minLen = Math.min(minLen, right - left + 1);
            windowSum -= nums[left++];
        }
    }
    return minLen == nums.length + 1 ? 0 : minLen;
}

Minimum Window Substring problem

Given a string S and a string T, find the minimum window in S that contains all the characters in T, in O(n) time.

For example,
S = "ADOBECODEBANC"
T = "ABC"
Minimum window is "BANC".

If there is no such window in S that covers all characters in T, return the empty string “”.

If there is such a window, you are guaranteed that there will always be only one unique minimum window in S.


We have the following observations about the problem:

  • The target is the minimal window that contains all characters in T
  • Sliding window characteristics apply: expanding the window can only increase the occurrence count of a character, and shrinking it can only decrease it

What combinations are skipped?

Let’s also simplify the given example to ADBEC (with T = "ABC"); the following combinations are visited:

  • [A]
  • [A, D]
  • [A, D, B]
  • [A, D, B, E]
  • [A, D, B, E, C]
  • [D, B, E, C]

The combinations that are skipped are:

  • [D]
  • [D, B]
  • [D, B, E]
  • [B]
  • [B, E]
  • [B, E, C]
  • [E]
  • [E, C]
  • [C]

[D], [D, B] and [D, B, E] are skipped when we shrink from [A, D, B, E, C] to [D, B, E, C].

Is it correct to skip these combinations?

The correctness can be proved similarly to the previous problem:

Assume the window, growing from i, first meets the requirement at [i, j]. Then no window [i, k] with k < j meets the requirement, and neither does any sub-window of it, since removing characters can only lower the occurrence counts. So all of these windows can be safely skipped.


The key is to find a way to:

  • Record the status of the window

Use a char map (an int array indexed by character) to record the occurrences of each character in the window.

  • Indicate whether the window meets the requirement

This indicator must be able to grow or decrease incrementally. A special count works:

  • The count increases when ch is added and stats[ch] < target[ch] before the addition; it decreases when ch is removed and stats[ch] < target[ch] after the removal

where stats[ch] is the occurrence of ch in the window and target[ch] is the target occurrence of ch. The window covers every character of T exactly when count == T.length().


The algorithm is very similar to the one for the previous question; the implementation is as follows:

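public String minWindow(String s, String t) {
    if (s == null || t == null || t.isEmpty()) {
        return "";
    }
    int[] tCh = new int[128]; // target occurrences of each (ASCII) character in t
    int[] sCh = new int[128]; // occurrences of each character in the current window

    for (char ch : t.toCharArray()) {
        tCh[ch]++;
    }

    String ret = "";
    int count = 0; // how many characters of t the window currently covers
    int minWindowSize = Integer.MAX_VALUE;

    for (int left = 0, right = 0; right < s.length();) {
        // expand: chRight only counts if the window still needs more of it
        char chRight = s.charAt(right++);
        if (sCh[chRight]++ < tCh[chRight]) {
            count++;
        }

        // shrink while the window [left, right) still covers all of t
        while (count == t.length()) {
            if (right - left < minWindowSize) {
                minWindowSize = right - left;
                ret = s.substring(left, right);
            }

            // removing chLeft breaks coverage only if its count drops below the target
            char chLeft = s.charAt(left++);
            if (--sCh[chLeft] < tCh[chLeft]) {
                count--;
            }
        }
    }
    return minWindowSize == Integer.MAX_VALUE ? "" : ret;
}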

Sliding window application principle

Sliding window characteristics:

  • A sliding window has a start and an end
  • A sliding window has a status that describes the elements in the window
  • A sliding window grows and shrinks in one direction only: both boundaries move to the right

Sliding window application principles:

  • Sliding window skips some combinations to improve efficiency, so to use sliding window you need to prove that these combinations can be skipped.

In terms of implementation, here are the key ideas:

  • Use a nested loop to move the window: the outer loop expands the window, the inner loop shrinks it
  • The outer loop stops when the right boundary reaches the end of the array
  • The inner loop runs while the status of the window meets the requirement: as long as it still does, shrink the window as much as possible.
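The key ideas above can be captured in a small template. This is a sketch, not a standard API: the abstract class ShortestWindow and its add/remove/isValid hooks are hypothetical names, and MinSumWindow plugs in the Minimum Size Subarray Sum status:

```java
/**
 * Sketch of the "shortest window that meets the requirement" pattern.
 * The hooks are hypothetical names: subclasses keep the window status.
 */
abstract class ShortestWindow {
    abstract void add(int index);    // status update when the right boundary moves
    abstract void remove(int index); // status update when the left boundary moves
    abstract boolean isValid();      // does the current window meet the requirement?

    /** Length of the shortest valid window in [0, n), or 0 if there is none. */
    int run(int n) {
        int best = Integer.MAX_VALUE;
        int left = 0;
        for (int right = 0; right < n; right++) { // outer loop: expand until the array end
            add(right);
            while (left <= right && isValid()) {  // inner loop: shrink while still valid
                best = Math.min(best, right - left + 1);
                remove(left++);
            }
        }
        return best == Integer.MAX_VALUE ? 0 : best;
    }
}

/** Minimum Size Subarray Sum expressed with the template: status = window sum. */
final class MinSumWindow extends ShortestWindow {
    private final int s;
    private final int[] nums;
    private int sum = 0;

    MinSumWindow(int s, int[] nums) {
        this.s = s;
        this.nums = nums;
    }

    @Override void add(int index) { sum += nums[index]; }
    @Override void remove(int index) { sum -= nums[index]; }
    @Override boolean isValid() { return sum >= s; }
}
```

Each MinSumWindow keeps its own status, so create a fresh instance per run, e.g. new MinSumWindow(7, new int[] {2, 3, 1, 2, 4, 3}).run(6).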

Questions and follow-ups:

  • If we change the question to maximum size subarray, does the sliding window algorithm still work?

For the maximum size subarray with positive integers and the condition sum ≥ s, sliding window is not needed: compute the sum of the entire array; if it meets the requirement, the whole array is the max window, otherwise no sub-window works either.

Maximum Size Subarray Sum Equals K

Maximum Size subarray problem

Given an array nums and a target value k, find the maximum length of a subarray that sums to k. If there isn’t one, return 0 instead.

The sum of the entire nums array is guaranteed to fit within the 32-bit signed integer range.

Example 1:
Given nums = [1, -1, 5, -2, 3], k = 3,
return 4. (because the subarray [1, -1, 5, -2] sums to 3 and is the longest)

Example 2:
Given nums = [-2, -1, 2, 1], k = 1,
return 2. (because the subarray [-1, 2] sums to 1 and is the longest)


The brute force solution is straightforward: try all possible combinations.

for i in [0, n):
    for j in [i, n):
        if sum(i, j) == k:
            maxLen = max(maxLen, j - i + 1)

With sum(i, j) updated incrementally in the inner loop, this algorithm takes O(n ^ 2) time, which is too slow.

The second solution I thought of was a sliding window algorithm:

while right < n:
    sum += nums[right++]
    while sum == k:
        maxLen = max(maxLen, right - left)
        sum -= nums[left++]

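Sliding window does not work for this problem because:

  • The skipped windows may also be the max-size subarray.
  • Moving right does not necessarily make the sum bigger, and moving left does not necessarily make it smaller, because numbers can be positive or negative.

For example, with nums = [-2, -1, 2, 1] and k = 1, the window sum is -2, -3, -1, 0 as the right boundary advances; it never equals 1, so the algorithm returns 0 instead of 2.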

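A better solution is based on the following idea:

  • sum(i, j) = sum(0, j) - sum(0, i - 1), with sum(0, -1) = 0
  • sum(0, i) and sum(0, j) (prefix sums) can be computed incrementally

Hence, for each j, we only need the earliest index m with sum(0, m) = sum(0, j) - k; the subarray m + 1 ~ j then sums to k. The following diagram illustrates the idea:

[Figure: minimal-size-subarray, page 1]

So we have the following algorithm: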

map = {0: -1}    // prefix sum 0 occurs before index 0
for i in [0, n):
    sum[i] = sum[i - 1] + nums[i]
    gap = sum[i] - k
    if gap in map:
        maxWindow = max(maxWindow, i - map[gap])

    if sum[i] not in map:
        map[sum[i]] = i    // keep the earliest index to get the longest window


The following code implements the algorithm above:

import java.util.HashMap;
import java.util.Map;

public int maxSubArrayLen(int[] nums, int k) {
    if (nums == null || nums.length == 0) {
        return 0;
    }

    int sum = 0, maxLen = 0;
    Map<Integer, Integer> sumIndex = new HashMap<>(); // prefix sum -> earliest index
    for (int i = 0; i < nums.length; i++) {
        sum += nums[i];
        if (sum == k) {
            maxLen = i + 1; // the whole prefix [0, i] sums to k
        } else if (sumIndex.containsKey(sum - k)) {
            maxLen = Math.max(maxLen, i - sumIndex.get(sum - k));
        }
        if (!sumIndex.containsKey(sum)) {
            sumIndex.put(sum, i);
        } // do not update an existing index: the earliest one gives the max length
    }
    return maxLen;
}
  • Runtime: O(n)

Maximum Product Subarray

Maximum Product Subarray Problem

Find the contiguous subarray within an array (containing at least one number) which has the largest product.

For example, given the array [2,3,-2,4],
the contiguous subarray [2,3] has the largest product = 6.

Reduction Transition Function

Product subarray is different from sum subarray in that:

  • A negative number times a negative number produces a positive number, which can be large
  • A negative number times a positive number produces a negative number, which can be small
  • Thus to get localMax[i + 1], we may need to know localMin[i]
  • So we record both localMax and localMin at each step, where localMax[i] and localMin[i] are the largest and smallest products of a subarray ending at nums[i].

Thus we have the following RTF:

  • if nums[i + 1] >= 0:
    • localMax[i + 1] = Math.max(localMax[i] * nums[i + 1], nums[i + 1]);
    • localMin[i + 1] = Math.min(localMin[i] * nums[i + 1], nums[i + 1]);
  • if nums[i + 1] < 0:
    • localMax[i + 1] = Math.max(localMin[i] * nums[i + 1], nums[i + 1]);
    • localMin[i + 1] = Math.min(localMax[i] * nums[i + 1], nums[i + 1]);

The following diagram illustrates the algorithm:

[Figure: Max product subarray]


The following is the implementation of the algorithm:

public int maxProduct(int[] nums) {
    // localMax/localMin: largest/smallest product of a subarray ending at nums[i]
    int localMax = nums[0];
    int localMin = nums[0];
    int max = nums[0];
    for (int i = 1; i < nums.length; i++) {
        if (nums[i] >= 0) {
            localMax = Math.max(nums[i], nums[i] * localMax);
            localMin = Math.min(nums[i], nums[i] * localMin);
        } else {
            // a negative number swaps the roles of max and min;
            // use a temp so localMax is not overwritten before localMin uses it
            int localMaxTemp = Math.max(nums[i], nums[i] * localMin);
            localMin = Math.min(nums[i], nums[i] * localMax);
            localMax = localMaxTemp;
        }
        max = Math.max(localMax, max);
    }
    return max;
}
  • Runtime: O(n)

Maximum Subarray

Maximum Subarray Problem

Find the contiguous subarray within an array (containing at least one number) which has the largest sum.

For example, given the array [-2,1,-3,4,-1,2,1,-5,4],
the contiguous subarray [4,-1,2,1] has the largest sum = 6.

Reduction Transition Function

Consider the question in the following way:

  • Assume the max subarray ending at nums[i] is nums[k, i]
  • What about the max subarray ending at nums[i + 1]?

There are two options:

  • Add nums[i + 1] to the max subarray ending at nums[i], which is nums[k, i]
  • Begin anew: the max subarray ending at nums[i + 1] is just nums[i + 1]

We don’t need to consider combining only part of nums[k, i] with nums[i + 1]. Why? Because nums[k, i] is the max subarray ending at nums[i], no shorter suffix of it has a larger sum, so:

  • If sum(nums[k, i]) > 0, we want to add all of it
  • If sum(nums[k, i]) <= 0, we don’t want to add any of its elements.

Thus we have the following RTF:

localMax[i + 1] = localMax[i] > 0 ? localMax[i] + nums[i + 1] : nums[i + 1]

The following diagram illustrates whether to combine nums[i + 1] with nums[k, i] or to start anew:


Then how do these localMax values let us find the global max subarray? The following diagram illustrates the reason:


Each localMax either extends the previous one or begins anew. Every subarray ends at some index i, and localMax[i] is the best subarray ending at i, so the global max is simply the largest of all the localMax values.


The following is the implementation:

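public int maxSubArray(int[] nums) {
    int max = Integer.MIN_VALUE, subMax = 0; // subMax: max sum of a subarray ending at num
    for (int num : nums) {
        subMax = subMax > 0 ? subMax + num : num; // extend if the previous sum helps, else begin anew
        max = Math.max(max, subMax);
    }
    return max;
}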
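If you also want to recover which subarray gives the max sum (for example [4,-1,2,1] above), the same recurrence can track where the current localMax began. A minimal sketch, assuming a non-empty input and returning {sum, start, end} with inclusive indices; the class name is hypothetical:

```java
/** Same recurrence as above, also tracking where the best subarray starts and ends (sketch). */
final class MaxSubArrayWithIndices {
    /** Returns {maxSum, start, end} with inclusive indices. Assumes nums is non-empty. */
    static int[] maxSubArray(int[] nums) {
        int best = nums[0], bestStart = 0, bestEnd = 0;
        int subMax = nums[0], subStart = 0; // localMax ending at the current index
        for (int i = 1; i < nums.length; i++) {
            if (subMax > 0) {
                subMax += nums[i]; // extend the max subarray ending at i - 1
            } else {
                subMax = nums[i];  // begin anew at i
                subStart = i;
            }
            if (subMax > best) {
                best = subMax;
                bestStart = subStart;
                bestEnd = i;
            }
        }
        return new int[] {best, bestStart, bestEnd};
    }
}
```

For [-2,1,-3,4,-1,2,1,-5,4] it returns {6, 3, 6}, i.e. the subarray [4,-1,2,1].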

Largest Rectangle in Histogram

Largest Rectangle in Histogram Problem

Given n non-negative integers representing the histogram’s bar heights, where the width of each bar is 1, find the area of the largest rectangle in the histogram.

Below is a histogram where the width of each bar is 1, given height = [6, 5, 8, 6, 2].

[Figure: histogram for height = [6, 5, 8, 6, 2]]

The largest rectangle is painted in blue; it has height 5 and spans the first four bars, so its area is 20 units.

Max rectangle in histogram

Very similar to what we discussed in Dynamic Programming: Maximal Rectangle, the area of a rectangle is determined by three dimensions:

  • Height
  • Left boundary and right boundary

As shown in the image below:

  • Bar i can form a rectangle with height heights[i]
  • The area of this rectangle is determined by its left boundary and right boundary
  • The max rectangle is one of these n rectangles, one per bar

Left boundary and right boundary

The key question is:

How to find left/right boundary?

By definition:

  • left_boundary[i] = j, where heights[j..i] >= heights[i] and heights[j - 1] < heights[i] (or j = 0)
  • right_boundary[i] = k, where heights[i..k] >= heights[i] and heights[k + 1] < heights[i] (or k = n - 1)

That is, the left boundary is the leftmost index such that no bar between it and the current bar is lower than the current bar, while the bar just left of it is lower than the current bar.

One simple way to find the left/right boundary is to search outward from index i until you find the first bar lower than heights[i] on each side, but this takes O(n ^ 2) time, which is too expensive.
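For reference, the simple O(n ^ 2) search might look like the following sketch (hypothetical class name); it is too slow for large inputs, but useful for checking faster versions:

```java
/** O(n^2) baseline: for each bar, walk outward until a lower bar appears on each side. */
final class HistogramBruteForce {
    static int largestRectangleArea(int[] heights) {
        int maxArea = 0;
        for (int i = 0; i < heights.length; i++) {
            int left = i;
            while (left > 0 && heights[left - 1] >= heights[i]) {
                left--; // bar left - 1 is no lower than bar i, so the rectangle extends
            }
            int right = i;
            while (right < heights.length - 1 && heights[right + 1] >= heights[i]) {
                right++;
            }
            maxArea = Math.max(maxArea, heights[i] * (right - left + 1));
        }
        return maxArea;
    }
}
```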

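To find the left/right boundaries efficiently, consider the following observations:

  • If heights[i] > heights[i - 1], then left(i) = i
  • If heights[i] <= heights[i - 1], then left(i) <= left(i - 1): every bar from left(i - 1) to i - 1 is at least as high as bar i, so the search for bar i can jump straight past them
  • If heights[i] > heights[i + 1], then right(i) = i
  • If heights[i] <= heights[i + 1], then right(i) >= right(i + 1), by the same argument

Combining these observations, we get the following stack-based algorithm for computing leftBoundary and rightBoundary. The stack holds bar indices whose heights are non-decreasing from bottom to top:

  • Create an empty stack
  • For each bar i = 0 ~ n - 1:
    • While !stack.isEmpty() and heights[i] < heights[stack.peek]:
      • top = stack.pop(); right(top) = i - 1, since bar i is the first lower bar to its right
      • left(top) = stack.isEmpty() ? 0 : stack.peek + 1
    • stack.push(i)
  • After the last bar, pop the remaining bars the same way, with right(top) = n - 1

When several equal heights sit on the stack, only the one deepest in the stack (the leftmost) gets its full width, which is still enough to find the maximum area.

The following example illustrates how the above algorithm works: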



The following code implements the algorithm above.

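import java.util.ArrayDeque;
import java.util.Deque;

public int largestRectangleArea(int[] heights) {
    if (heights == null || heights.length == 0) {
        return 0;
    }

    int maxArea = 0;
    int len = heights.length;
    // indices of bars whose heights are non-decreasing from bottom to top
    Deque<Integer> stack = new ArrayDeque<>();
    // i == len acts as a sentinel that flushes the remaining bars
    for (int i = 0; i <= len; i++) {
        while (!stack.isEmpty() && (i == len || heights[stack.peek()] > heights[i])) {
            // bar i is the first bar to the right that is lower than the popped bar
            int height = heights[stack.pop()];
            // the new stack top is the nearest bar to the left that is not higher
            int width = stack.isEmpty() ? i : i - stack.peek() - 1;
            maxArea = Math.max(maxArea, height * width);
        }
        if (i != len) {
            stack.push(i);
        }
    }
    return maxArea;
}
  • Runtime: O(n)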

A Wrong DP Solution

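I once came up with a wrong DP solution for this question:

  • for bar 1 ~ n - 1:
    • left[i] = heights[i] > heights[i - 1] ? i : left[i - 1]
  • for bar n - 2 ~ 0:
    • right[i] = heights[i] > heights[i + 1] ? i : right[i + 1]

The following diagram illustrates why it is incorrect:

[Figure: Largest Rectangle in Histogram, counterexample to the DP]

For example, with heights = [2, 3, 2], it gives left[2] = left[1] = 1, but bar 0 (height 2) is not lower than bar 2, so the correct left boundary is 0.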

The thing I learned was: always be careful about DP solutions : )
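One way to repair the wrong DP, which is an assumption on my part rather than part of the original solution: when heights[i - 1] >= heights[i], bar i's left boundary is at or before left[i - 1], so instead of copying left[i - 1], jump to left[i - 1] - 1 and keep comparing (and symmetrically on the right). A minimal sketch with a hypothetical class name:

```java
/** Repaired boundary DP: jump over neighbors' ranges instead of copying them (sketch). */
final class RepairedHistogramDP {
    static int largestRectangleArea(int[] heights) {
        int n = heights.length;
        int[] left = new int[n];  // leftmost index of the rectangle with height heights[i]
        int[] right = new int[n]; // rightmost index of that rectangle
        for (int i = 0; i < n; i++) {
            int j = i - 1;
            // bars left[j]..j are all >= heights[j] >= heights[i], so skip them at once
            while (j >= 0 && heights[j] >= heights[i]) {
                j = left[j] - 1;
            }
            left[i] = j + 1;
        }
        for (int i = n - 1; i >= 0; i--) {
            int j = i + 1;
            // bars j..right[j] are all >= heights[j] >= heights[i]
            while (j < n && heights[j] >= heights[i]) {
                j = right[j] + 1;
            }
            right[i] = j - 1;
        }
        int maxArea = 0;
        for (int i = 0; i < n; i++) {
            maxArea = Math.max(maxArea, heights[i] * (right[i] - left[i] + 1));
        }
        return maxArea;
    }
}
```

On [2, 3, 2] this gives left = [0, 1, 0] and an area of 6, where the wrong DP stopped bar 2 at index 1.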

Dynamic Programming: Maximal Rectangle

Maximal Rectangle problem

Given a 2D binary matrix filled with 0’s and 1’s, find the largest rectangle containing only 1’s and return its area.

For example, given the following matrix:

1 0 1 0 0
1 0 1 1 1
1 1 1 1 1
1 0 0 1 0

Return 6.

Height, left boundary and right boundary

A rectangle within a matrix can be determined by three dimensions:

  • Height of the rectangle: the number of consecutive '1's above the current cell (including the current one)
  • Left/right boundary: the boundaries of the widest rectangle that contains the current cell and has that height

If you look at the following diagram, you will find three different rectangles:

  • rectangle 1: height 2, left boundary index 1, right boundary index 2
  • rectangle 2: height 3, left boundary index 2, right boundary index 2
  • rectangle 3: height 1, left boundary index 1, right boundary index 3


The question is:

How does this algorithm guarantee that it finds the maximal rectangle ending at the bottom row?

Looking at the example, it relies on the following facts:

  • Rectangle 1 is the largest rectangle with a height of 2
  • Rectangle 2 is the largest rectangle with a height of 3
  • Rectangle 3 is the largest rectangle with a height of 1

So the rectangle with the largest area must be one of them.

Reduction Transition Function

The next question is:

What will happen if we add another row?

In the following diagram, one more row is added.

Three new rectangles are formed when the new row is added, so the height, left boundary and right boundary have to be recalculated.

Height RTF:

  • height[i][j] = matrix[i][j] == '1' ? height[i - 1][j] + 1 : 0

Left boundary RTF:

  • curr_left = matrix[i][j] == '1' ? curr_left : j + 1
  • left[i][j] = matrix[i][j] == '1' ? max(left[i - 1][j], curr_left) : 0

The following diagram illustrates how this transition works on row 3, scanning from left to right:

  • curr_left = 0
  • matrix[3][1] = '0' => left[3][1] = 0, curr_left = j + 1 = 2
  • matrix[3][2] = '1' => left[3][2] = max(left[2][2], curr_left) = 2

We need to update curr_left because when the previous cell is '0', the previous column cannot be part of the rectangle, so the left boundary has to restart.

Let’s change matrix[3][1] to '1':

  • curr_left = 0
  • matrix[3][1] = '1' => curr_left stays 0, left[3][1] = max(left[2][1], curr_left) = 1
  • matrix[3][2] = '1' => curr_left stays 0, left[3][2] = max(left[2][2], curr_left) = 1

When the previous cell is '1', the previous column can still be counted, so there is no need to restart.

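Right boundary RTF:

  • curr_right = matrix[i][j] == '1' ? curr_right : j
  • right[i][j] = matrix[i][j] == '1' ? min(right[i - 1][j], curr_right) : n

This mirrors the left boundary, except that it is computed from right to left and the right boundary is exclusive (it points one column past the rectangle), so the area is (right - left) * height.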


We can use rolling arrays (keeping only the values for the current row) instead of full matrices. The following is the implementation:

import java.util.Arrays;

public int maximalRectangle(char[][] matrix) {
    if (matrix == null || matrix.length == 0 || matrix[0].length == 0) {
        return 0;
    }

    int nrow = matrix.length;
    int ncol = matrix[0].length;

    // rolling arrays: values for the current row, updated in place row by row
    int[] height = new int[ncol];
    int[] left = new int[ncol];
    int[] right = new int[ncol];
    Arrays.fill(right, ncol);
    int maxArea = 0;

    for (int i = 0; i < nrow; i++) {
        int currLeft = 0, currRight = ncol;
        // calculate height
        for (int j = 0; j < ncol; j++) {
            height[j] = matrix[i][j] == '1' ? height[j] + 1 : 0;
        }

        // calculate left boundary: left to right
        for (int j = 0; j < ncol; j++) {
            if (matrix[i][j] == '1') {
                left[j] = Math.max(left[j], currLeft);
            } else {
                left[j] = 0;
                currLeft = j + 1;
            }
        }

        // calculate right boundary: right to left
        for (int j = ncol - 1; j >= 0; j--) {
            if (matrix[i][j] == '1') {
                right[j] = Math.min(right[j], currRight);
            } else {
                right[j] = ncol;
                currRight = j;
            }
        }

        // calculate max area: the right boundary is exclusive (not a valid column)
        for (int j = 0; j < ncol; j++) {
            maxArea = Math.max((right[j] - left[j]) * height[j], maxArea);
        }
    }
    return maxArea;
}
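Since the height array of each row is exactly a histogram, another option (a swapped-in technique, not the method above) is to run the stack-based Largest Rectangle in Histogram algorithm on every row. A minimal sketch with a hypothetical class name:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Alternative sketch: treat each row's heights as a histogram and reuse the stack algorithm. */
final class MaximalRectangleViaHistogram {
    static int maximalRectangle(char[][] matrix) {
        if (matrix == null || matrix.length == 0 || matrix[0].length == 0) {
            return 0;
        }
        int ncol = matrix[0].length;
        int[] height = new int[ncol];
        int maxArea = 0;
        for (char[] row : matrix) {
            for (int j = 0; j < ncol; j++) {
                height[j] = row[j] == '1' ? height[j] + 1 : 0; // same height RTF as above
            }
            maxArea = Math.max(maxArea, largestRectangleArea(height));
        }
        return maxArea;
    }

    /** Stack-based largest rectangle in a histogram, O(n) per row. */
    static int largestRectangleArea(int[] heights) {
        int maxArea = 0;
        int len = heights.length;
        Deque<Integer> stack = new ArrayDeque<>(); // indices with non-decreasing heights
        for (int i = 0; i <= len; i++) {
            while (!stack.isEmpty() && (i == len || heights[stack.peek()] > heights[i])) {
                int h = heights[stack.pop()];
                int width = stack.isEmpty() ? i : i - stack.peek() - 1;
                maxArea = Math.max(maxArea, h * width);
            }
            if (i != len) {
                stack.push(i);
            }
        }
        return maxArea;
    }
}
```

On the example matrix, the third row has heights [3, 1, 3, 2, 2], whose largest rectangle (height 2, width 3) gives the answer 6.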