税收的哲学:效率和平等

工作之后,我们所切身关心的问题包括:

  • 我们交了多少税?
  • 我们为什么要交这么多税?
  • 我们怎样才能少交税?

这篇文章里,我想要重点讨论第二个问题:我们为什么要交这么多税?

我们知道,为了维持公共机构和公共服务的运转,税收是必不可少的,“税收是我们为文明社会付出的代价”。根据社会契约的思想,每个人让渡出一部分的个人权利来组成一个政府,从而保护自己的财产等权利,税收可以视作是这让渡出的一部分权利。那么如何确定每个人应该缴纳税款的量呢?怎么样的税收制度才是有效率的?

税收的效率

每种税收的办法都有其成本。成本之一来自税收改变了人们的决策所引起的无谓损失,其二来自税收的管理成本。我们知道税收的存在改变了人们的行为。比如,如果对个人所得征收极高的所得税,那么人们的工作热情势必降低,因为税收等于降低了实际收入。税收的管理成本包括花在收税和缴税上的所有成本。比如要维持整个征税系统的运作所需要的成本,每个人每年报税所需要花费的成本等。

那么怎样降低税收的成本以提高税收的效率呢?其中一项策略是在不同的环节征税从而取得不同的效果,比如在收入环节征税会直接一直人们的工作热情,但是在消费环节征税则不会,从而有可能降低税收的无关损失。从税收的管理成本上看,简化税收的流程和设计,可以降低税收的管理成本。

税收的平等

我们都同意,税收制度应该是平等的,但是什么才算是平等的税收?怎么判断一项税收制度是否平等呢?这里有两种主要的思想。

首先是收益原则的思想,即根据从公共服务中获益的程度来决定税收的数量。税收的目的是为了提供公共服务,那么我们很自然的联想到,一个人从公共福利中得到多少好处,他就应该提供多少税收。基于这种思想我们有时可以推论出富人应该缴纳更多的税赋。比如评估财产保护这项公共服务,那么富有的人肯定会出更高的价格,因此我们可以认为在享受相同的财产保护的服务的时候,富人从中得到的好处更多。

另一种是能力原则的思想。这种思想认为,应该根据一个人的支付能力来纳税。也就是说,支付能力大的人付出的更多,支付能力形同的人付出的数额相同。前一种称为纵向平等,后一种称为横向平等。

但是,从个体的角度来说,收益原则是最直接的心理依据。比如,觉得自己交税太多的人往往是认为自己交税的成本远远大于享受到的社会福利,或者认为社会福利被转移支付到了较为穷困的人手上,是对自己的不公平。

如何衡量税收是否平等?

要衡量税收是否平等,我们首先需要知道税收的负担最重落在了谁的身上,也就是说,最后是谁来支付税收的实际成本?与我们的直觉相反,并不总是收到税单的人承担了税收的实际成本。比如,选民们总是渴望减少自己的税收,或者宁愿税收由非个人化的公司来支付。但实际上,虽然公司所得随不在员工的工资单上出现,但会导致公司的成本增加,反过来可能导致员工的福利降低,从而使税收的实际成本转移到了普通员工的身上。因此,衡量税收的负担要考虑到税收的间接效果,而不是通过直接比较税收的账单来得到。

权衡取舍

税收的效率和平等是有冲突的。比如出于平等的考虑,我们会对富人收更多的税,但这损害了富人们在经济上的积极性,从而带来更大的无谓损失。不同阶层出于自己不同的利益,也会推动税收制度向这两个方向变动,最典型的例子比如里根总统的减税计划。

税收的社会干预作用

除了作为提供公共服务的收入来源,税收作为一项政策工具还可以用于进行社会干预和管理。比如如果对储蓄利息征收高所得税,就等于降低了银行储蓄的实际利率,人们就会将更多的钱用于消费和投资。比如,对遗产征收高额度的税赋有利于降低财富的代季积累效应。也就是说,遗产税的存在使得后代较少的受益于父辈积累的社会资源,从而降低与同辈竞争时的优势,进而促进社会公平。当然,代价是牺牲了经济上的效率。

Sliding Window: Minimal Window of Array

Sliding Window

Sliding window is a common seen technique in solving algorithm, it has a few elements:

  • Window start and end
  • Status of elements in the window
  • Window extends to the right according to the status
  • Window shrinks to the right according to the status
  • Window stops moving at certain condition.

sliding-window-page-1

Before we discuss sliding window algorithm, we must understand:

  • Any problem can be solved by sliding window can be solved by enumerating all possible combinations
  • Sliding window skipped some of the combinations to speed up

So the key to sliding window application is to prove:

No need to check the skipped combinations.

Let’s take a look at two questions and their solution and then come back to the discussion of sliding window.

Minimum Size Subarray Sum problem

Given an array of n positive integers and a positive integer s, find the minimal length of a contiguous subarray of which the sum ≥ s. If there isn’t one, return 0 instead.

For example, given the array [2,3,1,2,4,3] and s = 7,
the subarray [4,3] has the minimal length under the problem constraint.

Observations

From the description we have the following observations:

  • The elments are positive: sum of window increase/decrease as we expand/shrink window
  • The condition is sum >= s: when sum >= s we should expand, then shrink
  • Target at minimum subarray size: then have to shrink then window as much as possible

Looks like sliding window algorithm can be applied to this problem, but we have to answer the following questions:

What combinations are skipped?

We may simply the example above: [2,3,1,2] and s = 7
The following are combinations that will be visited:

  • [2]
  • [2, 3]
  • [2, 3, 1]
  • [2, 3, 1, 2]
  • [3, 1, 2]

The combinations that are skipped are:

  • [3, 1]
  • [1]
  • [1, 2]
  • [2]

And the [3, 1] was skipped when we shrink the window at [2, 3, 1, 2].

Is it correct to skip these combinations?

Consider about the target: sum >= s.

  • If [3, 1] >= s, then when the window grows from [2, 3] to [2, 3, 1] it will stop there because sum([2, 3, 1]) > sum([3, 1]) > 2
  • If it stopped, [3, 1] will be visited
  • But since it didn’t stop, [3, 1] is < s, no need to visit.

To make this prove more generic:

  • Assume the window stopped at [i, j]
  • Then j is the first element from i to n that makes sum(i, j) >= s
  • In another way, any sub window [i, k] where k < i will make sum(i, k) < s, so these sub window can be skipped.

Algorithm

Based on the discussion above, we come to the following algorithm:

while right < n:
    sum += num[right ++]
    while (sum > s):
        minLen = Math.min(minLen, right - left)
        sum -= num[left++]

Notice how the important principle we discussed in the beginning are applied in this algorithm:

  • Window: Use left and right boundary to represent a window
  • Stop condition: right boundary stops at the end
  • Status update:update status of window every time the window moves
  • Move condition: expand the window when the status did not mets requirement
  • Move condition: Shrink the window while the status meets requirements

Implementation

The following is the implementation of the algorithm:

public int minSubArrayLen(int s, int[] nums) {
    if (nums == null || nums.length == 0) {
        return 0;
    }
    int windowSum = 0;
    int minLen = nums.length + 1;
    for (int left = 0, right = 0; right < nums.length; right ++) {
        windowSum += nums[right];
        while (windowSum >= s) {
            minLen = Math.min(minLen, right - left + 1);
            windowSum -= nums[left ++];
        }
    }
    return minLen == nums.length + 1 ? 0 : minLen;
}

Minimum Window Substring problem

Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

For example,
S = "ADOBECODEBANC"
T = "ABC"
Minimum window is "BANC".

Note:
If there is no such window in S that covers all characters in T, return the empty string “”.

If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.

Observations

We have the following observation about the problem:

  • Target at finding the minimal window that contains all characters in T
  • Sliding window characters can be applied to it: expand the window will increase the occurrence of certain character.

What combinations are skipped?

Let’s also simply the given examble to ADBEC, the following are combinations visited:

  • [A]
  • [A, D]
  • [A, D, B]
  • [A, D, B, E]
  • [A, D, B, E, C]
  • [D, B, E, C]

The combinations that are skipped are:

  • [D, B]
  • [D, B, E]
  • [B]
  • [B]
  • [B, E]
  • [B, E, C]

[D, B, E] are skipped when we shrink from [A, D, B, E, C] to [D, B, E, C].

Is it correct to skip these combinations?

The correctness can be proved similar to the process in the above question:

Let’s assume at window [i, j] it meets the targeting requirement, then the implication is that it does not meet the requirement at any sub window [i, k] where k < j. When any combination of these sub windows can be safely skipped.

Algorithm

The key is to find a way to:

  • Record the status of the window

Use a char map to record the occurrence of each character in the window.

  • Indication of the window meets requirement

This indication must be able to grow or decrease incrementally. A special count works

  • The count increase only when stats[ch] < target[ch], decrease when stats[ch] < target[ch]

where stats[ch] is the occurrence of ch in the window and target[ch] is the target occurrence of ch.

Implementation

The algorithm is very similar to the question above, the implementation is as below:

public String minWindow(String s, String t) {
    char[] tCh = new char[128];
    char[] sCh = new char[128];

    for (char ch : t.toCharArray()) {
        tCh[ch] ++;
    }

    String ret = "";
    int count = 0;
    int minWindowSize = Integer.MAX_VALUE;

    for (int left = 0, right = 0; right < s.length();) {
        char chRight = s.charAt(right ++);
        if (sCh[chRight]++ <= tCh[chRight]) {
            count ++;
        }

        while (count == t.length()) {
            if (right - left < minWindowSize) {
                minWindowSize = right - left;
                ret = s.substring(left, right);
            }

            char chLeft = s.charAt(left++);
            if (sCh[chLeft]-- < tCh[chLeft]) {
                count --;
            }
        }
    }

    return minWindowSize == Integer.MAX_VALUE ? "" : ret;
}

Sliding window application principle

Sliding window characteristics:

  • Sliding window has a start and end
  • Sliding window has status that indicate the status of the window
  • Sliding window grows/shrink toward one direction

Sliding window application principles:

  • Sliding window skipped some combinations to improve efficiency, to use sliding window needs to prove these combinations can be skipped.

In terms of implementation, here is the key ideas:

  • Use nested loop to move window: outer loop to expand the window, inner loop to shrink the window
  • The outer loop stop condition is the right boundary meets array end
  • The inner loop stop condition is the status of the window meets requirement, as long as it still meets requirement, shrink the window as possible.

Questions and follow ups:

  • If we change the question to Maximum Size subarray, does the sliding window algorithm still work?

Maximum size subarray does not need to apply sliding window: find out the sum of the entire array, if it meets requirements, it is the max window, otherwise no sub window works.

Maximum Size Subarray Sum Equals K

Maximum Size subarray problem

Given an array nums and a target value k, find the maximum length of a subarray that sums to k. If there isn’t one, return 0 instead.

Note:
The sum of the entire nums array is guaranteed to fit within the 32-bit signed integer range.

Example 1:
Given nums = [1, -1, 5, -2, 3], k = 3,
return 4. (because the subarray [1, -1, 5, -2] sums to 3 and is the longest)

Example 2:
Given nums = [-2, -1, 2, 1], k = 1,
return 2. (because the subarray [-1, 2] sums to 1 and is the longest)

Algorithm

The brute force solution is straightforward, try all possible combinations:

for index i ~ (0, n):
    for index j ~ (i, n):
         if sum(i, j) == k
            minLen = min(minLen, j - i + 1);

This algorithm takes O(n ^ 2) time, which is too slow.

The second solution I thought of was a sliding window algorithm.

while (right < boundary):
    sum += nums[right++]
    while (sum == target):
        minLen = min(minLen, right - left)
        sum -= nums[left++]

Sliding window does not work for this problem because:

  • Skipped part of the window may also contribute to a maxSize subarray.
  • Move right does not make the sum bigger, move left does not make the sum smaller. because number can be positive or negative.

A better solution is based on the following idea:

  • sum(i, j) = sum(0, j) – sum(i)
  • sum(0, i), sum(0, j) can be computed incrementally

The following diagram better illustrate the idea:
minimal-size-subarray-page-1So we have the following algorithm:

for i ~ (0, n):
    sum[i] = sum[i - 1] + num[i]
    gap = sum[i] - target
    if gap in map:
        maxWindow = Math.max(maxWindow, i - map(gap))

    if sum[i] not in map:
        map.put(sum[i], i)

Implementation

The following code implemented the algorithm above:

public int maxSubArrayLen(int[] nums, int k) {
    if (nums == null || nums.length == 0) {
        return 0;
    }

    int sum = 0, maxLen = 0;
    Map<Integer, Integer> sumIndex = new HashMap<Integer, Integer>();
    for (int i = 0; i < nums.length; i++) {
        sum += nums[i];
        if (sum == k) {
            maxLen = i + 1;
        } else if (sumIndex.containsKey(sum - k)) {
            maxLen = Math.max(maxLen, i - sumIndex.get(sum - k));
        }
        if (!sumIndex.containsKey(sum)) {
            sumIndex.put(sum, i);
        } // not update index because we want max sum
    }
    return maxLen;
}
  • Runtime: O(n)

Maximum Product Subarray

Maximum Product Subarray Problem

Find the contiguous subarray within an array (containing at least one number) which has the largest product.

For example, given the array [2,3,-2,4],
the contiguous subarray [2,3] has the largest product = 6.

Reduction Transition Function

Product subarray is different from sum subarray in that:

  • A negative number * a negative number produce a big positive number
  • A negative number * a positive number produce a small negative number
  • Thus to get the localMax[i + 1], we may need to know the localMin[i]
  • So we want to record the locaMax and localMin at each step.

Thus we have the following RTF:

  • if nums[i + 1] >= 0:
    • localMax[i + 1] = Math.max(localMax[i] * nums[i + 1], nums[i]);
    • localMin[i + 1] = Math.min(localMin[i] * nums[i + 1], nums[i]);
  • if nums[i + 1] < 0:
    • localMax[i + 1] = Math.max(localMin[i] * nums[i + 1], nums[i]);
    • localMin[i + 1] = Math.min(localMax[i] * nums[i + 1], nums[i]);

The following diagram illustrates the algorithm:

Max product subarray - Page 1.png

Implementation

The following is the implementation of the algorithm:

public int maxProduct(int[] nums) {
    int localMax = nums[0];
    int localMin = nums[0];
    int max = nums[0];
    for (int i = 1; i < nums.length; i++) {
        if (nums[i] >= 0) {
            localMax = Math.max(nums[i], nums[i] * localMax);
            localMin = Math.min(nums[i], nums[i] * localMin);
        } else if (nums[i] < 0) {
            int localMaxTemp = Math.max(nums[i], nums[i] * localMin);
            localMin = Math.min(nums[i], nums[i] * localMax);
            localMax = localMaxTemp;
        }

        max = Math.max(localMax, max);
    }
    return max;
}
  • Runtime: O(n)

Maximum Subarray

Maximum Subarray Problem

Find the contiguous subarray within an array (containing at least one number) which has the largest sum.

For example, given the array [-2,1,-3,4,-1,2,1,-5,4],
the contiguous subarray [4,-1,2,1] has the largest sum = 6.

Reduction Transition Function

Consider the question in the following way:

  • Assume max subarray ending at nums[i] is nums[k, i]
  • What about max subarray ending at nums[i + 1]

There are two options:

  • Add nums[i + 1] into the max subarray ending at nums[i], which is nums[k, i]
  • Begin anew, max subarray ending at nums[i + 1] is nums[i + 1]

We don’t need to consider adding part of nums[k, i] with nums[i + 1]. Why? Because:

  • If sum(nums[k, i]) > 0, we want to add all of it
  • If sum(nums[k, i]) < 0, we don’t want to add any of the element.

Thus we have the following RTF:

localMax[i + 1] = localMax[i] > 0 ? local_max[i] + nums[i + 1] : nums[i + 1]

The following diagram illustrates whether to combine nums[i + 1] with nums[k, i] or to start anew:

Blank Diagram - Page 1-8.png

Then how does these localMax enable us to find the global max subarray? The following diagram illustrates the reason:

blank-diagram-page-1-9

LocalMax either extend or begin anew. Thus the array is segmented by all possible localMax, and the max of them is global max, which can be proved easily.

Implementation

The following is the implementation:

public int maxSubArray(int[] nums) {
    int max = Integer.MIN_VALUE, subMax = 0;
    for (int num : nums) {
        subMax = subMax > 0 ? subMax + num: num;
        max = Math.max(max, subMax);
    }
    return max;
}

Largest Rectangle in Histogram

Largest Rectangle in Histogram Problem

Given n non-negative integers representing the histogram’s bar height where the width of each bar is 1, find the area of largest rectangle in the histogram.

Below is a histogram where width of each bar is 1, given height = [6, 5,8,6,2].largest-rectangle-in-histogram-page-1-copy

The largest rectangle is painted as blue, which has are = 20 unit.

Max rectangle in histogram

Very similar to what we’ve discussed on Dynamic Programming: Maximal Rectangle, the area of a rectangle is determined by three different dimensions:

  • Height
  • Left boundary/right boundary.

As shown on the image below:
largest-rectangle-in-histogram-page-1-3

  • Bar i can form a rectangle with the height of heights[i]
  • Area of this rectangle is determined by left boundary and right boundary
  • Max rectangle can be found from one of the rectangles

Left boundary and right boundary

The key question is:

How to find left/right boundary?

By definition:

  • left_boundary[i] = j, when height[j, i] >= height[i] and height[j – 1] < height[i]
  • right_boundary[i] = k, when height[i, k] >= height[i] and height[k + 1] < height[i]

Which means, left boundary is the left most index where the bars between it and the current bar are no lower than current bar, while the one left to the left boundary is shorter than current bar.

One simple way to find left/right boundary is to search from index i until you find the first bar lower than height[i], but this solution takes O(n ^ 2), which is too expensive.

To find left/right boundary efficiently, consider the following observation:

  • If heights[i] > heights[i – 1], then left(i) = i
  • If heights[i] <= heights[i – 1], then left(i) = left(i – 1)
  • If heights[i] > heights[i + 1], then right(i) = i
  • If heights[i] <= heights[i + 1], then right(i) = right(i + 1)

Combine the latter two with the first two, we got the following algorithm of computing the leftBoundary and rightBoundary:

  • Create empty stack
  • For each bar 0 ~ n – 1:
    • If stack.isEmpty() or heights[i] >= height[stack.peek]
      • stack.push(i)
    • If !stack.isEmpty() and heights[i] < heights[stack.peek]
      • left(stack.peek) = stack.peek, right(stack.peek) = i
    • repeat until stack empty or heights[i] >= heights[stack.peek]

The following example illustrated how the above algorithm works:

largest-rectangle-in-histogram-page-1

Implementation

The following code implements the algorithm above.

public int largestRectangleArea(int[] heights) {
    if (heights == null || heights.length == 0) {
        return 0;
    }

    int maxArea = 0;
    int len = heights.length;
    Stack stack = new Stack ();
    for (int i = 0; i <= heights.length; i++) {         while (!stack.isEmpty() && (i == len || heights[stack.peek()] > heights[i])) {
            int height = heights[stack.pop()];
            int width = stack.isEmpty() ? i : i - stack.peek() - 1;
            maxArea = Math.max(maxArea, height * width);
        }
        if (i != len) {
            stack.push(i);
        }
    }
    return maxArea;
}
  • run time: O(n)

A Wrong DP Solution

I once came to a wrong DP solution for this question:

  • for bar 1 ~ n:
    • left[i] = height[i] > height[i – 1] ? i : left[i – 1]
  • for bar n – 1: 0:
    • right[i] = height[i] > height[i + 1] ? i : right[i + 1]

The following diagram illustrated why it is incorrect:Largest Rectangle in Histogram - Page 1-3 copy.png

The thing I learned was: always be careful about DP solutions : )

Dynamic Programming: Maximal Rectangle

Maximal Rectangle problem

Given a 2D binary matrix filled with 0’s and 1’s, find the largest rectangle containing only 1’s and return its area.

For example, given the following matrix:


1 0 1 0 0
1 0 1 1 1
1 1 1 1 1
1 0 0 1 0

Return 6.

Height, left boundary and right boundary

A rectangle within a matrix can be decided by three dimensions:

  • Height of the rectangle: counts the number of successive ‘1’s above (plus the current one)
  • left/right boundary: the boundaries of the rectangle which contains the current point with a height of value height

If you look at the following square, you will find that we can find three different square:
maximal-rectangle-page-1

  • rectangle 1: height 2, left boundary index 1, right boundary index 2
  • rectangle 2: height 3, left boundary index 2, right boundary index 2
  • rectangle 3: height 1, left boundary index 1, right boundary index 3

Correctness

The question is:

How does this algorithm guarantees find maximal rectangle ending at bottom row.

If we only take a look at the example, it is based on the following truth:

  • Rectangle 1 is the largest rectangle with height of 2
  • Rectangle 2 is the largest rectangle with height of 3
  • rectangle 3 is the largest rectangle with height of 1
    Then the rectangle with the largest area must be in one of them.

Reduction Transition Function

The next question is:

what will happen if we add another row?

In the following diagram, one more row is added.
maximal-rectangle-page-1-2

Three new square if formed when this new row added, the height/left boundary/right boundary have to be recalculated.

Height RTF:

  • height[i][j] = matrix[i][j] == ‘1’ ? height[i – 1][j] + 1 : 0

Left boundary RTF:

  • currleft = matrix[i][j] == ‘1’ ? currleft : j + 1
  • left[i][j] = matrix[i][j] == ‘1’ ? max(left[i – 1][j], curr_left) : 0

The following diagram illustrate how this transition works:
maximal-rectangle-page-1-3

  • curr_left = 0
  • matrix[3][1] = ‘0’ => curr_left = j + 1 = 2
  • matrix[3][2] = ‘1’ => left[1] = max(left[3], curr_left) = 2

We need to update curr_left because when previous cell is ‘0’, previous column could not be counted, hast to restart.

Let’s change matrix[3][1] to ‘1’:
maximal-rectangle-page-1-4

  • curr_left = 0
  • matrix[3][1] = ‘1’, left[2] = max(0, left[2]) = 1
  • matrix[3][2] = ‘1’, left[3] = max(1, left[3]) = 1

When previous cell is ‘1’, previous column could sill be counted, no need to restart.

Right boundary RTF:

  • currright = matrix[i][j] == ‘1’ ? currright : j
  • right[i][j] = matrix[i][j] == ‘1’ ? min(right[i – 1][j], curr_right) : n;

This is the same as we construct left boundary.

Implementation

We can use a rolling array to replace the matrix. The following is the implementation of the matrix.

public int maximalRectangle(char[][] matrix) {
    if (matrix == null || matrix.length == 0 || matrix[0].length == 0) {
        return 0;
    }

    int nrow = matrix.length;
    int ncol = matrix[0].length;

    int[] height = new int[ncol];
    int[] left = new int[ncol];
    int[] right = new int[ncol];
    Arrays.fill(right, ncol);
    int maxArea = 0;

    for (int i = 0; i < nrow; i++) {
        int currLeft = 0, currRight = ncol;
        // calculate height
        for (int j = 0; j < ncol; j++) {
            height[j] = matrix[i][j] == '1' ? height[j] + 1 : 0;
        }

        // calculate left boundary: left to right
        for (int j = 0; j = 0; j--) {
            if (matrix[i][j] == '1') {
                right[j] = Math.min(right[j], currRight);
            } else {
                right[j] = ncol;
                currRight = j;
            }
        }

        // calculate max area: right bounday does not include valid column
        for (int j = 0; j < ncol; j++) {
            maxArea = Math.max((right[j] - left[j]) * height[j], maxArea);
        }
    }

    return maxArea;
}