MIT PoM 1: Introduction to Microeconomics

  • Economics is all about scarcity and constrained optimization: given certain constraints, how do individuals and firms trade off different alternatives to make the optimal choice? In fact, all engineering is constrained optimization as well.
  • This course focuses on two types of actors, consumers and producers, and we will build models to explain their behavior. The models need to be tractable and still explain the real world.
  • Consumers optimize utility; firms, on the other hand, optimize profits.
  • There are three fundamental questions in microeconomics: What goods and services should be produced? How should these goods and services be produced? Who should get these goods and services? Price determines what gets produced, how it is produced, and who gets the goods that are produced.
  • The first distinction: theoretical vs. empirical economics. Theoretical economics builds models to explain the world, while empirical economics tests these models to see how well they explain the world.
  • Another distinction: positive vs. normative economics. The way things are: positive economics; the way things should be: normative economics.
  • Supply + demand: water is essential but has a large supply; a diamond is not essential to life but has a much smaller supply, and thus a much higher price.
  • Consumer theory: preferences, constraints.

Java Stream

What is a stream?

A stream is an abstraction over data operations. It takes its input from collections, arrays, or I/O channels.

From imperative to declarative

For example, given a list of people, find the first 10 people whose age is less than or equal to 18.

The following solution is the imperative approach:

public void imperativeApproach() throws IOException {
        List<Person> people = MockData.getPeople();

        // Collect at most 10 people aged 18 or younger.
        List<Person> peopleUnder18 = new ArrayList<>();
        for (Person person : people) {
            if (person.getAge() <= 18) {
                peopleUnder18.add(person);
                if (peopleUnder18.size() == 10) {
                    break;
                }
            }
        }

        for (Person person : peopleUnder18) {
            System.out.println(person);
        }
}

The following is the declarative approach:

public void declarativeApproach() throws IOException {
        List<Person> people = MockData.getPeople();

        people.stream()
                // a lambda expression used as the filter predicate
                .filter(person -> person.getAge() <= 18)
                .limit(10)
                .collect(Collectors.toList())
                .forEach(System.out::println);
}

Abstraction

We mentioned that a stream is an abstraction over data manipulation. The abstraction works in the following way:

  • Concrete: the source can be a Set, List, Map, etc.
  • Stream: operations on the stream, such as filter, map, etc.
  • Concrete: collect the data to make it concrete again.

Intermediate and Terminal Operations

Java streams have two kinds of operations:

  • Intermediate operations: map, filter, sorted
  • Terminal operations: collect, forEach, reduce

Each intermediate operation is lazily evaluated and returns a new stream; nothing is executed until a terminal operation is invoked.
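
To make this laziness concrete, here is a small, self-contained sketch (using plain integers rather than the Person mock data above): the println inside the filter only fires once the terminal collect is invoked.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyStreamDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

        // Building the pipeline prints nothing: intermediate operations are lazy.
        Stream<Integer> pipeline = numbers.stream()
                .filter(n -> {
                    System.out.println("filtering " + n);
                    return n % 2 == 0;
                });

        System.out.println("pipeline built, nothing filtered yet");

        // Only when the terminal operation runs do the "filtering ..." lines appear.
        List<Integer> evens = pipeline.collect(Collectors.toList());
        System.out.println(evens); // [2, 4]
    }
}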

Range

With IntStream.range(), you can create a stream over a fixed range of integers, for example:

    public void rangeIteratingLists() throws Exception {
        List<Person> people = MockData.getPeople();

        // Use an IntStream to loop over the first 10 indices and print each person.
        IntStream.range(0, 10).forEach(i -> System.out.println(people.get(i)));

        // Alternatively, take just the first 10 elements of the stream.
        people.stream().limit(10).forEach(System.out::println);
    }

You can also apply a function iteratively a given number of times:

    public void intStreamIterate() throws Exception {
        // IntStream.iterate keeps generating values by repeatedly applying
        // the given function to the previous value (similar to generateSequence in Kotlin).
        IntStream.iterate(0, operand -> operand + 1).limit(10).forEach(System.out::println);
    }

Max, Min and Comparators

Java streams provide built-in min/max operations that accept customized comparators. For example:

    public void min() throws Exception {
        final List<Integer> numbers = ImmutableList.of(1, 2, 3, 100, 23, 93, 99);

        int min = numbers.stream().min(Comparator.naturalOrder()).get();

        System.out.println(min);
    }

Distinct

Sometimes we want only the distinct elements of a stream; for that we can use the stream's distinct API:

  public void distinct() throws Exception {
    final List<Integer> numbers = ImmutableList.of(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 9, 9, 9);

    List<Integer> distinctNumbers = numbers.stream()
        .distinct()
        .collect(Collectors.toList());

    System.out.println(distinctNumbers); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
  }

Filtering and Transformation

The stream filter API lets you keep only the elements that match a predicate, for example:

    public void understandingFilter() throws Exception {
        ImmutableList<Car> cars = MockData.getCars();

        // A Predicate is a function that returns true or false for a given element.
        final Predicate<Car> carPredicate = car -> car.getPrice() < 20000;

        List<Car> carsFiltered = cars.stream()
            .filter(carPredicate)
            .collect(Collectors.toList());
    }

The map API lets you transform each element into another form; for example, we can define another object type and map the given stream into a stream of that type:

    public void ourFirstMapping() throws Exception {
        // Transform from one data type to another.
        List<Person> people = MockData.getPeople();

        List<PersonDTO> dtos = people.stream()
            .map(p -> new PersonDTO(p.getId(), p.getFirstName(), p.getAge()))
            .collect(Collectors.toList());
    }

Group Data

One common operation in SQL queries is grouping, for example:

SELECT COUNT(*), TYPE FROM JOB WHERE USER_ID = 123 GROUP BY TYPE

Java streams provide similar functionality:

  public void groupingAndCounting() throws Exception {
    ArrayList<String> names = Lists
        .newArrayList(
            "John",
            "John",
            "Mariam",
            "Alex",
            "Mohammado",
            "Mohammado",
            "Vincent",
            "Alex",
            "Alex"
        );

    Map<String, Long> counting = names.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    counting.forEach((name, count) -> System.out.println(name + " > " + count));
  }

Reduce and Flatmap

This is very similar to a Hadoop Map/Reduce job, where map takes care of transforming the data, while reduce collects the data and does the final computation. For example:

  public void reduce() throws Exception {
    Integer[] integers = {1, 2, 3, 4, 99, 100, 121, 1302, 199};

    // Compute the sum of the elements, with 0 as the identity (initial) value.
    int sum = Arrays.stream(integers).reduce(0, (a, b) -> a + b);
    System.out.println(sum);

    // Use a method reference instead of the lambda.
    int sum2 = Arrays.stream(integers).reduce(0, Integer::sum);
    System.out.println(sum2);

  }

flatMap differs from map in that it first flattens the nested structure.

For example:

List<List<String>> list = Arrays.asList(
  Arrays.asList("a"),
  Arrays.asList("b"));
System.out.println(list);

System.out.println(list
  .stream()
  .flatMap(Collection::stream)
  .collect(Collectors.toList()));

The result of the stream is a flat list of strings: [a, b].

The Consumption Function

When we decide how much to spend on consumption, which factors do we consider? In other words, what determines our consumption, and how does economics quantify this decision process? Keynes's consumption function holds that consumption depends only on current income: C = C0 + aY, where Y is income, C0 is autonomous consumption, and a is the marginal propensity to consume, i.e., the fraction of each additional unit of income that is spent on consumption. Keynes's consumption function has several important implications:

  • Consumption increases as income increases, but by less than the increase in income; that is, the marginal propensity to consume is greater than 0 but less than 1.
  • The marginal propensity to consume itself also declines as income grows.

This consumption function was confirmed by studies of how the propensity to consume varies with income in the short run. According to Keynes's theory, as income rises, the fraction of additional income that is consumed falls, so consumption as a share of total income shrinks. This leads to a predicted dilemma: an ever smaller share of income would eventually be spent on consumption. The problem is that this predicted dilemma never materialized, and studies of long-run consumption behavior show that the long-run propensity to consume does not change with income. In other words, Keynes's formula does not hold in the long run.
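
As a quick illustration of this point (with made-up numbers, not from the original notes): under C = C0 + aY, the average propensity to consume C/Y = C0/Y + a falls as Y grows, which is exactly the short-run pattern described above.

public class KeynesianApc {
    public static void main(String[] args) {
        // Hypothetical parameters: autonomous consumption C0 and marginal propensity to consume a.
        double c0 = 100;
        double a = 0.6;

        for (double y : new double[]{500, 1_000, 2_000, 4_000}) {
            double c = c0 + a * y; // the Keynesian consumption function
            System.out.printf("Y=%.0f  C=%.0f  APC=C/Y=%.3f%n", y, c, c / y);
        }
        // APC: 0.800, 0.700, 0.650, 0.625 -> falls toward a as income grows.
    }
}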

Many economists studied this problem. Fisher's work showed that people do not consider only current income when deciding how much to consume: they also consider future income, and they use saving and borrowing to smooth consumption over the long run. For example, someone who expects stable or rising future income may borrow to consume earlier, while someone who expects income to drop after retirement may save now to support consumption later. In other words, people rationally adjust the balance between consumption and saving to make long-run consumption as smooth as possible.

Fisher's model introduces the effect of the interest rate. When making intertemporal choices, because of interest, a unit of future income is worth less than a unit of current income. To compare combinations of current and future consumption, Fisher's model also introduces indifference curves: every consumption bundle on the same curve gives the consumer the same level of satisfaction. An increase in income moves the consumer to a higher indifference curve, raising both current and future consumption. At the same time, the consumer faces an intertemporal budget constraint: current consumption is current income minus saving for the future, while future consumption consists of future income plus that saving. Once interest is taken into account, the value of current and future consumption equals the value of current and future income.

Building on Fisher's theory, Franco Modigliani proposed the life-cycle hypothesis. In Fisher's model, consumption depends on a person's lifetime income; Modigliani further emphasized that income varies systematically over a person's life, and that people use saving to move income from high-income periods to low-income periods. Modigliani's model adds personal wealth W to the Keynesian model. A consumer's total resources consist of initial wealth plus lifetime income; spreading them evenly over the remaining years gives the consumption function C = aW + bY, where a is the marginal propensity to consume out of wealth and b is the marginal propensity to consume out of income. The average propensity to consume then becomes C/Y = a(W/Y) + b. So when we look across individuals or at short-run data, wealth is roughly fixed, and higher income brings a lower average propensity to consume. In the long run, however, wealth grows, the consumption function shifts upward, and this prevents the average propensity to consume from falling as income rises.

Friedman proposed another theory to explain the long-run consumption function. He assumed that current income can be divided into two parts: permanent income and transitory income. Permanent income is the average income over a lifetime; transitory income is the random deviation around that average. For example, more education brings a higher average income, while luck and similar factors produce different transitory income. Friedman's conclusion is that the consumption function can be approximated as C = aYp, where a is a constant measuring the fraction of permanent income that is consumed. The permanent-income hypothesis thus says that the Keynesian consumption function uses the wrong variable, and that the average propensity to consume depends on the ratio of permanent income to current income. When current income temporarily rises above permanent income, the average propensity to consume temporarily falls; when current income falls below permanent income, it rises.
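
A tiny numeric sketch of that last point (again with invented numbers): consumption tracks permanent income, so a temporarily good year lowers the measured C/Y while a temporarily bad year raises it.

public class PermanentIncomeApc {
    public static void main(String[] args) {
        // Hypothetical values: permanent income Yp and the fraction a of it that is consumed.
        double yp = 50_000;
        double a = 0.9;
        double c = a * yp; // consumption depends on permanent income, not current income

        for (double y : new double[]{40_000, 50_000, 60_000}) {
            System.out.printf("current Y=%.0f  C=%.0f  APC=C/Y=%.3f%n", y, c, c / y);
        }
        // APC: 1.125, 0.900, 0.750 -> high in a bad year, low in a good year.
    }
}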

The evolution of consumption theory in economics reminds me of how laws in physics are continually revised: an initial model or hypothesis is found not to apply in a new domain, so a new model is proposed, and there are even attempts to explain several different regimes with a single model, such as the macroscopic and the microscopic, or the classical and the quantum. Economic modeling proceeds in a similar way: it is a process of continually identifying the factors that really drive the results and revising the model accordingly. Studying economics has also made me appreciate the importance of mathematics: to discuss and study these questions quantitatively, mathematical models are indispensable.

Lambda Expression in Java/Kotlin

Higher-order function

In computer science, a higher-order function is a function that does at least one of the following:

  • Takes one or more functions as arguments
  • Returns a function as its result.

All other functions are first-order functions.
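
A small Java sketch of this (my own example, not from the original notes): using java.util.function.Function, a method can both take a function as an argument and return a new function as its result.

import java.util.function.Function;

public class HigherOrderDemo {
    // Higher-order: takes a function and returns a new function that applies it twice.
    static Function<Integer, Integer> twice(Function<Integer, Integer> f) {
        return x -> f.apply(f.apply(x));
    }

    public static void main(String[] args) {
        Function<Integer, Integer> addOne = x -> x + 1;
        Function<Integer, Integer> addTwo = twice(addOne);

        System.out.println(addTwo.apply(5)); // 7
    }
}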

Anonymous class

In Java, an anonymous class lets you declare and instantiate a class at the same time. If you only need a local class once, an anonymous class is a good fit. For example, if you want to define a Runnable to execute a task:

Executors.newSingleThreadExecutor().execute(new Runnable() {
    @Override
    public void run() {
        // Your task execution.
    }
});

As you can see in the example above, Runnable is an interface with a single method, run, and the anonymous class implements that interface.

Lambda Expression

Besides anonymous classes, Java also supports anonymous functions, called lambda expressions.

While an anonymous class lets you declare a new class within a statement, it is not very concise when the class contains only one method.

For the example in the section above, we can simplify the implementation with a lambda expression:

Executors.newSingleThreadExecutor().execute(() -> { /* Your task execution */ });

The lambda expression provides a few capabilities:

  • It lets you treat functionality as a method argument, or code as data.
  • It is a function that can be created without belonging to any class.
  • It can be passed around as if it were an object and executed on demand.

Meanwhile, functions are first-class in Kotlin: they can be stored in variables and data structures, passed as arguments, and returned from other higher-order functions.

A Kotlin lambda expression follows this syntax:

  • It is always surrounded by curly braces.
  • Parameter declarations in the full syntactic form go inside the curly braces and have optional type annotations.
  • The body goes after an -> sign.
  • If the inferred return type of the lambda is not Unit, the last expression inside the body is treated as the return value.

As you can tell, a lambda expression cannot specify its return type explicitly. If you want to declare the return type, you can use an alternative: the anonymous function.

fun(x: Int, y: Int): Int = x + y

A major difference between Kotlin and Java is that Kotlin has first-class, dedicated function types, for example:

val initFunction: (Int) -> Int

The declaration above means that initFunction has a function type: it is a function that takes an integer and returns an integer.

A function of this type can be written as a lambda:

val a = { i: Int -> i + 1 }

SRE: Data Integrity

 

Data integrity usually refers to the accuracy and consistency of data throughout its lifetime. For customer-facing online services, things get even more complex: any data corruption, data loss, or extended unavailability is considered a data integrity issue from the customer's point of view.

Data integrity problems can be severe in practice. In one instance, a database table was corrupted and we had to spend a few hours restoring the data from a snapshot database. In another, data was accidentally deleted, which had a fatal impact on our client, because the client never expected the data to become unavailable; restoring the data was too expensive, so we had to fix the dependent data records and some code on the client side to mitigate the impact. In yet another case, the data loaded for the client was not what they expected, which is clearly a data consistency issue, but it was not reproducible and therefore extremely hard for the team to debug.

There are many types of failure that can lead to data integrity issues; they can be classified along three dimensions:

  • Root cause

User action, operator error, application bug, infrastructure defect, hardware failure, site disaster

  • Scope

Wide; narrow and directed

  • Rate

Big Bang, Slow and Steady

Combining these factors yields 24 types of data integrity issues. How do we handle them?

The first layer of defense is to adopt soft deletion for client data. The idea behind soft deletion is to make sure the data is recoverable if needed, for example after an operator error. A soft delete is usually implemented by adding an is_deleted flag and a deleted_at timestamp to the table. When data is deleted, it is not removed from the database immediately; instead it is marked as deleted, with the actual purge scheduled for some time in the future, say 60 days later. This way, the deletion can be reverted if necessary.
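
As a rough sketch of the mechanics (a hypothetical in-memory DataRecord class, not tied to any particular schema or ORM): deletion only sets the flag and the purge time, reads filter on the flag, and recovery clears it.

import java.time.Duration;
import java.time.Instant;

public class SoftDeleteSketch {
    static class DataRecord {
        final long id;
        String payload;
        boolean deleted;    // the is_deleted flag
        Instant purgeAt;    // when the row may actually be removed

        DataRecord(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // "Delete": only mark the record and schedule the real purge 60 days out.
    static void softDelete(DataRecord r) {
        r.deleted = true;
        r.purgeAt = Instant.now().plus(Duration.ofDays(60));
    }

    // Reads must filter out soft-deleted records.
    static boolean isVisible(DataRecord r) {
        return !r.deleted;
    }

    // Recovery before the purge deadline is just clearing the flag.
    static void undelete(DataRecord r) {
        r.deleted = false;
        r.purgeAt = null;
    }

    public static void main(String[] args) {
        DataRecord r = new DataRecord(1, "customer data");
        softDelete(r);
        System.out.println("visible after delete? " + isVisible(r));   // false
        undelete(r);
        System.out.println("visible after undelete? " + isVisible(r)); // true
    }
}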

There are different opinions about soft deletion, as it introduces extra complexity in data management. For example, when there are hierarchies and dependency relationships between records, a soft deletion might break data constraints. It also makes data selection and updates more complex, since a custom filter has to be applied to exclude soft-deleted records from queries. Recovering soft-deleted data can be complex as well, especially when only part of the data was deleted, in which case recovery may involve a complicated data merge.

The second layer is to build a data backup system and make the recovery process fast. We need to be careful here: backups and archives are not the goal of data integrity in themselves. Finding ways to prevent data loss, to detect data corruption, and to recover quickly from data integrity incidents is more important. Data backup is often neglected because it yields no visible benefit and is not a high priority for anyone; framing the work as building a restore system is a much more useful goal.

Many cloud services offer backups as an option; for example, AWS RDS supports creating database snapshots, and a managed Redis cache cluster can back up its data to EBS storage. Many people stop here, assuming the data is now safely backed up. However, data recovery can take a long time to finish, and data integrity is effectively broken during that recovery window. Recovery time should therefore be an important metric for the system.

Besides backups, many systems use replicas, and by failing over to a replica when the primary node has an issue, they improve availability. We need to keep in mind, though, that the data might not be fully consistent between the primary instance and the replica.

A third layer is to detect errors earlier, for example with a data validation job that checks the integrity of the data across different storage systems, so that an issue can be fixed quickly when it happens.
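
A minimal sketch of such a validation job (hypothetical in-memory maps standing in for, say, a primary database and a backup store): it compares keys and values and reports mismatches.

import java.util.HashMap;
import java.util.Map;

public class ValidationJobSketch {
    public static void main(String[] args) {
        // Hypothetical contents of a primary store and a backup store, keyed by record id.
        Map<String, String> primary = new HashMap<>();
        Map<String, String> backup = new HashMap<>();
        primary.put("order-1", "amount=10");
        primary.put("order-2", "amount=25");
        backup.put("order-1", "amount=10");
        backup.put("order-2", "amount=20"); // corrupted or stale copy

        // Report records missing from the backup or differing between the two stores.
        for (Map.Entry<String, String> entry : primary.entrySet()) {
            String backupValue = backup.get(entry.getKey());
            if (backupValue == null) {
                System.out.println("missing in backup: " + entry.getKey());
            } else if (!backupValue.equals(entry.getValue())) {
                System.out.println("mismatch for " + entry.getKey()
                        + ": primary=" + entry.getValue() + ", backup=" + backupValue);
            }
        }
        // Records present only in the backup would be caught by the symmetric check.
    }
}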

SRE: Service Level Objectives

As we move from a monolithic service to micro-services, I find it useful to think about the following problems:

  • How do we correctly measure the service?

This question can be broken down into the following sub-questions:

  • If you maintain a service, how do you tell whether it is functioning correctly?
  • If your service is a client-facing product, how do you know whether it provides a good experience for the client?
  • How do you know whether your service has hit a performance bottleneck?
  • If you are about to scale the service, what metrics should you use to find the performance bottleneck?

As these sub-questions show, it is all about defining the right metrics for the right audience and scenario. A sanity check of the service should be quick and straightforward; for a product experience check, you should put metrics where the product impact can be measured; for performance monitoring and optimization, measuring resource utilization and the behavior of dependent services is essential.

I have found that we often did not measure our service correctly, and it takes time, experience, and domain knowledge to define such metrics well.

  • How do we manage the expectations of the clients that use our service?

One common problem with a monolithic service is that integration is often based on direct database or data-level access, where clients treat it as a local DB whose data is always available. In this setting, a failure in one domain becomes contagious: the clients never assume the domain can fail, so when a failure does happen, there is no exception handling logic in place to deal with it.

To make the system more resilient, a service level agreement, or at least an explicit expectation, is truly needed. It is about setting expectations for your service's clients: we are not a service that is always fast and reliable; we may slow down or become unavailable in some cases, and you should be prepared for that.
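
As a small illustration of what "being prepared" can mean on the client side (a generic sketch, not tied to any framework; fetchProfileRemotely and cachedProfile are invented placeholders): bound the remote call with a timeout and fall back to a cached or degraded value when it fails.

import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ResilientClientSketch {
    // Stand-in for a remote call to another service; it may be slow or throw.
    static String fetchProfileRemotely(String userId) {
        return "profile-of-" + userId;
    }

    // Hypothetical local cache holding the last known good value.
    static Optional<String> cachedProfile(String userId) {
        return Optional.of("stale-profile-of-" + userId);
    }

    static String loadProfile(String userId) {
        try {
            // Bound the call with a timeout instead of assuming the dependency is always up.
            return CompletableFuture.supplyAsync(() -> fetchProfileRemotely(userId))
                    .get(200, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            // Degrade gracefully: serve cached data, or an explicit placeholder.
            return cachedProfile(userId).orElse("profile-unavailable");
        }
    }

    public static void main(String[] args) {
        System.out.println(loadProfile("123"));
    }
}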

So I find it useful to think about these problems in terms of SLOs and the related concepts:

  • SLIs: Service Level Indicators

Service level indicators are the metrics you define to measure the behavior and performance of your service. They can be product-facing or purely engineering-facing.

  • SLOs: Service Level Objectives

Service level objectives are targets set on top of the SLIs. They give the team direction on where to optimize the system when necessary.

  • SLAs: Service Level Agreements

Service level agreements are more like the contract you define with the client: how fast your service loads data on average, in which cases your service might fail, and how the client should handle such failures.

Beyond defining the SLIs and SLAs, this framework also provides a way to validate that the SLAs are adopted. For example, if your clients are supposed to handle data access failures from your service, you can validate that by scheduling an outage of your service; doing so pushes your clients to actually build to the SLAs.

How the Money Supply Is Regulated

How does the central bank regulate the money supply in the market?

The Central Bank's Policy Targets

Before looking at how the central bank regulates the money supply, we first need to understand why it regulates it at all. Is such regulation necessary? Why can't the market be left to decide entirely on its own? The central bank is the only legal channel of money supply. We know that if too much money is supplied to the market it causes inflation, for example when the central bank buys treasury bonds beyond what the budget warrants, while an insufficient money supply pushes up interest rates in the money market, suppressing investment and consumption demand and hindering economic growth. But which measurements does the central bank use as the indicators and targets of its policy? Different central banks make different choices, but in general the main targets are money growth and interest rates.

Open-Market Operations in Bonds

One major way the central bank adjusts the amount of money in circulation is by buying bonds back on the open market. On a household's balance sheet, a bond buy-back shows up as assets shifting from bonds to bank deposits, so the amount of money in the market increases. Before looking at buy-back operations, we first need to understand bonds as a financial product. A bond is a financial instrument that promises to pay a fixed amount in each period over a span of time. Unlike a bank deposit, a bond can be bought and transferred, and its price depends on the following factors:

  • Face value: the face value (par value) is the amount repaid when the bond is settled at maturity.
  • Term to maturity: the date on which the bond matures and is settled, for example one year, five years, or thirty years.
  • Coupon and payment frequency: bonds are classified by coupon into zero-coupon bonds, coupon bonds, and so on, and by payment frequency into semi-annual, annual, or a single payment at maturity.
  • The creditworthiness of the issuer.
  • The return on comparable alternative investments: mainly the interest rate on bank deposits.

To simplify the comparison, assume for now that the bond's price is affected only by the return on its comparable alternative investment, namely a fixed-term bank deposit; in other words, the bond's price equals the sum of the present values of all the payments it will generate in the future. The present value of future money is determined by the current interest rate. For example, if the bank interest rate is 10%, then 100 dollars one year from now has a present value of 100/(1 + 10%) = 90.9 dollars, because depositing 90.9 dollars in the bank today yields 100 dollars a year later.

Below is the information for a US Treasury bond on March 22, 2019:

  • Term to maturity: 30 years
  • Coupon: 3%, i.e., 3% of face value paid each year
  • Price: 102.52 dollars
  • Face value: 100 dollars

Over the 30 years the bond pays a total of 90 dollars in interest and repays the 100-dollar principal at the end. Assuming the annual interest rate is 3% throughout, what is the actual total value of these payments over the thirty years?
Y = 3 + 3/(1+3%) + 3/(1+3%)^2 + … + 3/(1+3%)^29 + 100/(1+3%)^29
The Python code below computes the total over the 30 years as roughly 102.9; in other words, under these assumptions the bond's present value is about 102.9.

# Sum the present value of each payment, following the formula above.
par_value = 100
coupon = 3
interest_rate = 0.03

present_value = 0
for year in range(30):
    present_value += coupon / ((1 + interest_rate) ** year)
    print("accumulated present value after year {}: {:.2f}".format(year, present_value))
present_value += par_value / ((1 + interest_rate) ** 29)
print("total present value: {:.1f}".format(present_value))

As mentioned above, Treasury bonds can be traded on the secondary market, and the central bank can control the quantity of money through buy-backs on this market.

The Discount Window

The central bank can also control the quantity of money by lending to commercial banks. Before looking at the discount window, let's first go over some concepts related to the federal funds market:

  • The federal funds market

The law requires commercial banks to retain a portion of their deposits to reduce risk; the ratio of these retained reserves to total deposits is called the reserve ratio. A bank's actual reserves may fall below or exceed the legal requirement: if it has excess reserves it will generally lend them out, and if it falls short of the required ratio it can borrow from other banks. The funds used to satisfy the reserve requirement are called federal funds, and the market where they are lent and borrowed is called the federal funds market, also known as the interbank lending market. The spread between the rate at which a bank lends these funds and the rate at which it borrows them is its profit.

  • The federal funds rate

The central bank usually maintains a target for the interbank lending rate, that is, the federal funds rate; what we commonly call a Fed rate hike is an increase in this target. The central bank typically steers the federal funds rate by controlling its purchases of the securities that banks hold: when it buys securities back from banks, bank reserves increase, demand for federal funds falls, and the federal funds rate drops; selling securities works in the opposite direction. A higher federal funds rate raises banks' funding costs, which discourages them from extending loans to the market and thus also reduces the quantity of money in circulation.

  • The discount window and the discount rate

Besides interbank borrowing, commercial banks can also borrow directly from the central bank. This mechanism is called the discount window, and the interest rate a commercial bank pays on loans from the central bank is called the discount rate. When choosing which source of funds to use to meet the reserve requirement, a bank compares the alternatives: if the discount window rate is lower than the rate at which the bank can lend, borrowing from the discount window is profitable, although transaction costs and the risk of bad loans also have to be taken into account.

It is also worth mentioning that the central bank can combine several instruments to control the quantity of money in the market. For example, after lending 500 million dollars to commercial banks through the discount window, the central bank may sell 500 million dollars of bonds on the bond market to offset the effect on the total money supply.

Another important instrument is buying and selling foreign exchange. When the central bank buys foreign currency it pays with domestic currency, which increases the domestic money stock; when it sells foreign currency or foreign securities it takes domestic currency back, which reduces the money stock. These operations work much like open-market operations.