Building Microservices Notes: Integration

What is Microservices Integration?

How can one microservice talk to another? In general, the following technologies can be used: shared database, RPC, and REST. But how do we pick the right one?

The following guidelines help answer that question:

  • Avoid breaking changes.

When we make a change on the server side, the client side should not be impacted unless absolutely necessary.

  • Keep your APIs technology-agnostic

Avoid integration technology that dictates which technology stacks we can use to implement the microservices.

  • Make your service simple for consumers

Make it easy for consumers to use the service.

  • Hide internal implementation details

Don't let consumers become bound to the internal implementation details.

With these guidelines in mind, let's take a look at the available options.

Integration Technologies

Shared Database

The most common form of integration is database (DB) integration: if one service wants to retrieve or update information owned by another service, it reaches directly into that service's database.

This integration pattern definitely breaks many of the guidelines we set out:

  • It allows the external parties to view and bind to internal implementation details.
  • The consumers are tied to a specific technology choice. For instance, if we decide to use Postgres, every consumer has to find a way to integrate with Postgres.
  • It becomes very difficult to change the logic, because the logic for parsing the data is spread across all consumers.

In sum, a shared database is never a good choice for microservice integration.

Remote Procedure Call

Remote Procedure Call (RPC) refers to the technique of making a local call and having it execute on a remote service somewhere.

Commonly used RPC techniques include SOAP, Thrift, and Protocol Buffers. They support a separate interface definition, with client and server stubs generated from that definition.

Some of these techniques are bound to a specific networking protocol, while others let you use different types of protocols. For example, gRPC uses Protocol Buffers over HTTP/2.

Some RPC mechanisms are heavily tied to a specific platform, while others, for example Thrift and Protocol Buffers, support multiple languages.

So, what's the major downside of RPC? The problem is that local calls are not like remote calls, and consumers often forget one thing: the network is not reliable. The client needs to keep in mind that the remote server could be down or slow, or the request could be lost, and handle these situations properly.
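
As a minimal sketch, assuming a server at http://localhost:8000/ exposing a hypothetical add() method, Python's built-in xmlrpc module makes the remote call look local; the except clause is where the network failures have to be handled:

import xmlrpc.client

# Hypothetical RPC server; the URL and the add() method are assumptions.
proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")

try:
    print(proxy.add(2, 3))  # looks like a local call, but executes remotely
except (OSError, xmlrpc.client.Fault) as err:
    # The network is not reliable: the server may be down, slow,
    # or unreachable; handle these cases explicitly.
    print("remote call failed:", err)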

REST

Representational State Transfer (REST) is another technique. The most important concept in REST is the resource. A REST service creates different representations of a resource on request.

REST and HTTP

REST as a design pattern does not dictate an underlying protocol, although it is most commonly used over HTTP. The major reason is that HTTP itself defines capabilities that work very well with the REST style, for example the HTTP verbs (GET, POST, and PUT) and HTTP error codes.
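
As an illustrative sketch using only the standard library, assuming a hypothetical customer service at http://localhost:8000/customers, the verbs map naturally onto resource operations:

import json
import urllib.request

# Hypothetical customer service; the URL and payload are assumptions.
base = "http://localhost:8000/customers"

# GET a representation of customer 42
with urllib.request.urlopen(base + "/42") as resp:
    customer = json.loads(resp.read())

# PUT an updated representation of the same resource
body = json.dumps({"name": "Alice"}).encode()
req = urllib.request.Request(base + "/42", data=body, method="PUT",
                             headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)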

There are, of course, downsides to REST over HTTP. For service-to-service communication where extremely low latency or small message size is important, HTTP may not be a good choice, as the message headers can be large. Raw TCP or UDP might be a better choice in these cases.

Data Interchange

REST supports many data interchange formats; the most commonly used are JSON and XML. Application data is converted into JSON or XML and then transmitted to the other service.

JSON vs. XML

Both JSON and XML are human-readable and machine-readable.

  • XML supports storing metadata with the tag.

For example, the type of a field can be stored in an attribute, while JSON has to encode such metadata as part of an object and associate it with the data.

  • XML supports links while JSON does not.

XLink can be used to create hyperlinks in XML documents; with XLink, the linked content can be defined outside the linked files. JSON is purely data based and supports nothing similar: in JSON, shared objects have to be rendered concretely in multiple places even though they have the same content.

  • JSON is significantly less verbose than XML.
  • JSON supports arrays.

This makes it easy to map JSON onto the native data structures of programming languages, such as arrays, lists, and hash maps. For example, a JSON object is directly interchangeable with a Python dict. JSON can be easily parsed and manipulated in almost all programming languages.
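
For example, a quick round trip with the standard json module:

import json

data = {"service": "orders", "ids": [1, 2, 3]}  # a Python dict holding a list
payload = json.dumps(data)      # serialize to a JSON string for transmission
restored = json.loads(payload)  # parse back into native Python structures
assert restored == data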

Python Notes: Global Interpreter Lock

Why Python uses the GIL

Python uses reference counting for memory management. Each object in Python has a reference count variable that keeps track of the number of references to the object. When the reference count drops to 0, Python releases the object from memory.
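
You can observe reference counts with sys.getrefcount (note that the call itself temporarily adds one reference):

import sys

a = []
b = a  # a second reference to the same list object
print(sys.getrefcount(a))  # typically 3: a, b, and the temporary argument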

However, in a multithreading scenario, multiple threads might access the same object, and the reference count could be changed incorrectly in a race condition. Objects that should be released could then stay in memory, and in the worst case, objects that should not be released could be incorrectly released.

To solve this problem, Python introduced the GIL, a global lock at the interpreter level. The rule is that any Python code has to acquire this lock to execute. You might ask: why not add one lock per object instead? Fine-grained locking like that could result in deadlocks.

In this way, Python guarantees that only one thread at a time can change an object's reference count.

Problems with the GIL

The GIL, however, creates a problem: Python code cannot utilize multiple CPUs. If your code is CPU intensive, Python multithreading will not help you at all. However, if your program is not CPU intensive but I/O intensive, for example a network application, Python threads are still a good choice.
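
A quick way to see this is to time a CPU-bound function run twice sequentially versus in two threads; on CPython the threaded version is no faster, and often slower, because of the GIL:

import threading
import time

def count_down(n):
    while n > 0:
        n -= 1

N = 10_000_000

start = time.time()
count_down(N)
count_down(N)
print("sequential:", time.time() - start)

start = time.time()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("two threads:", time.time() - start)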

Solutions to this problem?

Are there solutions to the problem? Yes, there are. The Python community has tried many times to solve it. The GIL is not really tied to the Python language itself; it is tied to the CPython interpreter. So as long as we change the underlying interpreter, Python can support real multithreading. For example, Jython, a Python interpreter implemented in Java, has no GIL.

However, another important reason why the GIL has not been removed is that Python has many extension libraries written in C. Those libraries work well with Python precisely because they don't need to worry about multithreading models; the GIL model is really easy to integrate with. Porting those libraries to other interpreters is hard work.

Another solution is to use multiple processes instead of multiple threads. Python has well-designed libraries that support multiprocessing. However, process management carries more overhead than thread management for the operating system, so a multiprocess program can perform worse than its multithreaded equivalent.
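
A sketch of the same CPU-bound work spread across two worker processes with the standard multiprocessing module; each process has its own interpreter and GIL, so the work can use two CPUs:

import multiprocessing

def count_down(n):
    while n > 0:
        n -= 1

if __name__ == "__main__":
    N = 10_000_000
    # two worker processes, each with its own interpreter and GIL
    with multiprocessing.Pool(processes=2) as pool:
        pool.map(count_down, [N, N])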

Python Notes: Decorator

By definition, a decorator is a function that takes another function and extends the behavior of the latter function without explicitly modifying it. Built-in decorators such as @staticmethod, @classmethod, and @property work in the same way.

How do decorators work? Let's take a look at the following example:

def my_decorator(some_func):
    def wrapper():
        some_func()
        print("Some function is being called.")

    return wrapper

def another_func():
    print("Another function is being called.")

foo = my_decorator(another_func)

foo()

You would see the following printed on the screen:

"Another function is being called."
"Some function is being called."

As you can see, we pass some_func into a closure, do something before or after calling it without modifying its original behavior, and return the wrapper function. As we already learned, Python functions are first-class objects just like any other Python object, so the returned function can be called like any other function.

@decorator

The above example is already essentially a decorator. The difference is that a decorator is usually applied with the @ symbol. This is an example of Python syntactic sugar, i.e., syntax that makes things easier to read or express. For example, the following applies my_decorator using @, which is equivalent to writing another_func = my_decorator(another_func):

@my_decorator
def another_func():
    print("Another function is being called.")

Decorator that takes arbitrary arguments

In Python, we can use *args and **kwargs to represent arbitrary arguments. The following example shows how to accept arbitrary arguments in a decorator:

def proxy(func):
    def wrapper(*args, **kwargs):
        # forward any positional and keyword arguments unchanged
        return func(*args, **kwargs)
    return wrapper

Decorator with parameters

Sometimes we want the decorator itself to take parameters. In that case we add one more level of nesting and implement it this way:

def decorator(argument):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            do_something_with_argument(argument)
            result = function(*args, **kwargs)
            return result
        return wrapper
    return real_decorator
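
For example, a hypothetical repeat decorator whose parameter is the number of times to call the wrapped function:

def repeat(times):
    def real_decorator(function):
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = function(*args, **kwargs)
            return result
        return wrapper
    return real_decorator

@repeat(3)
def greet(name):
    print("Hello,", name)

greet("World")  # prints "Hello, World" three times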

Decorator tips

One practical tip when defining decorators is to use functools.wraps. This helper copies the metadata of the original function onto the wrapper, including the function's name and docstring.

import functools

def uppercase(func):
    @functools.wraps(func)
    def wrapper():
        # transform the result while keeping func's metadata
        return func().upper()
    return wrapper
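
With functools.wraps in place, the wrapped function keeps its original name and docstring:

@uppercase
def greet():
    """Return a greeting."""
    return "hello"

print(greet())          # HELLO
print(greet.__name__)   # greet, preserved by functools.wraps
print(greet.__doc__)    # Return a greeting.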

Python Notes: Function as First-Class Object

Per the History of Python blog, everything in Python is a first-class object. That means all objects that can be named in the language (e.g., integers, strings, functions, classes, modules, methods, etc.) have equal status: they can be assigned to variables, placed in lists, stored in dictionaries, passed as arguments, and so forth.

Essentially, functions return a value based on the given arguments. In Python, functions are first-class objects as well. This means that functions can be passed around and used as arguments, just like any other value (e.g., string, int, float).

Internally, Python uses a common C data structure, used everywhere in the interpreter, to represent every object, whether it is a Python function or an integer.

However, when it comes to functions as first-class objects, there are subtle design questions to think about.

Think about the following class definition:

class A:
    def __init__(self, x):
        self.x = x

    def foo(self, bar):
        print(self.x, bar)

What happens if you assign A.foo to a variable: b = A.foo? The first argument of the function still has to be the instance itself. To handle this, Python 2 returns an unbound method, which is a wrapper around the original function that requires the first argument to be an instance of the class: a = A(1); b(a, "bar"). In Python 3, however, this restriction was removed because it turned out not to be very useful; A.foo is simply a plain function.

Now think about the second case, where you have an instance of the class: a = A(1), b = a.foo. Here Python returns a bound method, a thin wrapper around the original function. The bound method stores the instance internally, and that instance becomes the implicit first argument whenever the method is called.
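
A short demonstration of both cases in Python 3:

a = A(1)
b = a.foo                   # a bound method
b("bar")                    # prints "1 bar"; self is supplied automatically
print(b.__self__ is a)      # True: the stored instance
print(b.__func__ is A.foo)  # True: the underlying plain function

c = A.foo                   # in Python 3, just a plain function
c(a, "bar")                 # prints "1 bar"; we pass the instance explicitly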

Python Notes: Iterator, Generator and Co-routine

Iteration

Python supports iteration, for example, iterating over a list:

for elem in [1, 2, 3]:
    print(elem)

Iterating over a dict:

for key in {'Google': 'G',
            'Yahoo': 'Y',
            'Microsoft': 'M'}:
    print(key)

Iterating over a file:

with open('path/to/file.csv', 'r') as file:
    for line in file:
        print(line)

We use iterable objects in many ways, for example: reductions such as sum(s) and min(s), constructors such as list(s), and the in operator, as in item in s.

The reason we can iterate over these objects is the iterator protocol: any object that supports __iter__() and __next__() can be iterated over. For example, we can define an iterable object in the following way:

class count:
    def __init__(self, start):
        self.count = start

    def __iter__(self):
        return self

    def __next__(self):
        # stop once the count is exhausted
        if self.count <= 0:
            raise StopIteration
        r = self.count
        self.count -= 1
        return r

We can use the above example in this way:

c = count(5)
for i in c:
    print(i)
# 5, 4, 3, 2, 1

Generator

So what is a generator? By definition: a generator is a function that produces a sequence of results instead of a single value.

So a generator is a function, but it differs from other functions in that it produces a sequence of results instead of a single value. A generator function also behaves very differently from a normal function: calling it creates a generator object but does not execute the body until next() is called. The following is an example of a generator:

def count(n):
    while n > 0:
        yield n
        n -= 1

c = count(5)

Note that when we first call count, the body won't execute. Only when we call next(c) for the first time does the generator start executing, and it then suspends at the yield statement until the next call resumes it.
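
For example, continuing with the c created above:

print(next(c))  # 5: the body runs until the first yield, then suspends
print(next(c))  # 4: execution resumes right after the yield
for rest in c:
    print(rest)  # 3, 2, 1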

So to speak, a generator is a convenient way of writing an iterator without having to worry about the iterator protocol.

Besides yield-based generator functions, Python also supports generator expressions:

a = [1, 2, 3, 4]
b = [x*2 for x in a]
c = (x*2 for x in a)

b is still a regular list, while c is a generator.

Coroutine

A Python coroutine is very similar to a generator. Think about the following pattern:

def receive_count():
    try:
        while True:
            n = (yield)  # yield expression: receives a value from send()
            print("T-minus", n)
    except GeneratorExit:
        print("Exit from generator.")

This form of generator is called a coroutine. A coroutine differs from a generator in that it receives data instead of producing data. Think of it as a consumer or receiver.

To use a Python coroutine, you need to call next() first so that the function executes up to the yield expression; then you can use send() to pass a value into the function. For example:

    c = receive_count()
    next(c)    # advance the coroutine to the yield expression
    c.send(1)  # send 1 to the coroutine
    # prints "T-minus 1"

Python does not ship a helper for this, but a common recipe is a small decorator, often called @consumer or @coroutine, that calls next() for you. With such a decorator, the coroutine can be used directly.
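
A minimal sketch of such a decorator and its usage:

def consumer(func):
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield automatically
        return gen
    return wrapper

@consumer
def receive_count():
    try:
        while True:
            n = (yield)
            print("T-minus", n)
    except GeneratorExit:
        print("Exit from generator.")

c = receive_count()  # already advanced; no explicit next() needed
c.send(1)            # prints "T-minus 1"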

The question, then, is: why not just declare the coroutine as a regular function that you can pass values to directly, instead of relying on the yield expression? The examples given here admittedly don't fully justify coroutines' value. More often, people use coroutines to implement application-level multitasking. I will introduce more about this later.

Python Notes: Context management

Python supports context management, which is often used when handling resources such as files and network connections. The with statement helps make sure resources are cleaned up or released. There are two key methods for context management: __enter__ and __exit__.

__enter__ is called when the with block is entered, while __exit__ is called when the block finishes execution, even if it exits with an exception.

One very common usage of the with statement is opening files:

    with open('/path/to/file', 'r') as file:
        for line in file:
            print(line)

The with statement in this example will automatically close the file no matter how the block exits.

Python's with statement also supports nesting, for example:

    with open('/path/to/infile', 'r') as infile:
        with open('/path/to/outfile', 'w') as outfile:
            for line in infile:
                if line.startswith('Test'):
                    outfile.write(line)

If you want your own class to support with-based context management, just implement the two methods, for example:

class Resource:
    def __init__(self, res):
        self.res = res

    def __enter__(self):
        # acquire or set up the resource here
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # release the resource here; return True to suppress an exception
        pass

There is another way to support context management, which is to use contextlib.contextmanager; we will introduce it in more detail later.
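
As a quick preview, a minimal sketch: the code before the yield plays the role of __enter__, and the code after it plays the role of __exit__.

from contextlib import contextmanager

@contextmanager
def resource(res):
    # setup code here runs on entering the with block
    try:
        yield res
    finally:
        # teardown code here runs on leaving the with block
        pass

with resource("some resource") as r:
    print(r)  # prints "some resource"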

 

Python Notes: Closure

 

The following is an example of a Python closure.

def outer_function(outer_arg):
    closure_var = outer_arg
    def inner_function(inner_arg):
        print("{} {}".format(closure_var, inner_arg))

    return inner_function

# Usage of a Python closure
closure_func = outer_function("X")
closure_func("Y1")  # prints "X Y1"
closure_func("Y2")  # prints "X Y2"

Variables and nested functions

Python has two basic kinds of variable scope: local and global. Variables defined within a function have local scope, while variables defined outside any function have global scope.

When a function is defined within another function, it is called a nested function. A nested function can access the outer function's variables. In the example above, the outer function defines a variable called closure_var, which is initialized from the input argument. The inner function can access this variable and use it in its own body.

There are two steps to using a closure: call the outer function and assign the result to a variable, then call that variable with an argument to invoke the inner function.

When to use closures?

  • Reduce the use of global variables

A closure can hide variables inside the function; in this way we reduce the use of global variables.

  • Simplify single-function classes.
    For example, the closure above is equivalent to the following single-function class:
class OuterClass:
    def __init__(self, outer_arg):
        self.arg = outer_arg

    def inner_function(self, inner_arg):
        print("{} {}".format(self.arg, inner_arg))
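
Usage of the class version mirrors the closure version:

obj = OuterClass("X")
obj.inner_function("Y1")  # prints "X Y1", same as the closure version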

In general, a closure runs faster than the equivalent instance method call.