At Slang Labs, we are building a platform for programmers to easily and quickly add multilingual, multimodal Voice Augmented eXperiences (VAX) to their mobile and web apps. Think of an assistant like Alexa or Siri, but running inside your app and tailored for your app.
The platform consists of:
- Console to configure a buddy for an app,
- Microservices that SDKs invoke to infer the intent inherent in the voice utterance of an end-user, and extract associated entities, and
- Analytics to analyze end-user behavior and improve the experience.
This tutorial shares the best practices and lessons we learned while building the microservices.
At the idea phase of a startup, one has some sense of destination and direction but does not know exactly what to build. That clarity emerges only through iteration and experimentation. We were no different, so we had to pick a programming language and microservice framework suitable for rapid prototyping. These were our key considerations:
- Rapid Development: high velocity of experimentation for quick implementation and evaluation of ideas.
- Performance: a lightweight yet mature microservice framework, efficient for a mostly IO-bound application, that scales to high throughput for concurrent requests.
- Tools Infrastructure: for automated testing, cloud deployment, monitoring.
- Machine Learning (ML): easy availability of libraries and frameworks.
- Hiring: access to talent and expertise.
No programming language ticks all of the above. It finally boiled down to Python vs. Java/Scala, because these were the only languages we considered feasible for machine learning work. While Java has better performance and tooling, Python is apt for rapid prototyping. At that stage, we favored rapid development and machine learning over other considerations, and therefore picked Python.
With Python came its infamous Global Interpreter Lock (GIL). In brief, a thread can execute Python code only if it holds the interpreter lock. Since it is a global lock, only one thread of the program can hold it, and therefore run, at a time, even if the hardware has multiple CPUs. Threads release the lock while blocked on IO, but CPU-bound Python programs are effectively limited to single-threaded performance.
While the GIL is a serious limitation for CPU-bound concurrent Python apps, for IO-bound apps the cooperative multitasking of AsyncIO offers good performance (more about it later). For performance, we wanted a web framework that is lightweight yet mature and has AsyncIO APIs.
Django follows the “batteries included” approach: it has everything you will need and more. While that eliminates integration compatibility blues, it also makes it bulky. At the time, it did not have AsyncIO APIs.
Flask, on the other hand, is super lightweight and has a simple way of defining service endpoints through decorators. It did not have AsyncIO APIs either.
Tornado is somewhere between Django and Flask: it is neither as barebones as Flask nor as heavy as Django. It has quite a number of configurations, hooks, and a nice testing framework. It had its own event loop for scheduling cooperative tasks long before AsyncIO existed, and had started supporting the AsyncIO event loop and syntax.
Tornado was just right for our needs. But most of our design tactics are independent of that choice, and are applicable regardless of the chosen web framework.
Update: Since then, FastAPI has emerged as one of the fastest async Python microservice frameworks. Flask 2.0 and Django 3.1 (building on the ASGI support added in 3.0) now provide async/await APIs as well. If I were to choose a framework today, I would pick FastAPI instead of Tornado. However, all the concepts and design ideas in this article still apply.
Overcoming the Global Interpreter Lock
Before we plunge into design and code, let’s understand some key concepts: cooperative multitasking, non-blocking calls, and AsyncIO.
Preemptive vs Cooperative Multitasking
Threads follow the model of preemptive multitasking. Each thread executes one task. The OS schedules a thread on a CPU, and after a fixed time slice (or when the thread blocks, typically on an IO operation, whichever happens first), the OS interrupts the thread and schedules another waiting thread on the CPU. In this model of concurrency, multiple threads can execute in parallel on multiple CPUs, as well as interleaved on a single CPU.
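Here is a minimal sketch of preemptive multitasking with Python threads. The function names are ours, for illustration only. `time.sleep` stands in for a blocking IO call. Like real blocking IO, it releases the GIL, so the OS can run the other threads while one waits. Four threads that each block for 0.2 seconds should therefore finish in roughly 0.2 seconds of wall time, not 0.8.

```python
import threading
import time
from typing import List, Tuple


def blocking_io(task_id: int, results: List[int]) -> None:
    # time.sleep stands in for a blocking IO call. Like real blocking IO,
    # it releases the GIL, so the OS can schedule other threads meanwhile.
    time.sleep(0.2)
    results.append(task_id)  # list.append is safe here under CPython's GIL


def run_threads(n: int) -> Tuple[List[int], float]:
    results: List[int] = []
    threads = [
        threading.Thread(target=blocking_io, args=(i, results)) for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()  # the OS decides when each thread actually runs
    for t in threads:
        t.join()
    return sorted(results), time.perf_counter() - start


if __name__ == "__main__":
    ids, elapsed = run_threads(4)
    print(ids, f"{elapsed:.2f}s")  # expect about 0.2s, not 0.8s
```

If you replace the sleeps with CPU-bound work, the GIL serializes the threads and this speedup disappears.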
In cooperative multitasking, there is a queue of tasks. When a task is scheduled for execution, it runs until a point of its own choosing (typically an IO wait) and yields control back to the event loop scheduler, which puts it in the waiting queue and schedules another task. Only one task executes at any time, but the interleaving gives an appearance of concurrency.
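Here is a minimal asyncio sketch of cooperative multitasking. The names are illustrative. Two tasks share one thread and interleave only where they choose to yield. Here they yield with `await asyncio.sleep(0)`, which hands control back to the event loop without actually waiting.

```python
import asyncio
from typing import List


async def worker(name: str, steps: int, log: List[str]) -> None:
    for i in range(steps):
        log.append(f"{name}{i}")
        # Voluntarily hand control back to the event loop: this is the
        # task's "point of its choice" in cooperative multitasking.
        await asyncio.sleep(0)


async def main() -> List[str]:
    log: List[str] = []
    # Both tasks run on one thread; they interleave only at await points.
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

Remove the `await` line and no task ever yields. Each task then runs to completion before the other starts, producing A0, A1, A2, B0, B1, B2.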
Synchronous vs Asynchronous Calls
In synchronous or blocking function calls, control returns to the caller only after the call completes. Consider the following pseudocode:
bytes = read()
print(bytes)
print("done")  # "done" is printed only after bytes.
In asynchronous or non-blocking function calls, control returns to the caller immediately, while the actual work (for example, waiting for IO) continues in the background. The called function takes a callback routine as an argument, and when it finishes and the results are ready, it invokes the callback with the results. Meanwhile, the caller resumes execution even before the called function completes. Assume there is a non-blocking async_read function, which takes a callback function and calls it with the bytes read. Consider the following pseudocode:
async_read(print)  # the callback (print) is invoked with bytes when the read completes
print("done")  # "done" may be printed before bytes.
As you can see, asynchronous code with callbacks is hard to understand because the execution order of the code can differ from its lexical order.
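Here is a runnable version of the callback pseudocode, as a sketch. `async_read` is hypothetical. We simulate it with the asyncio event loop's `call_later`, which schedules a plain callback to run after a delay. Delivering the callback requires a running event loop, so the caller is wrapped in a coroutine. The `asyncio.Event` is only scaffolding that keeps the loop alive until the callback fires.

```python
import asyncio
from typing import Callable, List, Union


def async_read(callback: Callable[[bytes], None]) -> None:
    # Hypothetical non-blocking read: it returns immediately and asks the
    # event loop to invoke the callback with the data about 10 ms later,
    # simulating an IO operation completing in the background.
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, callback, b"some bytes")


async def main() -> List[Union[str, bytes]]:
    log: List[Union[str, bytes]] = []
    finished = asyncio.Event()

    def on_read(data: bytes) -> None:
        log.append(data)
        finished.set()

    async_read(on_read)
    log.append("done")     # runs first, although the read was issued earlier
    await finished.wait()  # keep the loop running until the callback fires
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['done', b'some bytes']
```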
The async and await syntax of AsyncIO lets you write asynchronous code in a synchronous style instead of using callbacks, making the code easier to understand.
A function defined with async def is a coroutine function, and calling it returns a coroutine. A coroutine must be awaited, as its result will be available only in the future. An await expression suspends the current coroutine and yields control to the scheduler. The code after the await expression acts like a callback: control resumes there later, when the awaited coroutine completes and its result is ready.
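For comparison, here is the same flow written with async and await. It is again a sketch: `read()` is a hypothetical coroutine, simulated with `asyncio.sleep`. Execution order now matches lexical order. While `read()` waits, the thread is still free to run other tasks.

```python
import asyncio
from typing import List, Union


async def read() -> bytes:
    # Hypothetical coroutine standing in for a non-blocking read;
    # asyncio.sleep simulates waiting on IO without blocking the thread.
    await asyncio.sleep(0.01)
    return b"some bytes"


async def main() -> List[Union[str, bytes]]:
    log: List[Union[str, bytes]] = []
    data = await read()  # yields to the event loop until read() completes
    log.append(data)     # resumes here, like a callback, once data is ready
    log.append("done")
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # [b'some bytes', 'done']
```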
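At the heart of AsyncIO is the event loop. It keeps a queue of tasks that are ready to resume because the IO (or other operation) they were awaiting has completed, and it runs them one at a time.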
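The next sketch shows why this works well for IO-bound services. `fetch` is a hypothetical stand-in for a database or HTTP call, simulated with `asyncio.sleep`. Ten such calls, each waiting 0.1 seconds, run on a single thread. Because the event loop switches tasks while each one waits, the total wall time should be close to 0.1 seconds rather than 1 second.

```python
import asyncio
import time
from typing import List, Tuple


async def fetch(i: int, delay: float) -> int:
    # asyncio.sleep stands in for an IO wait (a database or HTTP call).
    await asyncio.sleep(delay)
    return i


async def main(n: int = 10, delay: float = 0.1) -> Tuple[List[int], float]:
    start = time.perf_counter()
    # While one task waits, the event loop resumes another, so the n waits
    # overlap on a single thread. gather returns results in argument order.
    results = await asyncio.gather(*(fetch(i, delay) for i in range(n)))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")  # expect about 0.1s, not 1.0s
```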
Derisking by Design
Tornado has worked out well for us so far, but we did not know that it would at the time. So we designed our microservices such that the Tornado-dependent code was segregated and localized, making it easy to migrate to a different framework if the need arose. Regardless, it is a good idea to structure your microservice into two layers: the Web Framework Layer and the framework-independent Service Layer.
Web Framework Layer
The Web Framework Layer is responsible for the REST service endpoints over HTTP. It does not have any business logic. It processes incoming requests, extracts relevant information from the payload, and calls a function in the Service Layer, which performs the business logic. It then packages the returned results appropriately and sends the response. For Tornado, it consists of two files:
- server.py contains an HTTP server that starts the event loop and the application.
- app.py contains endpoint routes that map each REST API to a function in the service layer (specifically to a function in service.py, see next).
Service Layer
The Service Layer contains only business logic and knows nothing about HTTP or REST. That allows any communication protocol to be stitched on top of it without touching the business logic. There is only one requirement for this layer:
- service.py must contain all functions needed to implement the service endpoints. Think of it as logical service APIs, independent of any Web framework or communication protocol.
Logical service APIs allow the Web Framework Layer to be implemented (and replaced) without getting into the nitty-gritty of the inner working of the service. It also facilitates the standardization and sharing of a large portion of web framework code across services.
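Here is a minimal sketch of the two layers, using only the standard library. Everything in it is hypothetical and only illustrates the split: the `AddressBookService` class, its `create_address`/`get_address` methods, the `/addresses` routes, and the `handle` function. None of it is taken from the addrservice repo. The service layer is plain async Python with no HTTP in sight. `handle` plays the role of the web layer. It parses the request, calls the logical service API, and turns the result into a status code and JSON body. In Tornado, that translation would live in `RequestHandler` methods.

```python
import asyncio
import json
import uuid
from typing import Dict, Tuple


# --- Service Layer (service.py): business logic only, no HTTP. ---
class AddressBookService:
    """Hypothetical logical service API backed by an in-memory dict."""

    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    async def create_address(self, value: dict) -> str:
        key = uuid.uuid4().hex
        self._entries[key] = value
        return key

    async def get_address(self, key: str) -> dict:
        return self._entries[key]  # raises KeyError if the key is unknown


# --- Web Framework Layer (app.py): protocol translation only. ---
async def handle(service: AddressBookService, method: str, path: str,
                 body: str = "") -> Tuple[int, str]:
    """Framework-neutral stand-in for a request handler."""
    parts = path.strip("/").split("/")
    if method == "POST" and parts == ["addresses"]:
        key = await service.create_address(json.loads(body))
        return 201, json.dumps({"id": key})
    if method == "GET" and len(parts) == 2 and parts[0] == "addresses":
        try:
            return 200, json.dumps(await service.get_address(parts[1]))
        except KeyError:
            return 404, json.dumps({"error": "not found"})
    return 404, json.dumps({"error": "no route"})


async def demo() -> Tuple[int, dict]:
    service = AddressBookService()
    _, body = await handle(service, "POST", "/addresses",
                           '{"city": "Springfield"}')
    key = json.loads(body)["id"]
    status, body = await handle(service, "GET", f"/addresses/{key}")
    return status, json.loads(body)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (200, {'city': 'Springfield'})
```

Swapping Tornado for another framework would mean rewriting only `handle`; `AddressBookService` stays untouched.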
We are among the rare startups that automate testing and code coverage from the very beginning. It may appear counter-intuitive, but we did it to maintain high velocity and to fearlessly change any part of the system. Tests gave us the safety net needed while developing in a dynamically typed, interpreted language. It was also partly due to paranoia about our non-obvious choice of Tornado: tests would safeguard us in case we needed to change it.
There are three types of tests:
- Unit Tests: test a single class or function in isolation, mostly leaf-level classes and functions that do not depend on others.
- Integration Tests: test how multiple classes and functions work together. Out-of-process or network API calls (such as to databases and other services) are mocked.
- End-to-End Tests: test a deployment in a test or staging environment. Nothing is mocked; the only difference is that the data does not come from the production environment and may be synthetic.
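Here is a minimal sketch of the first two test types with Python's built-in unittest. All names are hypothetical and not from the addrservice repo. `normalize_city` is a leaf-level function, so it gets a plain unit test. `CityService` combines it with an out-of-process dependency, a database client. The integration test replaces that client with `unittest.mock.AsyncMock` (Python 3.8+), and everything else runs for real. Passing the client into the constructor is what makes the swap easy.

```python
import unittest
from typing import Any
from unittest import mock


def normalize_city(city: str) -> str:
    """Leaf-level function: a natural target for a unit test."""
    return " ".join(city.split()).title()


class CityService:
    """Depends on a database client, injected so tests can replace it."""

    def __init__(self, db: Any) -> None:
        self._db = db

    async def get_city(self, key: str) -> str:
        record = await self._db.fetch(key)  # out-of-process call
        return normalize_city(record["city"])


class NormalizeCityUnitTest(unittest.TestCase):
    def test_collapses_spaces_and_titles(self) -> None:
        self.assertEqual(normalize_city("  new   york "), "New York")


class CityServiceIntegrationTest(unittest.IsolatedAsyncioTestCase):
    async def test_get_city_with_mocked_db(self) -> None:
        # Mock only the out-of-process dependency; the rest runs for real.
        db = mock.Mock()
        db.fetch = mock.AsyncMock(return_value={"city": "  new   york "})
        service = CityService(db)
        self.assertEqual(await service.get_city("k1"), "New York")
        db.fetch.assert_awaited_once_with("k1")
```

Save this under tests/ with a file name matching `test*.py`, and `python -m unittest discover -s tests` will pick it up.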
We wrote integration tests both for the Service Layer to test business logic, as well as for the Web Framework Layer to test the functioning of REST endpoints in the Tornado server.
Get Source Code
Clone the GitHub repo and inspect the content:
The directory addrservice contains the source code of the service, and the directory tests contains the tests.
Set Up a Virtual Environment
Using a virtual environment is one of the best practices, especially when you work on multiple projects. Create one for this project, and install the dependencies from requirements.txt:
The script run.py is a handy utility to run the static type checker, linter, unit tests, and code coverage. In this series, you will see that using these tools from the very beginning is actually the most economical approach and does not add noticeable overhead.
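This is not the actual run.py from the repo. It is a hypothetical sketch of what such a script might look like. It assumes the tools are installed in the virtual environment and that the code and tests live in addrservice and tests. The task names and the exact arguments are our assumptions. Each task is a list of commands, run in order with `subprocess`, stopping at the first failure.

```python
import subprocess
import sys
from typing import Dict, List

PY = sys.executable  # the interpreter of the active virtual environment

# Hypothetical task table; the real run.py may use different arguments.
TASKS: Dict[str, List[List[str]]] = {
    "typecheck": [[PY, "-m", "mypy", "addrservice", "tests"]],
    "lint": [[PY, "-m", "flake8", "addrservice", "tests"]],
    "test": [[PY, "-m", "unittest", "discover", "-s", "tests"]],
    "coverage": [
        [PY, "-m", "coverage", "run", "--source=addrservice",
         "-m", "unittest", "discover", "-s", "tests"],
        [PY, "-m", "coverage", "report"],
    ],
}


def run(task: str) -> int:
    """Run each command of a task in order; stop at the first failure."""
    for cmd in TASKS[task]:
        code = subprocess.call(cmd)
        if code != 0:
            return code
    return 0


def main(argv: List[str]) -> int:
    if len(argv) != 1 or argv[0] not in TASKS:
        print(f"usage: run.py [{'|'.join(TASKS)}]", file=sys.stderr)
        return 2
    return run(argv[0])
```

To use it as a script, add the usual `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))` guard. For example, `python run.py lint` would then run flake8.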
Let’s try running these. In each of the following, you can use either of the commands.
Static Type Checker: mypy package
Linter: flake8 package
Unit Tests: Python unittest framework
This runs all the tests in the directory tests. You can run the unit or integration test suites (in the tests/unit and tests/integration directories, respectively) as follows:
Code Coverage: coverage package
After running tests with code coverage, you can get the report:
You can also generate an HTML report:
If you are able to run all these commands, your project setup is complete.
For quick prototyping, Python is more suitable, but it comes with the drawback of the Global Interpreter Lock. Cooperative multitasking with non-blocking asynchronous calls using asyncio comes to the rescue. At the time, Tornado was the only mature Python web framework with asyncio APIs.
A layered design reduces the risk in case the framework has to be changed in the future. Tests can also be layered: unit, integration, and end-to-end. It is easy to set up linting, testing, and code coverage from the very beginning of the project.
Tutorial – Python Microservices with Tornado: