Hidden Anatomy of Backend Applications

We use many different languages, platforms and frameworks to write backend applications. They dictate, promote or enable very different styles, approaches and architectures. We rarely look under the hood of these tools, especially down to the level where the actual interaction between the backend application and the external environment happens, most often at the OS API layer. But if we do, we may discover that all these languages, platforms and frameworks are just an isolation layer which hides the true internal structure of the backend application.

Let's take a look under the hood. For this purpose, we'll analyze how a very simple HTTP request is processed at a low level.

So, let’s imagine that we have an app with a very simple entry point which just responds with a fixed string to every GET request.

The environment (for example, the OS) interacts with the hardware (possibly via some virtualization layer, if the application runs in a container) and receives one or more TCP/IP packets. Once these packets are received, they are delivered to the application. Depending on how I/O processing is implemented in the application, the delivery of the data can be performed slightly differently, but in the end the application obtains a buffer with the data received from the network. Now this data needs to be processed. In the case of HTTP, input data is usually accumulated in some internal memory buffer until the full HTTP request (or at least the HTTP header) is obtained. Once the request is received, it is passed on for further processing. In our case this is parsing of the request. If the request is a GET request, the application code is called. The application code responds with the processing result, the fixed string. This result is then formatted as an HTTP response using another internal buffer, and that buffer is passed to the environment for transmission over the network.
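The flow above can be sketched in a few lines of Python (the language is my choice for illustration; the original describes no particular implementation). The `handle` function is the pure transformation step between the two I/O steps; its minimal parsing and the fixed string are purely illustrative:

```python
import socket

# Transformation step: raw request bytes in, raw response bytes out.
def handle(request: bytes) -> bytes:
    method = request.split(b" ", 1)[0]  # minimal parsing: only the method matters here
    if method == b"GET":
        body = b"Hello!"
        return (b"HTTP/1.1 200 OK\r\n"
                b"Content-Length: " + str(len(body)).encode() + b"\r\n"
                b"\r\n" + body)
    return b"HTTP/1.1 405 Method Not Allowed\r\n\r\n"

def serve(port: int) -> None:
    with socket.socket() as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", port))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                data = conn.recv(65536)     # I/O: buffer received from the network
                conn.sendall(handle(data))  # transformation, then I/O: response out
```

The blocking `recv`/`sendall` calls are exactly the points where the environment hands the data buffer to the application and takes the response buffer back.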

Note that at this level of abstraction the processing described above can be represented with the following simple scheme: I/O ⇒ transformation ⇒ I/O. The scheme is the same regardless of the tools and environment used.

Let’s make the endpoint a bit more complex. Instead of responding with the fixed string, we add a call to an external service which returns some string (for example, a random greeting). The processing scheme will look like this:

I/O (incoming request from client) ⇒ preparation of the request to external service ⇒ I/O (request to external service); delay; I/O (response from external service) ⇒ formatting response to client ⇒ I/O (response to client).

While the chain of I/O and transformations got longer, it still consists of clearly visible chunks: I/O ⇒ transformation ⇒ I/O. The delay which separates the two chunks represents the latency of receiving the response.
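This two-chunk chain can be sketched in Python as well. The external service call is stubbed out with a hypothetical `fetch_greeting` function, since only the shape of the chain matters here:

```python
def fetch_greeting() -> str:
    # Stub for the external service call. Real code would perform I/O here
    # (send a request, wait, receive a reply); the network round trip is the
    # "delay" that separates the two chunks of the chain.
    return "Hello from the greeting service"

def handle(request: bytes) -> bytes:
    # Chunk 1: I/O (request in) ⇒ transformation (parse, prepare upstream request)
    greeting = fetch_greeting()  # chunk boundary: I/O, delay, I/O
    # Chunk 2: transformation (format response) ⇒ I/O (response out)
    body = greeting.encode()
    return (b"HTTP/1.1 200 OK\r\nContent-Length: "
            + str(len(body)).encode() + b"\r\n\r\n" + body)
```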

Let’s make the endpoint even more complex: build the response from data received from two different services. The processing scheme will look like this:

I/O (incoming request from client) ⇒ preparation of the request to external service 1 ⇒ I/O (request to external service 1); delay;

I/O (response from external service 1) ⇒ preparation of the request to external service 2 ⇒ I/O (request to external service 2); delay;

I/O (response from external service 2) ⇒ formatting response to client ⇒ I/O (response to client).

For convenience the whole chain is split at the delay points.

Notice that the scheme above describes a sequential version of the processing, i.e. we call the first service and then the second. An asynchronous version of the same processing may send both requests to the external services at once and then wait for the results. When the results are obtained, they are transformed into the response to the client. This change affects the following things:

  • the latency of the processing (it is reduced),
  • the timing of when each request to each external service is sent,
  • how the processing/transformation is organized.

Nevertheless, these changes do not affect the high-level view of the whole processing scheme, so we can ignore them for now.
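The asynchronous variant can be sketched with Python's asyncio (an illustrative choice; `call_service` is again a hypothetical stub with a sleep standing in for the network round trip):

```python
import asyncio

async def call_service(name: str) -> str:
    await asyncio.sleep(0.05)  # stands in for the network round trip
    return f"data from {name}"

async def handle_async(request: bytes) -> bytes:
    # Both upstream requests are started at once; we then wait for both results.
    part1, part2 = await asyncio.gather(
        call_service("service 1"),
        call_service("service 2"),
    )
    body = f"{part1}, {part2}".encode()
    return b"HTTP/1.1 200 OK\r\n\r\n" + body
```

Since the two delays now overlap, the total latency is roughly that of the slower service rather than the sum of both.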

It’s easy to see that regardless of how complex the processing is, it can be split into a set of I/O ⇒ transformation ⇒ I/O chunks. The same is true for every endpoint and for the whole application. This observation leads to very interesting implications:

  • Every backend application is a set of asynchronous, event-driven processing elements. Even if the application is written using synchronous APIs and waits for the result of every environment API call, this does not change the picture: for such an application the environment performs the translation between the asynchronous nature of I/O operations and the synchronous API calls. But the best performance can be achieved using an asynchronous API to the environment/OS, such as the io_uring API which appeared in recent Linux kernels.
  • The tools used to write the backend application create an abstraction layer which hides this internal structure from the developer. Such a layer requires resources and affects performance. Better performance can be achieved using approaches closer to the internal application structure. This correlates, for example, with the results of the TechEmpower benchmarks: the top-performing frameworks for a particular language/platform are the ones which implement an asynchronous, event-driven processing model.
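One way to get closer to that event-driven structure without any framework is to talk to the OS readiness API directly. The toy sketch below uses Python's `selectors` module (which wraps select/epoll; io_uring itself has no stdlib binding): the OS reports which sockets are readable, and the transformation runs once per event instead of blocking per connection:

```python
import selectors
import socket

def run_once(sel: selectors.DefaultSelector) -> None:
    # One iteration of a toy event loop: the OS tells us which registered
    # sockets have data ready, and we process each readiness event.
    for key, _ in sel.select(timeout=1):
        conn = key.fileobj
        data = conn.recv(65536)  # I/O event: buffer is already available
        if data:
            conn.sendall(b"HTTP/1.1 200 OK\r\n\r\nHello!")  # transformation ⇒ I/O
        else:
            sel.unregister(conn)  # peer closed the connection
            conn.close()
```

A real event loop would run `run_once` forever, also register listening sockets for accepting connections, and buffer partial requests per connection; the sketch only shows the event-per-chunk shape.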

Writing code for 30+ years and still enjoy it…