
Exploring the Elegance and Performance of the Seastar C++ Networking Framework

This article primarily showcases the elegance of the Seastar framework's code and highlights some of its key features.

Introduction

Seastar is an outstanding C++ networking framework. Its codebase is compact, thoroughly commented, and highly readable, and it has already started to see adoption in infrastructure software. It has many appealing aspects worth exploring. This article focuses on the elegance of Seastar's code and some of its key features, as seen from a C++ programmer's perspective. A Haskell programmer might view the same code quite differently and exclaim, "Piece of sh**!"

Learning different programming languages, working with various frameworks, and trying diverse approaches to problems broadens our horizons, pushes us past our habitual boundaries, enriches our thinking, and makes us more effective problem solvers. For instance, quicksort in Haskell looks like this:

    q_sort n = case n of
        []     -> []
        (x:xs) -> q_sort [a | a <- xs, a <= x] ++ [x] ++ q_sort [a | a <- xs, a > x]

Now compare this with the C implementation in a typical "Data Structures" textbook. That is why I want to stress that the elegance discussed in this article should be appreciated within the C++ language, not in comparison with other languages.

As "The Pragmatic Programmer" puts it, "Invest Regularly in Your Knowledge Portfolio." For technical professionals, staying in the comfort zone too long can turn strengths into weaknesses. Stay alert to how technology is evolving and keep yourself in the learning zone. Years from now, you will look back and see how much you have grown. Keep a learning mindset and make the most of your time.

The Beauty of Code

First, let's appreciate how elegantly the framework tames complexity, using an EchoServer as an example (the main function is omitted).

The code is concise and easy to understand. It listens on port 1234. Each time it accepts a connection, it sends a fixed string and then closes the connection. The future/promise pattern is the glue that organizes the entire program, and the whole framework is structured around the same pattern. We will look at future/promise in more detail later. For now, think of it as the familiar callback, restructured. For comparison, the company's TAF framework also uses this pattern. The do_with and keep_doing functions that appear here are helpers that work together with this pattern (their names largely explain what they do). If you are interested, see the source file future-util.hh, whose comments are careful and helpful.

When you try this example, note that it is only a test program and must be restricted to a single thread (for example, by starting it with --smp 1); otherwise it will fail. For real use, wrap the service in the sharded template so that an instance runs on every core, as discussed later.

This framework can undoubtedly improve development efficiency, but it is like a powerful Gatling gun that can also blow up in your hands. If your understanding of the framework and of C++ is insufficient, the consequences can be severe. On the bright side, the codebase is small and well commented, so you can read all of the core code in a short time.

The Beauty of Performance

Seastar is not only elegant in its code but also uncompromising in performance. Benchmarks of the Seastar httpd show a single server already handling 7 million requests per second, with 10 million within reach. It seems you really can have the best of both worlds: development efficiency and runtime efficiency.

Hardware Configuration

CPU: 2x Xeon E5-2695v3

Network Card: 2x Intel Ethernet CNA XL710-QDA1 (Transfer rate: 40 Gbps, one network card per CPU)

Application code built on Seastar is isolated from the underlying network mode, so switching modes does not require recompiling the application. You can choose either the Linux TCP stack or Seastar's native stack. The native stack is a user-space TCP stack that is zero-copy, lock-free, and free of context switches, and it accesses the physical network device directly through DPDK (DPDK takes no more than 80 CPU cycles to process a packet).
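To put these figures in perspective, here is a back-of-the-envelope calculation. The assumptions are mine, not the benchmark's. Each Xeon E5-2695 v3 is assumed to have 14 physical cores at a 2.3 GHz base clock, with one shard per physical core (hyper-threads and turbo ignored). The 7 million requests per second are assumed to be spread evenly across all 28 cores.

```latex
\begin{aligned}
\text{requests per core} &= \frac{7\times10^{6}\ \text{req/s}}{2 \times 14\ \text{cores}} = 2.5\times10^{5}\ \text{req/s} \\
\text{cycle budget per request} &= \frac{2.3\times10^{9}\ \text{cycles/s}}{2.5\times10^{5}\ \text{req/s}} = 9.2\times10^{3}\ \text{cycles} \\
\text{DPDK cost per packet} &\le \frac{80}{9.2\times10^{3}} \approx 0.9\%\ \text{of that budget}
\end{aligned}
```

Even if a request takes several packets, raw packet handling in DPDK would use only a small share of each request's cycle budget. That leaves most of the budget for the TCP stack and the application.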

The Beauty of Architecture

For multithreaded programs, the theory is perfect, but the reality can be harsh. The following picture illustrates the difference between theory and reality.

Seastar's architecture builds on lessons learned in practice: each core runs exactly one thread (called a shard). Each shard processes only its own data and schedules asynchronous events through the future/promise pattern. This achieves a shared-nothing design, which removes locks between cores and brings reality back in line with theory. Of course, this design cannot eliminate cross-core interaction entirely; we will come back to that later.

Below is an official illustration provided for your reference.

Let's look at the key code. The main loop that each thread runs is reactor::run(), a classic event-driven reactor. We call this module the engine.

An application we write, such as httpd, is run on every core through the sharded template. We call this module the service.

The parallel_for_each loop runs smp::count times, and smp::submit_to creates an instance of the service on each core. A startup parameter can set smp::count lower than the number of physical cores. This code is elegant as well.

Think of the CPU as a factory, each core as a workshop, the reactor as an engine, and the service we write (such as httpd) as a machine tool. The engine drives the machine tool, and the workshops do not interfere with one another.

Now let's look at httpd limited to 3 workshops (shards). There are 3 identical engines doing the same work without interfering with one another. Underneath, this relies on SO_REUSEPORT, which requires Linux kernel 3.9 or later.

Future/Promise

Future/promise is the glue of the framework. A future is a data structure representing a result that is not yet available, and continuations (think of them as callbacks) can be attached to it. A promise is the producer of that result.

Note that promise implementations differ in their details across frameworks, including the C++ standard library, so keep the distinctions in mind.

Let's examine an example. On line 11, we create a promise and attach a callback to its future. When the promise is given a value on line 16 (for example, when an asynchronous result comes back from a server), the callback is scheduled. This differs from the company's TAF framework, where the callback is invoked immediately when the promise value is set. Seastar instead has a unified scheduling step, so the callback runs only after line 19. Likewise, the sleep function that follows also returns a future to which a callback is attached.

Run it and note the order in which the output appears.

Although this is a small mechanism, the benefits are immense: throughout the framework, it connects the modules into a cohesive whole. Examples of ugly callback-based code are everywhere, so there is no need to show any here.

Programs must be written for people to read, and only incidentally for machines to execute.
— Harold Abelson and Gerald Jay Sussman 

Message Passing

We mentioned shared-nothing earlier, but cores still need to communicate. For cross-core communication, the framework uses lock-free queues. To stay lock-free in this simple way, a queue must never have more than one writer and one reader. Each ordered pair of cores therefore needs its own queue, so a 16-core CPU needs 16*16 - 16 = 240 queues.

Definition and initialization of the queue

Queue polling processing

Definition of smp_message_queue

Here we can see careful attention to memory layout and real craftsmanship. There are two blocks of statistics: the sender's (used by core A) and the receiver's (used by core B), with another data structure deliberately placed between them. The author explains that this prevents the CPU's hardware prefetcher from loading the same cache line into both core A's and core B's caches.

To see why that matters, consider what happens when the line is loaded twice, taking Intel processors as an example. Without the prefetch, data block 2 (already aligned to the 64-byte cache line size) lives only in core B's cache, where its state may be E (Exclusive). If there were no spacer and core A's prefetcher pulled data block 2 in as well, the line would exist in both cores and its state would become S (Shared). Writing to a line in the S state requires sending invalidate messages, which the other core must process through its invalidate queue. That inevitably costs performance at the hardware level.

For more details, see "Memory Barriers: a Hardware View for Software Hackers". There is no need to dig too deep here, as the details vary across CPUs.

Epilogue

Returning to the advice from "The Pragmatic Programmer", "Invest Regularly in Your Knowledge Portfolio": the key question is how to keep it up. In fact, this is not really a problem. If learning something feels as gripping as reading a novel like "Demi-Gods and Semi-Devils", persistence takes care of itself. If learning feels tedious, that is not a problem either: it may simply mean you are on the wrong path and should try a different approach. During learning there will always be things you cannot fully understand. Learn to accept them without fully digesting them at first, and don't hesitate to set them aside for a while; when you come back later, you may see them differently. Learning is a continuous process with no end point, but along the way you will steadily grow stronger.
