Coroutine, C language implementation

Coroutine, C language implementation

Coroutine, C language implementation

Interestingly, this is not a popular science article.

What is a coroutine

By convention, this article should talk about what a coroutine is, rather than anxiously start talking about the details of the implementation, not to mention the fact that the implementation of a coroutine is not very complicated. But there are too many articles about what a coroutine is. There are some in one language, some are false, and even more random. What is in the word is like a cloud, so there is no need to say more; what is rumored to be false, although there is something wrong, it is not a big evil; the rest is made up and made astonishing, just ignore it.

But what exactly is a coroutine? Some people regard coroutines as lightweight threads, and some people regard coroutines as variants of callbacks. The former is indeed biased. I can t find any mistakes or omissions in the latter. After all, most of the coroutines can be rewritten with callbacks, but from my own understanding, the coroutines can also be regarded as being able to save the context and allow execution. But a non-preemptible execution unit. In the early days of the operating system, coroutines were used to simulate concurrency, which was later replaced by preemptible threads. This is also the reason why the term lightweight threads is considered biased.

In any case, whether it is a callback or a non-preemptible execution unit, in essence, it will not help performance much. It is very interesting that if I say that coroutines can greatly improve the efficiency of the program, some people may believe it; if I say that callbacks can greatly improve the efficiency of the program, some people will certainly laugh at me as stupid. What's the difference between going three and going four, going four and going three.

Duff's device

Duff's device refers to the ability in the C language to interleave switch and other control statements to manually unroll the loop. A
typical Duff's device is shown below.

    case 1:
        if(j == 0){
        case 2:
            if( b == 1){
                case 3:

In essence, Duff's device is more like a kind of syntactic sugar. Of course, this seems useless, but this is in C


    static int fd = 0;
    void jobs(){
        static int i = 0;
        case 0:
            addr = get_addr(); 
            fd = connect(addr);           // 
            i = 1;
            return ; 
            case 1:
                data,len = read(fd);
                printf("%.*s ",len,data);// 
                i = 2;
                return ;
                case 2:
                    len = write(fd,data); // 
                    i ++;
                    return ;

    tnet_worker[0].call     = jobs;
    tnet_worker[0].resume    = jobs;

This is basically the prototype of a simple coroutine, which may be quite different from the coroutine that some people imagine, but it is a pity that this is the so-called coroutine.

Static is used to save the context, and return is used to give up logically non-preemptible execution functions. This has covered the key elements of the coroutine.

What's more interesting is that if you want, you can replace them with function callbacks in the form of do-connect, do-read, and do-write. It did not make our performance better or worse, but it did make our code logic clearer.
At this point, we can create thousands of tnet_workers to handle each connection. This may be like a thread, but we all know that it is completely different from a thread.

Duff's device is good, but there is a serious problem. Local variables cannot be saved across functions. Although I can create thousands of
coroutines, they are sharing the same stack. It can also be seen from the above that I use a static local variable to save the context of the coroutine. This is no problem for the per coroutine per entry, and it is a bit powerless for the multi-coroutine single entry. Simon Tatham provides a C code macro based on Duff's device in the article

, But also wraps up a single entry, context preservation problem. But essentially static local variables are still used. This leads to the use of this macro to make more assumptions, and I will not do in-depth analysis here. (Btw, I like this macro very much, it can indeed make the logic of the code more clear, especially for C code).

Based on this consideration, we need to allocate a separate stack space for each coroutine. On the other hand, we need the ability to jump to each other in different stacks, which is beyond the ability of traditional control statements. The former, we still need to discuss the latter, the latter is very simple, students who are familiar with the C language know that standard C provides

[sig] The longjmp/setjmp series of functions provide the ability to save context and jump across functions. This is very simple, so I won't do further research.


Let me start with three very interesting facts:

  • Most coroutine implementations use ucontext
  • ucontext is a deprecated function, which cannot be found in the latest POSXI standard.
  • POSIX recommends using pthread as a possible alternative.

    ucontext is born with its own stack space, and it is also born with mutual jumps. So, it is indeed relatively simple to implement, but considering that the modified function has been abandoned, I will not introduce it too much.


__sigaltstack - set and/or get signal stack context__



Interestingly, this also provides us with an opportunity to obtain the stack. In other words, the creation of a coroutine with independent running stack space can be as follows:

  1. Provide a new signal processing function execution stack.
  2. Set the execution function of a certain signal and imply execution on the new stack
  3. Send the signal and wait for the end of the signal processing execution

    1. Use setjmp in the signal execution function to get the context of the stack and take it for yourself.
  4. Restore the processing function of the signal
  5. Restore the signal processing function stack

    Now, we can successfully jump to our own stack by longjmp. With Duff's device, we can implement a relatively complete coroutine. If you have a good understanding of signal processing, the specific implementation is relatively simple. It's not too much involved here.


I only recently learned that some languages use threads to simulate coroutines, which I find very interesting. Coroutines have such magical power. This reminds me of anecdotes in high school. Accompany your classmates to buy milk tea, a cup of red bean milk tea, no red beans.


Speaking of the implementation of coroutine for a long time, does coroutine really solve any program performance problems. I have no intention to promote these wonderful techniques for realizing coroutines, but through these realizations, we can clearly and clearly see what coroutines are. It is not a panacea for all diseases, from the perspective of machines. In other words, this is just an advanced jmp instruction.

If one day, you have to use so-called coroutines to reduce the cost of thread switching, then you can ask yourself, do you really need so many threads. Threads are never created by themselves. It is people who create threads.

Most people nowadays are accustomed to designing and developing from a human point of view, but sometimes, from a machine point of view, there will be some different gains.

C Co-Processing Practice-Simple turn-based game

Consider a simple turn-based game where the protagonist A and NPC are in a battle. We assume that the rules of the game are as follows:

  1. The protagonist A initiates an attack, and then the NPC attacks again. Circulate in turn until one of them has a blood volume below 0.
  2. Both the protagonist and the NPC have an initial attack damage of 10.
  3. Both the protagonist and the NPC have dodge rate and crit rate attributes.
  4. Successful dodge is not harmed.
  5. Once critically hit, the damage is doubled.

The implementation of a coroutine is as follows (here I used the pcl library):

#include "pcl.h "

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

struct attacker_st{

    uint64_t        damage ;
    int64_t         HP     ;
    coroutine_t     coro   ;

    uint8_t         critical_rate;//percent
    uint8_t         dodge_rate;
static struct attacker_st attacker_buda = {
    .HP                 =   100,
    .critical_rate      =   80,
    .dodge_rate         =   50,

static struct attacker_st attacker_duzi = {
    .HP                 =   100,
    .critical_rate      =   20,
    .dodge_rate         =   20,
uint64_t compute_damage(uint8_t critical_rate)
    uint32_t r = rand() % 100 + 1;
    if(critical_rate > r){
        return (20);
    return (10);

int dodge(uint8_t dodge_rate)
    uint32_t r = rand() % 100 + 1;
    if(dodge_rate > r){
        return (1);
    return (0);
void Attacker_BUDA(void *args)
    while(attacker_buda.HP > 0){

        attacker_buda.damage = compute_damage(attacker_buda.critical_rate);

        if(attacker_duzi.HP <= 0){//duzi is GG,break the loop

            attacker_buda.HP -= attacker_duzi.damage;

void Attacker_DUZI(void *args)
            attacker_duzi.HP -= attacker_buda.damage;
        if(attacker_duzi.HP > 0){
               attacker_duzi.damage = compute_damage(attacker_duzi.critical_rate);

int main()
    uint8_t i ;
    srandom((long int) &i);


    attacker_buda.coro = co_create(Attacker_BUDA,NULL,0,32768);
    attacker_duzi.coro = co_create(Attacker_DUZI,NULL,0,32768);


    if(attacker_buda.HP >= attacker_duzi.HP){
        printf("the Winner is Buda\n ");
    }else {
        printf("Shit,Duzi Win\n ");