Sep 25, 2023

Codename Tulip: The Making of Async

Thanks to everyone who came to my PyCon CZ talk last Saturday. It was probably the most beautiful venue I've ever given a talk in; kudos to the organizers for that.

Here’s a loose transcript of the talk, with the slides.

Today I’m going to talk about asynchronous programming. I promise you that after this talk, you’ll be able to write software under this paradigm much more effectively, thanks to a better intuition about it.

This will help you discuss async with more confidence, write it correctly in interviews and on the job, and help you decide whether it makes sense to use it. In other words, profit, profit, profit.

Does that sound cool? All right.

But first, can you raise your hand if you know anything about this async library in Python?

Oh, wait, there’s a typo on my slide.

What about now, have you heard of this one? I’m not making this up, this was a real library, part of the standard since almost the beginning. My guess as to why almost nobody has heard of it is that it was an absolute pain to work with.

So much so that, in September 2012, someone went as far as suggesting radical changes to it, in a post titled Asyncore: included batteries don’t fit.

It wouldn’t have mattered if Guido van Rossum hadn’t seen it. At that time, Guido had been doing a lot of async stuff for the ndb library that connects to Google Cloud Datastore. He had faith that Python had evolved, and a new and better way of doing async was possible.

And so, he opened his laptop and created a new folder, titled “Project Tulip”.

Alphonse Mucha – Christmas in America (1919)

My name is Alvaro Duran, I’m a Senior Software Engineer at, and this is Codename Tulip. My goal today is to show you the making of the semantics for concurrency in Python, and to help you understand the design strategies underneath it.

Pause for a moment to consider how you get coffee at Starbucks. At the counter, you can choose to buy products that you can get right after you’ve paid for them, such as cookies or a sandwich, and products that you need to wait for, namely coffee.

A cashier takes your order and, if you want coffee with it, she will mark a cup with the specifics, and your name on it. That cup ends up in a queue, and you pay. Eventually, a barista will pick up that cup and will fulfil the order written on it, while the cashier has already moved on to attending to new customers.

That is concurrency: a way of organising work so that multiple things are in progress at the same time.

Can you see the two queues of people? Queueing decouples order acceptance and delivery. It doesn’t make things any faster. But at peak times, this separation helps Starbucks sell more coffee. The cashier would be very inefficient if she took an order and went straight into making the coffee before attending to the next customer. People waiting in the line would get frustrated and leave.

But as absurd as that sounds, that’s how most software, even nowadays, is executed.

Ryan Dahl, Introduction to Node.js

In 2010, two years before someone suggested changes to asyncore, Ryan Dahl had introduced his new project, Node.js, claiming that software engineers were “doing I/O completely wrong”.

I/O refers to those moments when the computer communicates with “the outside”. 

I use scare quotes because “outside” can mean many things. But if you measure response times in CPU cycles, it is as if the computer communicates with some devices that take seconds to respond, and others that take years.

In order to get information from devices at the bottom, the program blocks, sitting idle until the data gets back. Getting information from the devices at the top happens more or less immediately, and the program doesn’t block.

Modern programming languages like Python do not acknowledge this difference. Languages are all about metaphors, and yet, the metaphor for accessing “the outside” is incomplete.

What Ryan Dahl was getting at was that JavaScript, or Python for that matter, didn’t have a semantics for concurrency: a consistent way to write software that does something else while waiting for the result of a blocking I/O operation.

It’s easy to forget sometimes how Python guides and constrains the way in which engineers write software. Consider how simply a semicolon expresses ordering: readline won’t execute until open has finished.

Or consider how the return statement allows functions to send a value into the void, so that Python can catch it and pass it, along with the control flow, to whoever called that function.

Exception handling provides a systematic way to unwind callers and callees when something goes wrong.

Context managers build on top of exception handling to give us an easy way to pin down the lifetime of resources within the dynamic lifetimes of their callstacks.
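
As a small sketch of those four features working together, here is a function that reads the first line of a file. The helper names and the file contents are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def first_line(path):
    # Context manager: the file is closed when the block exits, even on error.
    with open(path) as f:
        # Ordering: readline() cannot run until open() has finished.
        line = f.readline()
    # Return: the value and the control flow go back to the caller.
    return line.rstrip("\n")


def first_line_or_default(path, default=""):
    # Exception handling: a failure inside first_line unwinds up to here.
    try:
        return first_line(path)
    except FileNotFoundError:
        return default


# Usage: write a throwaway file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "order.txt")
    with open(path, "w") as f:
        f.write("flat white\nfor Alvaro\n")
    found = first_line_or_default(path)
    missing = first_line_or_default(os.path.join(tmp, "nope.txt"), "nothing")
```

None of these lines says anything about what the program could be doing while open is waiting on the disk, which is the gap the rest of the talk is about.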

Fundamentally, these features of Python work in camouflage. Unless you’re a beginner, you don’t pause to think about what the language is doing.

Unless you’re writing a thesis, you don’t consider all the alternatives for how things could have turned out differently, or that any of this stuff was once upon a time a subject of debate, before what used to be commonplace was ultimately considered harmful.

Python has been refined over more than 30 years and fits together to provide a powerful scaffolding for writing software.

That is, until you want to do things concurrently. In that case, you’re more or less on your own.

Concurrency is very much around us in real life. But Python pretends that things always happen One Damned Thing After Another.

For a long time, lacking semantics for concurrency wasn’t really a problem. It concerned only those building Operating Systems and Database Engines, people willing to endure long learning curves associated with complex frameworks. Nobody else cared about concurrency.

Now everybody cares. Why?

Moore’s Law. For the past 50 years, software engineers have taken for granted the improvement of their programs’ performance, thanks to faster chips. Doing things concurrently wasn’t worth the effort when it took just a few years for hardware to become twice as fast.

Such a free lunch is over, and our computational ambitions will now have to be satisfied with responsive and scalable software.

The most direct way to get concurrency is to use threads: multiple flows of execution inside the same process, scheduled by the operating system.

However, switching between threads requires a costly trip to the operating system scheduler. If we want to handle things like 10 thousand connections simultaneously, using one thread per connection will grind the server to a halt.

Multithreading, in Starbucks, would be as if employees were taking the role of both cashiers and baristas. Small cafes can manage with a few people doing everything. But, at scale, the context switching that that organisation would impose on the employees is as costly for humans as it is for computers.

Switching between threads is a bad idea. What about doing concurrency on top of a single thread?

That is not unlike how an Operating System executes programs on top of a CPU that has no notion of multitasking.

If the orange arrows represent a program running on the CPU, then special instructions called traps are read by the CPU as “suspend the program you’re running, and give control back to the OS”. 

The Operating System then resumes, decides which program to execute next, and gives control to it. The next program will hit another trap, giving this flow its distinctive Run-Trap-Run-Trap-Run-Trap dynamic.

This design allows engineers to programmatically decide at which points our program is going to voluntarily yield control to the OS.

It’s like a relay race, each runner taking turns with the baton of control, always passing it to, or receiving it from, the Operating System.

This strategy allows a single CPU to run only one program at a time, yet conspire with the OS to give us the illusion that the computer is running on top of hundreds of chips.

Making async in Python follows a similar strategy: creating a system that runs a single thread of execution, but in such a way that coroutines take turns to support concurrency.

Bob Nystrom – What Color is Your Function?

Wait, what’s a coroutine? The truth is, I don’t know. No one knows. It means different things to different people.

But all of its definitions share a few themes that could help us anchor the concept.

  • First, coroutines are independent, because they have their own internal state and control flow.
  • They are also switchable, because they can voluntarily pause and be resumed.
  • And they are callstacks, because they retain some memory of where they are paused, allowing other coroutines to communicate with them bidirectionally.

For a long time, Python didn’t have coroutines the way it has classes or functions. But the language allowed for something that checked a few of these themes, so that it could be repurposed as Python’s coroutines.

I’m talking about generators. Generators are functions which, when called, don’t return a value, but an object, on which you can call next to obtain values from it.

Because that’s something that iterators also do, most tutorials on generators focus on their iterative aspect, and so they are defined as list comprehensions that can be evaluated lazily.

Importantly, and that’s something other iterators can’t do, generators can be paused, because they are repeatedly giving control back to their caller via the yield keyword.
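
A minimal sketch of that pausing behaviour, using a hypothetical countdown generator that is not from the slides:

```python
def countdown(n):
    # Calling countdown(3) does not run this body; it returns a generator object.
    while n > 0:
        yield n  # pause here and hand n (and control) back to the caller
        n -= 1


gen = countdown(3)
first = next(gen)   # runs the body up to the first yield
second = next(gen)  # resumes right after the yield, loops, and pauses again
rest = list(gen)    # a generator is also an iterator, so list() drains it
```

Between the two calls to next, the generator is suspended with its local variable n intact, and the caller is free to do anything else.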

So, functions with their own internal state and control flow, which can be paused from the inside and resumed from the outside. That sounds like Independent and Switchable!

Well, they’re also Callstacks. We can communicate with the inside using the send method. But before we can use it, we first need to call next on it, which is called priming the generator, so that execution advances to the yield statement.

The code this slide refers to can be found in Luciano Ramalho’s Fluent Python

Notice that the yield now appears on the right hand side of the assignment operator. The variable average is what the generator gives to whoever is calling it. The variable term is what the caller sends inside. And yield is the checkpoint.

Because Python evaluates the right-hand side of an assignment first, the yield runs before “term” is assigned. That’s when the baton of control changes hands.
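
The slide itself isn’t reproduced in this transcript, so here is a sketch along the lines of the running-average example in Fluent Python. The names average and term come from the description above; total, count and coro_avg are assumptions.

```python
def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        # The right-hand side runs first: yield hands `average` to the caller
        # and pauses. The assignment to `term` only happens on the next send().
        term = yield average
        total += term
        count += 1
        average = total / count


coro_avg = averager()
primed = next(coro_avg)  # priming: run up to the first yield, which gives None
a1 = coro_avg.send(10)   # 10.0
a2 = coro_avg.send(30)   # 20.0
a3 = coro_avg.send(5)    # 15.0
```

Each send resumes the generator at the checkpoint, lets it run one trip around the loop, and gets the updated average back when it pauses again.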

Luciano Ramalho’s Fluent Python

But what if I want to chain more than one generator? After all, professionals reuse functions, like in here, where I want to assign the average of a list of numbers to keys on a dictionary.

This wouldn’t work, because yield would resume right after the first item has been sent into averager. What I need is something that suspends the grouper until averager is done receiving items.

That is precisely what yield from does. For as long as the averager is running, yield from sends into it whatever value is sent into the grouper, and yields to the grouper’s caller whatever value is yielded by the averager.
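
A sketch of that delegation, loosely modelled on the grouper and averager example in Fluent Python. The sample data, the main function, and the convention of sending None to finish a group are assumptions made for this illustration.

```python
def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield
        if term is None:  # sentinel: the caller says this group is finished
            break
        total += term
        count += 1
        average = total / count
    return average  # becomes the value of the `yield from` expression


def grouper(results, key):
    while True:
        # Suspended here for as long as averager() runs; every send() on the
        # grouper is passed straight through to the averager.
        results[key] = yield from averager()


def main(data):
    results = {}
    for key, values in data.items():
        group = grouper(results, key)
        next(group)           # prime the grouper (and, through it, the averager)
        for value in values:
            group.send(value)
        group.send(None)      # ends the averager; grouper stores its result
    return results


results = main({"espresso": [2.0, 4.0], "latte": [3.0, 4.0, 5.0]})
```

With this data, results ends up as {"espresso": 3.0, "latte": 4.0}. The grouper never sees the individual numbers; it only wakes up when the averager returns.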

One last trick. Remember when I had to call next on the averager to be able to start sending values into it? If you are familiar with decorators, you’ll quickly realise that that’s something a decorator can deal with. And not only do I get a generator that’s already primed: by naming the decorator appropriately, I can flag this generator as a coroutine and highlight its intended behaviour.
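
A minimal sketch of such a priming decorator. The name coroutine here is a hypothetical helper defined on the spot, not a standard library function.

```python
from functools import wraps


def coroutine(func):
    """Hypothetical decorator: call the generator function and prime the result."""
    @wraps(func)
    def primer(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield so send() works right away
        return gen
    return primer


@coroutine
def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average
        total += term
        count += 1
        average = total / count


coro_avg = averager()       # already primed, no explicit next() needed
first = coro_avg.send(10)   # 10.0
second = coro_avg.send(20)  # 15.0
```

One caveat: yield from primes the generator it delegates to, so a generator wrapped by a priming decorator like this one should not also be driven through yield from.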

With decorated generators and yield from, Guido had everything he needed to build Project Tulip, a new async framework in Python that would ultimately become part of the standard library under the name asyncio. The pairing of a coroutine decorator with yield from was the inspiration for what would eventually become the async/await syntax.

In an earlier draft of this talk, now is when I used asyncio and aiohttp to build a concurrent web server. But the more I dove into decorators, and keywords, and fancy tricks, the more I felt like the character of Ian Malcolm in Jurassic Park.

Was this a good idea to begin with?

First, it’s confusing. If I’m having a bad day, I can mix and match generators intended as iterators, and these generator-based coroutines, using one where I’m supposed to use the other.

Second, I can introduce subtle bugs if I’m not careful. It only takes the removal of the last yield statement from a generator-based coroutine to turn it into a normal function, where none of this next and send machinery applies, and wreak havoc.
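
To make that second point concrete, here is a sketch with two hypothetical functions that differ only in whether a yield survived a refactor:

```python
import inspect


def collector(items):
    # Generator-based coroutine: receives values through send().
    while True:
        item = yield
        items.append(item)


def collector_without_yield(items):
    # Hypothetical refactor that dropped the yield: now an ordinary function.
    item = None
    items.append(item)


received = []
coro = collector(received)
next(coro)          # prime
coro.send("latte")  # received is now ["latte"]

result = collector_without_yield([])  # body runs immediately, returns None
try:
    result.send("latte")
    error = None
except AttributeError as exc:
    error = exc  # None has no send(): the failure shows up far from its cause

is_gen = inspect.isgeneratorfunction(collector)
is_gen_after = inspect.isgeneratorfunction(collector_without_yield)
```

Nothing in the def line distinguishes the two; only the presence of yield somewhere in the body decides which kind of object the call produces.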

And third, that’s as far as I can go with coroutines. Classes have magic methods, metaprogramming, powerful tooling that expert engineers manipulate to impose their will on the computer. Coroutines have none of it, and that’s limiting.

The Difference between Yield and Yield From

The history of generator-based coroutines is the history of Guido’s reluctance towards adding new keywords in the language. Adding keywords is like inventing a new piece for a board game—both radical and backwards incompatible. It’s understandable that Guido was wary of it.

From the python-tulip Google Group, it transpires that he just didn’t see good enough reasons for a new syntax, and ended up repurposing old syntax in new ways, much like human languages do. After all, the word “gay” used to mean “happy” not that long ago.

Inspired by Dave Beazley’s Fear and Await in Async

But reading that asyncore post made Guido aware that there was a problem with concurrency in Python. And so, he looked around, and asked: “what do you guys need?”. And all that he could find were different async frameworks trying to be compatible with one another, spending time and resources on creating adapters to other frameworks’ event loops.

Guido’s solution to this was asyncio: The One Event Loop To Rule Them All. But that’s the view from the perspective of framework implementers.

The view from the end user is quite different. They seldom combine Twisted with Tornado, much like web developers almost never combine Django and Flask.

Guido thought it was a community need. But in reality, it was async frameworks’ implementers, trying to increase adoption by making it easier for other frameworks’ users to migrate to their framework.

What implementers need is interoperability. What users need is APIs.

You can look at asynchronous execution as a stack of layers that communicate with one another. On top, we have coroutines, with their async/await syntax. Under the hood, they leverage generators. They do this because the yield keyword signals the release of control to the synchronous function that’s running the show, be it the event loop or something as simple as a normal function sending values back up.
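
A sketch of those layers using only the standard library. The pause, worker and run_all names are hypothetical, and run_all is a toy stand-in for an event loop, not how asyncio is implemented.

```python
import types
from collections import deque


@types.coroutine
def pause():
    # Bottom layer: a bare yield inside a generator-based awaitable. This is
    # the point where control goes back to whoever is driving the coroutine.
    yield


async def worker(name, steps, log):
    # Top layer: async/await syntax, with a plain yield underneath.
    for i in range(steps):
        log.append(f"{name}:{i}")
        await pause()


def run_all(coros):
    # An ordinary synchronous function running the show: it resumes each
    # coroutine in turn until all of them have finished.
    ready = deque(coros)
    while ready:
        coro = ready.popleft()
        try:
            coro.send(None)  # resume until the next yield
        except StopIteration:
            continue         # this coroutine is done
        ready.append(coro)   # otherwise, back of the queue


log = []
run_all([worker("a", 2, log), worker("b", 2, log)])
```

After the run, log reads ["a:0", "b:0", "a:1", "b:1"]: the two workers take turns on a single thread, each handing over the baton at its await.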

But that’s how an Operating System is designed to work. The Python interpreter is itself a C program, which makes use of system calls and trap instructions to release control repeatedly to the kernel.

Separating coroutines from the generators that implement them is like separating user-mode functions from the kernel that executes them.

Programming Language Inventor or Serial Killer quiz

But Alvaro, are you really saying that Guido was wrong? That in refusing to do the painful job of adding new syntax to Python, he enticed a whole community of engineers to treat his hacky workarounds as Pythonic gospel?

No. I think Guido took the right approach, and that’s compatible with thinking that using generator-based coroutines should be considered harmful.

M C Escher – Drawing Hands. 1948

In Making the Async, Guido paved the way to a new syntax. Asyncio reshaped the semantics of concurrency in Python. Taking the shortcut of just adding new syntax would’ve been corrosive to the language, as taking shortcuts always is.

Because of this reshaping, asyncio is probably not the best solution for concurrency anymore. Its support for generator-based coroutines forces it to make trade-offs all the time.

Asyncio is dying of the weirdest illness of all: past success.

Trio framework

New libraries like Trio have embraced the async/await syntax, so much so that you can’t use generator-based coroutines in Trio.

That allows for innovations that asyncio has tried to backport, but not successfully in my opinion. In Trio, you can use asynchronous context managers that isolate blocking I/O calls inside a with statement.

Alice in Wonderland (2010) – Walt Disney Pictures

If you’re looking for something cool and insane to do with your free time, async is at the top of my list.

And, of course, if you have any questions, you can contact me on Twitter or LinkedIn. I look forward to coming to PyCon CZ on many occasions in the future. Enjoy the rest of the day.
