Why ‘multiprocessing’ Doesn’t Always Work

I have written quite a bit about removing the GIL from our system, and I see a lot of comments that say “Why don’t you just use multiprocessing?” and “Didn’t you know that Python is not safe for real-time work?”

I wanted to write a blog post to address those very questions so that I have somewhere to direct those people. I have also added a page on my wiki covering this topic, with the PowerPoint slides from a talk I gave at PyGameSF: http://trac2.assembla.com/pkaudio/wiki/PythonForAudioWork

Multiprocessing is an incredible module that solves many, many multiprocessing problems very well. Unfortunately, multiprocessing is a Python module, and therefore runs in Python and requires the GIL to be held. Every single operation in Python requires the GIL, because reference counts are being changed all the time.
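To make that concrete, here is a minimal (hypothetical) use of the module. The point is not the child processes – each of those gets its own interpreter and its own GIL – but that the dispatching code in the parent is ordinary Python, so the parent’s GIL must be held while arguments and results are pickled and shipped around:

```python
import multiprocessing

def square(n):
    # Runs in a *child* process: a separate interpreter with its own GIL.
    return n * n

if __name__ == "__main__":
    # Everything below is ordinary Python bytecode, so the parent's GIL
    # is held while multiprocessing serializes work out to the children.
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(square, [1, 2, 3])
    print(results)  # [1, 4, 9]
```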

Every time you access an object attribute, increment an integer, call a method, print, or compile – *every time* a frame is evaluated in Python – the GIL must be held by some piece of C/C++ code. For pure Python applications, this C/C++ code lives entirely in libpython. For C/C++ applications that embed the Python interpreter, this code could be in your application itself (and if it were, you’d know, because you made the calls yourself).
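You can watch the reference counting happen from pure Python (sys.getrefcount is a real CPython API; the rest of this snippet is just an illustration):

```python
import sys

obj = object()
before = sys.getrefcount(obj)  # the call itself holds one extra, temporary reference
alias = obj                    # even a plain assignment changes the refcount...
after = sys.getrefcount(obj)
print(after - before)  # 1 -- this bookkeeping is exactly what the GIL protects
```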

Our problem is also somewhat unique: we write a C++ audio/musical plugin that the user loads into a mixer track in the host application. Modern audio hosts use a separate thread to process each mixer track, and wait for each of them to finish before mixing their results and sending them to the soundcard.

Since you can load a separate instance of our audio engine into as many tracks as you want, we have to ensure that our engine does not use any static data to prevent the track threads from contending over a lock protecting that data (which almost always results in periodic CPU spikes). Our state-of-the-art audio engine does this very well, until you enable the libpython-based scripting engine, which uses static data like it’s going out of style.

Since libpython stores its lock in a single, statically linked object that protects the entire library, you CANNOT use the library from multiple threads, PERIOD. Before making even a call as simple as PyObject_HasAttr(), you have to acquire that one GIL, which means that no two threads will ever execute Python code in your host application at the same time, ever.
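You can observe this serialization from pure Python, too. In the sketch below (an illustration, not our plugin code), two threads run a CPU-bound loop; because every bytecode requires the GIL, they time-slice over a single interpreter rather than running in parallel:

```python
import threading

results = []

def spin(n):
    # A pure-Python loop: the interpreter must hold the GIL for every
    # bytecode, so two of these threads interleave instead of running
    # simultaneously -- no matter how many cores the machine has.
    total = 0
    for i in range(n):
        total += i
    results.append(total)

threads = [threading.Thread(target=spin, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [4999950000, 4999950000]
```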

That’s a showstopper.

We are doing a lot more than just making simple calls like PyObject_HasAttr(). We want to execute arbitrary Python code written in our app’s source editor using PyObject_CallObject().

The accepted solution (from the BDFL) to this problem is to use ‘process migration’, which means creating a separate process for each thread in your application that must execute Python code. A process can be defined as a thread of execution with its own address space, so this makes sense. Each process gets its own GIL linked into it, and we all get on with our measly little lives.

This is the solution I chose. I create one daemon process from my C++ code for each audio engine the user creates, and make calls to PyEval_EvalCode() and PyObject_CallObject() from within it. This works for us because each audio engine is completely independent of the others and so does not share data (sys.modules, etc.). Our code is also simple enough that we only need the *Python language* itself, and don’t require the use of complex extension modules.
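Our actual implementation is C++ driving a daemon, but the shape of the idea can be sketched in pure Python (the -c bootstrap below is a hypothetical stand-in, not our protocol): spawn one interpreter process per engine and feed it arbitrary script text, as if it came from the app’s source editor:

```python
import subprocess
import sys

# One interpreter per engine: the child is a separate process with its
# own GIL, so engines never contend with each other.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; exec(sys.stdin.read())"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
# Arbitrary script text, standing in for code from the app's editor.
out, _ = child.communicate("print(6 * 7)\n")
print(out.strip())  # 42
```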

I hope this makes some sense to all of you. The basic idea is that our C++ threads need to run python code concurrently, and this is not possible using the current libpython implementation. Even such a magical module as ‘multiprocessing’ requires the GIL to be held before it can dispatch work to different processes.

<<-- HOST PROCESS}  {CHILD PROCESSES -->

--| |--[proc task 1:python]
--===[multiprocessing: python] ==---[proc task 2:python]
--| |--[proc task 3:python]

How many GIL objects do you see in the above diagram? If you said three, you were close. There are actually four when you include the libpython that is linked into the host app itself. Our audio threads still have to contend for that lock to communicate with the child processes.

In fact, in our case there are NO GIL objects! The host app no longer requires libpython and each daemon has only one thread running a libpython compiled without thread support! Sweet! Now our scenario looks like this:

<<-- HOST PROCESS}  {CHILD PROCESSES -->

----------- shm/semaphore ----------[proc task 1:python]
----------- shm/semaphore ----------[proc task 2:python]
----------- shm/semaphore ----------[proc task 3:python]

No more locks, no more problems!
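Our host side is C++, but the shm/semaphore handshake itself can be sketched with Python’s standard library (using Python for both sides purely for illustration; multiprocessing.shared_memory needs Python 3.8+, and the one-byte request/result layout is made up for this example). The “host” publishes a request into shared memory, signals a semaphore, and waits on a second semaphore for the result:

```python
import multiprocessing as mp
from multiprocessing import shared_memory  # Python 3.8+

def engine(shm_name, request, done):
    # Child "engine": its own process, its own interpreter, its own GIL.
    shm = shared_memory.SharedMemory(name=shm_name)
    request.acquire()              # wait for the host to publish work
    shm.buf[1] = shm.buf[0] * 2    # read the request byte, write the result byte
    shm.close()
    done.release()                 # signal the host that the result is ready

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=2)
    request, done = mp.Semaphore(0), mp.Semaphore(0)
    child = mp.Process(target=engine, args=(shm.name, request, done))
    child.start()

    shm.buf[0] = 21        # publish a request...
    request.release()      # ...and signal the engine
    done.acquire()         # block until the engine is finished
    result = shm.buf[1]
    print(result)          # 42

    child.join()
    shm.close()
    shm.unlink()
```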

Python and Real-Time Priority

Another comment that I frequently see is that “python is not safe for real-time work”. While this is of course theoretically true, because libpython makes calls like mutex_lock() and malloc(), the concept of “safe for audio use” is completely different. All that matters for us is that there are no gaps in the sound.

I took the plunge to see just how fast the CPython implementation is at *executing python code* (PyEval_EvalFrameEx()), and as long as we don’t have multiple threads contending for the GIL, CPython hardly puts so much as a blip on our audio engine’s CPU usage. In an average use case, PyEval_EvalCode() took up 0.4% of our audio thread’s CPU. That pretty much makes Python 100% AWESOME for control-rate work within an audio thread.
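A rough pure-Python analogue of that measurement (the snippet and loop count are invented for illustration; PyEval_EvalCode() corresponds to executing a prepared code object):

```python
import time

# Compile once, execute many times -- the same pattern as calling
# PyEval_EvalCode() on a prepared code object from C.
code = compile("total = sum(range(1000))", "<engine-script>", "exec")

namespace = {}
start = time.perf_counter()
for _ in range(100):
    exec(code, namespace)
elapsed = time.perf_counter() - start

print(namespace["total"])  # 499500
```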

Case closed.

By | 2010-04-17T18:22:00+00:00 April 17th, 2010|Uncategorized|3 Comments

3 Comments

  1. hcarvalhoalves April 18, 2010 at 11:04 pm - Reply

    Restating what we’ve been learning for some time on the web area: Python and Threads don’t mix.

Seems like the best way to scale is to spawn multiple Python processes, and make sure you don’t have shared state. Not only with Python, but with pretty much everything, this seems to be the consensus (see the trend toward functional languages, with no shared state and lightweight processes). As long as you do that, the GIL doesn’t bother you anymore.

    Cool article.

  2. PeterHansen April 19, 2010 at 6:15 pm - Reply

    The “python is not safe for real-time work” is a simplistic statement, possibly because those making it are not really familiar with real-time programming. Those familiar with it would make a distinction between “soft” and “hard” real-time.

    With “soft” real-time, the concept of “safe” simply does not apply, and of course Python is fully capable of soft real-time work that doesn’t require more throughput than your machine plus Python can handle together.

    With “hard” real-time, not only is Python “not safe”, but neither is standard Linux or Windows, or most other operating systems.

    So, obviously, you can either quite safely do real-time work with Python, or you absolutely cannot do it safely, and never the twain shall meet.

  3. pharmacy September 5, 2011 at 5:47 pm - Reply

    One of the reasons why I like visiting your blog so much is because it has become a daily reference I can use in order to learn new nice stuff. It’s like a curiosities box that surprises you over and over again.
