The GIL Goes West, Part 3: IPC with Shared Memory and Semaphores

I just finished the final step in porting my Python scripting engine to support true multiprocessing.

I originally took a leap of faith with stdio and system pipes to get the daemon working, but system I/O calls aren’t reliable enough for real-time audio work. If the system gets bogged down, a read or write on a pipe can block for too long and cause audio glitches.
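In outline, that stdio/pipe setup looks something like the POSIX sketch below. It is a hypothetical `PipeChild` wrapper, not the Pipe class from this post: the child’s stdin and stdout are wired to two system pipes with `pipe()`, `fork()`, `dup2()` and `execvp()`. Every `ReadLine()` is a blocking `::read()` on the pipe, and that is exactly where an audio thread can stall if the OS is slow to schedule the child.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper: a child process whose stdin/stdout are system pipes.
class PipeChild {
public:
    explicit PipeChild(const std::vector<std::string>& argv) {
        if (argv.empty()) throw std::invalid_argument("empty argv");
        // Build the exec argument list before fork(): allocating in the child
        // of a multithreaded parent is unsafe.
        std::vector<char*> args;
        for (const auto& a : argv) args.push_back(const_cast<char*>(a.c_str()));
        args.push_back(nullptr);

        int to_child[2], from_child[2];
        if (pipe(to_child) == -1) throw std::runtime_error("pipe failed");
        if (pipe(from_child) == -1) {
            close(to_child[0]);
            close(to_child[1]);
            throw std::runtime_error("pipe failed");
        }
        pid_ = fork();
        if (pid_ == -1) {
            for (int fd : {to_child[0], to_child[1], from_child[0], from_child[1]}) close(fd);
            throw std::runtime_error("fork failed");
        }
        if (pid_ == 0) {  // child: the pipes become stdin/stdout, then exec the daemon
            dup2(to_child[0], STDIN_FILENO);
            dup2(from_child[1], STDOUT_FILENO);
            for (int fd : {to_child[0], to_child[1], from_child[0], from_child[1]}) close(fd);
            execvp(args[0], args.data());
            _exit(127);  // only reached if exec failed
        }
        close(to_child[0]);    // parent keeps the write end of the child's stdin
        close(from_child[1]);  // and the read end of the child's stdout
        in_ = to_child[1];
        out_ = from_child[0];
    }
    PipeChild(const PipeChild&) = delete;
    PipeChild& operator=(const PipeChild&) = delete;

    ~PipeChild() {
        close(in_);  // EOF on stdin asks a well-behaved daemon to exit
        close(out_);
        waitpid(pid_, nullptr, 0);
    }

    void WriteLine(const std::string& line) {
        const std::string buf = line + '\n';
        std::size_t done = 0;
        while (done < buf.size()) {
            ssize_t n = write(in_, buf.data() + done, buf.size() - done);
            if (n == -1 && errno == EINTR) continue;
            if (n == -1) throw std::runtime_error("write failed");
            done += static_cast<std::size_t>(n);
        }
    }

    // Blocks until the child emits a full line: if the system is bogged down,
    // this read() is where a real-time caller stalls.
    std::string ReadLine() {
        std::string line;
        char c;
        for (;;) {
            ssize_t n = read(out_, &c, 1);
            if (n == -1 && errno == EINTR) continue;
            if (n <= 0) throw std::runtime_error("child closed its stdout");
            if (c == '\n') return line;
            line += c;
        }
    }

private:
    pid_t pid_ = -1;
    int in_ = -1;
    int out_ = -1;
};
```

With `cat` standing in for the daemon, `WriteLine("ping")` followed by `ReadLine()` returns `"ping"`. A real daemon wrapper would also have to deal with `SIGPIPE` if the child dies while the parent is writing.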

So I decided to re-implement my blocking Pipe class using shared memory and semaphores. This was a little scary because I had never used shared memory and, frankly, sort of tuned out in CS201 when they started talking about semaphores.

But I took the plunge. I took my time and wrote good tests and clean classes. The tests passed, and the scripting daemon worked right away. The baby-steps process looked like this:

1) Write RPC protocol layer on Mac.
2) Write Pipe class and tests on Mac.
3) Integrate and debug Pipe on Mac.
4) Write Process class and tests on Mac.
5) Integrate and debug proc code.
6) Write and debug accompanying Windows proc code. (WHAMMO, works!)
7) Re-implement Pipe class using shared memory/semaphores/mlock on Mac.
8) Fill in the blanks on Windows.
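The shared memory/semaphore/mlock step can be sketched as a single-slot blocking pipe. This is a minimal POSIX sketch under several assumptions, not the Mac or Windows code from this project: the names `ShmPipe` and `EchoUpper`, the 4 KB message limit, and the use of System V semaphores (`semget`/`semop`) guarding an anonymous `mmap` region inherited across `fork()`. One semaphore counts free slots and the other counts pending messages, so `Write()` blocks while the slot is full and `Read()` blocks while it is empty. A daemon started with `exec` could not inherit an anonymous mapping; it would need a named segment, for example from `shm_open()` or `shmget()` with a shared key.

```cpp
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cctype>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// semctl() reads this as its variadic 4th argument. glibc expects callers to
// declare union semun themselves, so a locally named copy is used everywhere.
union SemArg { int val; struct semid_ds* buf; unsigned short* array; };

// Hypothetical single-slot blocking pipe: one writer process, one reader process.
class ShmPipe {
public:
    static constexpr std::size_t kMaxMessage = 4096;

    ShmPipe() : owner_(getpid()) {
        void* mem = mmap(nullptr, sizeof(Slot), PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) throw std::runtime_error("mmap failed");
        slot_ = static_cast<Slot*>(mem);
        (void)mlock(slot_, sizeof(Slot));  // best effort: keep the page resident
        SemArg one, zero;
        one.val = 1;   // kEmpty: the slot starts out free
        zero.val = 0;  // kFull: no message is pending
        semid_ = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
        if (semid_ == -1 || semctl(semid_, kEmpty, SETVAL, one) == -1 ||
            semctl(semid_, kFull, SETVAL, zero) == -1) {
            if (semid_ != -1) semctl(semid_, 0, IPC_RMID);
            munmap(slot_, sizeof(Slot));
            throw std::runtime_error("semaphore setup failed");
        }
    }
    ShmPipe(const ShmPipe&) = delete;
    ShmPipe& operator=(const ShmPipe&) = delete;

    ~ShmPipe() {
        // Semaphore sets are kernel objects; only the creating process removes it.
        if (getpid() == owner_) semctl(semid_, 0, IPC_RMID);
        munmap(slot_, sizeof(Slot));
    }

    // Blocks until the slot is free, then publishes the message.
    void Write(const std::string& msg) {
        if (msg.size() > kMaxMessage) throw std::length_error("message too large");
        Adjust(kEmpty, -1);
        slot_->length = msg.size();
        std::memcpy(slot_->data, msg.data(), msg.size());
        Adjust(kFull, +1);
    }

    // Blocks until a message is available, copies it out, then frees the slot.
    std::string Read() {
        Adjust(kFull, -1);
        std::string msg(slot_->data, slot_->length);
        Adjust(kEmpty, +1);
        return msg;
    }

private:
    enum : unsigned short { kEmpty = 0, kFull = 1 };
    struct Slot { std::size_t length; char data[kMaxMessage]; };

    // semop() with retry on EINTR; a negative delta blocks while the value is 0.
    void Adjust(unsigned short index, short delta) {
        sembuf op{};
        op.sem_num = index;
        op.sem_op = delta;
        while (semop(semid_, &op, 1) == -1) {
            if (errno != EINTR) throw std::runtime_error("semop failed");
        }
    }

    pid_t owner_;
    Slot* slot_ = nullptr;
    int semid_ = -1;
};

// Demo: a forked child upper-cases one request and sends it back.
std::string EchoUpper(const std::string& request) {
    ShmPipe requests, replies;
    pid_t pid = fork();
    if (pid == -1) throw std::runtime_error("fork failed");
    if (pid == 0) {  // child: leave via _exit so the parent owns all cleanup
        try {
            std::string msg = requests.Read();
            for (char& c : msg) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            replies.Write(msg);
            _exit(0);
        } catch (...) { _exit(1); }
    }
    requests.Write(request);
    std::string reply = replies.Read();
    waitpid(pid, nullptr, 0);
    return reply;
}
```

`EchoUpper("hello")` forks a child, passes the request through one pipe and gets `"HELLO"` back through the other. On Windows the same shape would map onto a named file mapping (`CreateFileMapping`) plus two semaphores (`CreateSemaphore`), which this sketch does not cover.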

I’m sure there are wrappers out there that do all this, but none came up in the first couple of Google searches, and our project config is too fragile to add extra libs. I decided to just write the Windows and Mac wrappers from scratch to learn all the nuances of those system calls.

Anyway, on with the goods. The performance using shared memory and semaphores is outstanding on the Mac. A quick profile shows that the overhead incurred by the scripting engine has not noticeably changed. This is where the interesting part begins, though.

Once you port a part of your code to run in a daemon, your profile stack data disappears into calls to ::read(), ::semop(), ::pthread_mutex_lock(), or whatever. Profiling the daemon itself becomes easier, though, because all you see is the code for your module. I guess I need to get better at using the Windows and Mac profiling tools anyway.

The profiler shows that the time spent marshaling the RPC protocol data is negligible. So I will just assume that the shm/sem performance is *totally killer*, and something I don’t have to worry about.

I did, however, add a CPU % display, calculated in the top layer above the scripting engine’s interface, so all of the above is shown as a portion of the audio engine’s overall CPU usage. It sits happily between 0% and 1%. Awesome.
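The post doesn’t say how that percentage is computed. One common way to express per-block load, assumed here, is the wall-clock time spent in the processing call divided by the real-time duration of the block (frames / sample rate), smoothed so the display doesn’t flicker. A minimal sketch; the `CpuMeter` name, the smoothing factor and the choice of `std::chrono::steady_clock` are all assumptions:

```cpp
#include <chrono>
#include <utility>

// Hypothetical meter: the share of each audio block's real-time budget that a
// processing call (e.g. the scripting layer) consumes, as a smoothed percentage.
class CpuMeter {
public:
    // smoothing in [0, 1): 0 shows the latest block only, values near 1 react slowly.
    explicit CpuMeter(double sample_rate, double smoothing = 0.9)
        : sample_rate_(sample_rate), smoothing_(smoothing) {}

    // Records one block given the time spent processing it.
    void AddBlock(std::chrono::duration<double> elapsed, int frames) {
        const double budget = frames / sample_rate_;  // seconds of audio in the block
        const double percent = 100.0 * elapsed.count() / budget;
        percent_ = smoothing_ * percent_ + (1.0 - smoothing_) * percent;
    }

    // Times process() with a monotonic clock and records the result.
    template <typename F>
    void Measure(int frames, F&& process) {
        const auto start = std::chrono::steady_clock::now();
        std::forward<F>(process)();
        AddBlock(std::chrono::steady_clock::now() - start, frames);
    }

    double Percent() const { return percent_; }

private:
    double sample_rate_;
    double smoothing_;
    double percent_ = 0.0;
};
```

For example, a 512-frame block at 44.1 kHz lasts about 11.6 ms, so spending 0.1 ms in the scripting call reads as roughly 0.9%.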

I can’t wait to post the C++ classes for creating and communicating with a child process using system pipes or shared memory. They are extremely clean and work well on Windows and Mac.

March 24th, 2010 | Uncategorized | 9 Comments

9 Comments

  1. Peter R March 24, 2010 at 9:29 pm

    Thanks for writing about your python/c++/multiprocessor experience. It’s been interesting for me to read, as I am an aspiring rockstar/coder/engineer.

  2. Joseph Lisee March 25, 2010 at 1:12 am

    I am curious to know the difference between your approach and the ones provided in the Python standard library: http://docs.python.org/library/multiprocessing.html

    They have shared memory, shared memory locks, pipes, etc.

  3. Patricio March 26, 2010 at 9:10 pm

    Joseph: I get this question a lot. I’ve written quite a bit in previous blog posts about why my problem can’t be solved with a Python-based solution.

    The quick and dirty is that I have multiple audio threads implemented in C that need to execute Python concurrently. The CPython implementation prevents this by requiring that each audio thread acquire the single GIL before making any calls to the CPython API. This essentially means that as long as there is one process, only one scripting engine can run at a time.

    The only way around this is to spawn a separate process for each audio thread, link CPython into it, and make your CPython API calls from within that daemon process. Luckily, the audio threads don’t share data, so this is possible, as long as I’m willing to take a month and write the code.

    I’ve just finished doing this.

  4. Joseph Lisee March 30, 2010 at 9:32 pm

    So essentially what you have are several pure C processes, each doing its thing and communicating back to a main process with your shared memory queue?

    So I guess what I was talking about was a more Python-driven approach: wrapping each of your pure C audio threads in something like ctypes. Then you can easily spawn each thread with multiprocessing and use the existing Python shared memory pipes/queues. This would have saved the time and peril of making your own shared message queue. Although I don’t know about the possible performance implications of this approach, or if it’s really possible for you to drive each of your C threads from Python.

    Also, if your children are pure C, they don’t have to be separate processes at all. I had a bunch of pure C++ threads communicating with a single Python thread through an event queue. It was less painful to set up, and it let the Python code call into the thread-safe C++ code when needed.

  5. Patricio March 30, 2010 at 9:38 pm

    Joseph: no, a multitrack audio sequencing host like Ableton Live uses a separate thread per audio track. In my model each thread uses its own process to execute Python code so that they don’t all have to wait for the single lock. There is no way around this using pure Python.

  6. Martin Vilcans March 31, 2010 at 8:55 am

    So why don’t you use the multiprocessing module, which is the standard library’s way of doing this? It uses processes instead of threads to get around the GIL and supports shared memory.

  7. Patricio March 31, 2010 at 2:50 pm

    Joseph: because the multiprocessing module is a pure Python solution. I have to run a Python interpreter to use the multiprocessing code, which will still cause thread contention on the GIL.

  8. Joseph Lisee March 31, 2010 at 3:45 pm

    You said: “In my model each thread uses its own process to execute Python code so that they don’t all have to wait for the single lock.”

    Multiprocessing lets you do just this from Python. Spawn multiple *independent* Python processes. It then provides shared-memory-based queues/pipes that let them exchange data without blocking each other.

    I think everyone is just trying to figure out where Python’s existing multi-process support let you down, because it seems to do everything you need.

    P.S. – The previous poster was Marty, not me.

  9. Patricio April 19, 2010 at 3:01 pm

    Joseph: I wrote a post to address the questions you are having, and to explain why I can’t use a Python-based solution like multiprocessing. Go check it out, and tell me what you think!

    http://pkaudio.blogspot.com/2010/04/whey-multiprocessing-doesnt-always-work.html
