I just finished the final step in porting my Python scripting engine to support true multiprocessing.

I originally took a leap of faith with stdio and system pipes to get the daemon working, but system I/O functions aren’t reliable enough for realtime audio work. If the system gets bogged down, stdio can block for too long and cause audio glitches.

So I decided to re-implement my blocking Pipe class using shared memory and semaphores. This was a little scary because I have never used shared memory and frankly sort of tuned out in CS201 when they started talking about semaphores.
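To make the idea concrete, here is a minimal sketch of a blocking pipe built on shared memory and semaphores. It is not the actual Pipe class: the name `ShmPipe`, the fixed-size slot layout, the slot counts, and the `roundTrip` helper are all made up for illustration, and it assumes a POSIX system with anonymous shared mappings and System V semaphores. One semaphore counts free slots and the other counts filled slots, so the writer blocks when the buffer is full and the reader blocks when it is empty.

```cpp
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

#if defined(__linux__)
// Linux requires the caller to define semun; macOS declares it in <sys/sem.h>.
union semun { int val; struct semid_ds* buf; unsigned short* array; };
#endif

// Hypothetical single-writer, single-reader pipe (illustration only).
class ShmPipe {
public:
    static constexpr int kSlots = 4;
    static constexpr std::size_t kSlotSize = 256;

    ShmPipe() : owner_(getpid()) {
        // Anonymous shared mapping: stays shared with children across fork().
        void* p = mmap(nullptr, sizeof(Slot) * kSlots, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        slots_ = static_cast<Slot*>(p);
        semId_ = semget(IPC_PRIVATE, 2, 0600);
        if (semId_ < 0 || !setValue(kFree, kSlots) || !setValue(kFilled, 0)) {
            release();
            throw std::runtime_error("semaphore setup failed");
        }
    }
    ~ShmPipe() { release(); }
    ShmPipe(const ShmPipe&) = delete;
    ShmPipe& operator=(const ShmPipe&) = delete;

    void write(const std::string& msg) {
        if (msg.size() > kSlotSize) throw std::length_error("message too large");
        adjust(kFree, -1);                       // block until a slot is free
        Slot& s = slots_[head_++ % kSlots];
        s.len = static_cast<std::uint32_t>(msg.size());
        std::memcpy(s.data, msg.data(), msg.size());
        adjust(kFilled, +1);                     // wake the reader
    }
    std::string read() {
        adjust(kFilled, -1);                     // block until a message exists
        const Slot& s = slots_[tail_++ % kSlots];
        std::string msg(s.data, s.len);
        adjust(kFree, +1);                       // hand the slot back
        return msg;
    }

private:
    enum { kFree = 0, kFilled = 1 };
    struct Slot { std::uint32_t len; char data[kSlotSize]; };

    bool setValue(int index, int value) {
        semun arg;
        arg.val = value;
        return semctl(semId_, index, SETVAL, arg) == 0;
    }
    void adjust(int index, short delta) {
        sembuf op{};
        op.sem_num = static_cast<unsigned short>(index);
        op.sem_op = delta;
        while (semop(semId_, &op, 1) < 0)        // retry if a signal interrupts
            if (errno != EINTR) throw std::runtime_error("semop failed");
    }
    void release() {
        // Only the creating process removes the semaphore set.
        if (semId_ >= 0 && getpid() == owner_) semctl(semId_, 0, IPC_RMID);
        munmap(slots_, sizeof(Slot) * kSlots);
    }

    pid_t owner_;
    Slot* slots_ = nullptr;
    int semId_ = -1;
    unsigned head_ = 0;  // used only by the writer
    unsigned tail_ = 0;  // used only by the reader
};

// Usage sketch: a forked child writes the messages, the parent reads them back.
// A real version would also need to notice a writer that died early.
std::vector<std::string> roundTrip(const std::vector<std::string>& msgs) {
    ShmPipe pipe;
    pid_t child = fork();
    if (child < 0) throw std::runtime_error("fork failed");
    if (child == 0) {
        try { for (const auto& m : msgs) pipe.write(m); } catch (...) { _exit(1); }
        _exit(0);
    }
    std::vector<std::string> received;
    for (std::size_t i = 0; i < msgs.size(); ++i) received.push_back(pipe.read());
    waitpid(child, nullptr, 0);
    return received;
}
```

Sending more messages than there are slots exercises the blocking path on both sides. For realtime use the mapped region would presumably also be pinned with `mlock` so it can never be paged out; that is left out here to keep the sketch short.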

But I took the plunge. I took my time and wrote good tests and clean classes. The tests passed, and the scripting daemon worked right away. The baby-steps process looked like this:

1) Write RPC protocol layer on mac.
2) Write Pipe class, tests on mac.
3) Integrate, debug Pipe on mac.
4) Write Process class, tests on mac.
5) Integrate and debug proc code.
6) Write and debug accompanying windows proc code. (WAMMO, works!)
7) Re-implement Pipe class using shared memory/semaphores/mlock on mac.
8) Fill in the blanks on windows.

I’m sure there are wrappers out there to do all this stuff, but they don’t come up on the first couple of Google searches, and our project config is too fragile to add extra libs. I decided to just write the windows and mac wrappers from scratch to learn all the nuances of those system calls.

Anyway, on with the goods. The performance using shared memory and semaphores is outstanding on the mac. A quick profile shows that the overhead incurred by the scripting engine has not noticeably changed. This is where the interesting part begins, though.

Once you port a part of your code to run in a daemon, the host’s profile stack data disappears into calls to ::read(), ::semop(), ::pthread_mutex_lock(), or whatever. Profiling the daemon itself becomes easier, though, because all you see is the code for your module. I guess I need to get better at using the windows and mac profiling tools anyway.

The profiler does show that the time spent marshaling the RPC protocol data is negligible. So I will just assume that the shm/sem performance is *totally killer*, and something I don’t have to worry about.

I did, however, install a cpu % display calculated on the top layer above the scripting engine’s interface, so all of the above is displayed as a portion of the audio engine’s overall cpu usage. It sits happily between 0% and 1%. Awesome.
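For the curious, a load figure like this can be sketched as follows. This is only a guess at the general shape, not the actual meter: the class name `CallLoadMeter`, the smoothing factor, and the definition of "cpu %" as time spent in the call divided by the duration of one audio block are all assumptions for illustration.

```cpp
#include <chrono>
#include <utility>

// Hypothetical meter: expresses the time spent inside a wrapped call as a
// percentage of the time budget for one audio block, smoothed for display.
class CallLoadMeter {
public:
    CallLoadMeter(double sampleRate, int blockFrames)
        : budgetSeconds_(blockFrames / sampleRate) {}

    // Time one call (for example, the call into the scripting interface).
    template <typename Fn>
    void measure(Fn&& call) {
        const auto start = std::chrono::steady_clock::now();
        std::forward<Fn>(call)();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        record(elapsed.count());
    }

    // Fold one measurement into an exponential moving average.
    void record(double elapsedSeconds) {
        const double percent = 100.0 * elapsedSeconds / budgetSeconds_;
        smoothed_ += kSmoothing * (percent - smoothed_);
    }

    double percent() const { return smoothed_; }

private:
    static constexpr double kSmoothing = 0.1;  // assumed; tune to taste
    double budgetSeconds_;
    double smoothed_ = 0.0;
};
```

Because the timer wraps the whole call from the host side, the number includes everything below it: marshaling, the semaphore waits, and the time the daemon takes to answer.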

I can’t wait to post the C++ classes for creating and communicating with a child process using system pipes or shared memory. They are extremely clean and work well on windows and mac.