Lessons Learned

Having moved to a C++ project with strict performance requirements, we’ve spent a ton of time on build config and memory-related issues. Not only is the majority of our effort not spent on writing features, but this “wasted” time also happens to be the most agonizing. Being a hopeless idealist and having nothing but subjective, difficult-to-argue cases to present to my boss, I thought I’d just try to put it all on paper to remember for the next go-around.
Developing a modern audio plugin with good platform support provides plenty of code management challenges. The plugin formats we support are AU/VST/RTAS, and the platforms are Windows and Mac (PPC and Intel), all in 32 and 64 bit. No one has ever bothered to count the total number of combinations, but I do know that it amounts to a big pain. We use XCode 2.5 and Visual Studio 2005, and have incorporated qmake into both so we can use Qt for the gui. Our main targets are a standalone app or a plugin, which in turn loads a separate plugin for each of our products that you’ve purchased. This means you have to maintain several build configurations over several platforms, and they all behave differently.
Lesson 1: Separate features and config
Clearly define your problem domain, and try to separate your build and deployment environment from the development environment as much as possible. For example, our problem domain is to implement a high-performance sample playback engine and accompanying UI, so adding some new graphical classes and menus should not affect the build config, and vice versa. When you write in C/C++, you will typically define config macros on the command line, add and remove files to be compiled, etc., and all of this stuff will change on each platform. For example, when I add a QWidget subclass, I have to add the files to XCode, add an entry in the qmake project to generate the meta source, add the meta source to XCode, and do it all over again for Visual Studio.
Conflicting headers have also caused us plenty of problems. Our primary third-party libraries are juce, Qt, and python. Python and Qt conflict over the “check” symbol, and juce and Qt conflict over the T macro and others. The juce library is nice enough to include “using namespace juce” in its headers, so we get conflicts with just about everything, including system frameworks like Carbon. To solve this problem I had to find the correct order of #include and #undef statements to get all the libraries to play nice together. Don’t even bring up windows.h. Finding a solid scheme for including the fundamentals is very important, and you should never have to worry about it in your feature code.
In a nutshell, you should write features with 0% brain power spent on how your features will affect the build config.
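As a rough illustration of what such an include scheme can look like, here is a minimal sketch of one wrapper header that hides a macro conflict from feature code. Everything in it is a stand-in: the three "headers" are simulated inline, the names are hypothetical, and the T macro only imitates the kind of short macro described above. It also assumes a compiler that supports the non-standard push_macro/pop_macro pragmas (GCC, Clang, and MSVC do).

```cpp
#include <cassert>
#include <cwchar>

// ---- stand-in for "library_a.h" (hypothetical) ----
// Defines a short, greedy function-like macro.
#define T(str) L##str

// ---- stand-in for "fundamentals.h" (hypothetical) ----
// The one place that knows about the conflict: save the macro, hide it,
// pull in the other library, then restore it.
#pragma push_macro("T")
#undef T

// ---- stand-in for "library_b.h" (hypothetical) ----
// Uses T as a template parameter and calls T(). With the macro still
// visible, T() would expand to a bare "L" and fail to compile.
template <typename T>
T default_value() { return T(); }

#pragma pop_macro("T")

// ---- feature code: includes only the wrapper, sees both libraries ----
inline int feature_default_int() { return default_value<int>(); }
inline const wchar_t* feature_label() { return T("gain"); }
```

Feature code includes the wrapper and nothing else, so the ordering and #undef knowledge lives in exactly one file.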
Lesson 2: Unify your build environment
Managing your build configuration is a pain in the butt. qmake does a really good job of flattening the tool chain, but as long as you are using more than one compiler you are going to have to have conditional variables built into the build config. This is almost always the case when you support more than one platform.
If you can manage to flatten the config to a point where you do not need to separate changes for your different platforms, you’ve taken a huge step. Adding a new library to your code will cause a change to the config, so the cleaner the libraries you choose, the better. Qt, sip, and PyQt are examples of libraries that add little to no compilation and linkage config overhead to a project because their lack of dependencies allows them to be relatively self-contained. The sip code actually compiles with no special flags.
Lesson 3: Write modular code
I know all of this stuff sounds elementary, but it’s all very important. The more black boxes you write, the less they will change. Our project consists of a few key components: the audio engine, the gui, the product plugins, and the target exe/plugin (of which there are many). Trying to meet our stupidly tight initial deadlines, our project manager originally had everything compiled into a single target for each exe/plugin. Granted, this would have fixed the symbol visibility problem that we are having right now, but it added significant compile time and the code behaved differently everywhere.
It doesn’t really matter how much intertwangling there is within the bounds of each component (your boss is going to assign those bugs to you anyway and you’ll just go fix them), but it does matter that your major components are well separated. We aaaalmooooost nailed it in our project by creating engine, gui, and product libraries, but we unnecessarily mixed the headers and instantiated objects from each in the target exes. A better approach is to provide a single header for each library that includes only pure virtual classes and some global factory functions. These headers should not define any concrete types, and should not include each other.
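To make the single-header idea concrete, here is a minimal sketch, assuming a C++14 compiler (newer than the toolchains mentioned above). The names IEngine, createEngine, and SamplerEngine are hypothetical and are not taken from our code base; the "engine.h" and "engine.cpp" parts are shown in one listing only so the example stands alone.

```cpp
#include <cassert>
#include <memory>
#include <set>

// ---- engine.h (hypothetical): the only header other components see ----
// A pure virtual interface, with no data members and no other includes
// from sibling libraries.
class IEngine {
public:
    virtual ~IEngine() = default;
    virtual void noteOn(int midiNote) = 0;
    virtual void noteOff(int midiNote) = 0;
    virtual int activeVoices() const = 0;
};

// Global factory function: callers never name the concrete class.
std::unique_ptr<IEngine> createEngine();

// ---- engine.cpp (hypothetical): private to the engine library ----
namespace {
// Concrete implementation, invisible outside this translation unit.
class SamplerEngine final : public IEngine {
public:
    void noteOn(int midiNote) override { held_.insert(midiNote); }
    void noteOff(int midiNote) override { held_.erase(midiNote); }
    int activeVoices() const override { return static_cast<int>(held_.size()); }

private:
    std::set<int> held_;  // notes currently held down
};
}  // namespace

std::unique_ptr<IEngine> createEngine() {
    return std::make_unique<SamplerEngine>();
}
```

A target exe would call createEngine() and talk to the result through IEngine only, so changes inside the engine library never force a recompile or a header reshuffle in the gui or product code.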
In Addition
I like using our project as an example for project management topics because it has some relatively tough requirements. The audio engine must be written in C++ because performance is top priority, and the gui code could be written in python. Given that the engine is as small as possible and 100% independent of the other components, we could have written a small layer to move all of the application code into python. That way we could have added and tested new features with little worry about causing crashes and compilation/linking related issues.
As a result, the two most interesting problems that I would like to solve are:
1) A 100% self contained AU/VST/RTAS wrapper with PyQt support.
For the plugin wrapper, a flexible engine => gui interface would be key for component communication. The build config for the compiled elements would also need to require as few compiler options as possible, so that the wrapper could easily be compiled as a local target rather than linked as a library. This way, developers would have total access to the code once it was included in their project and would not have to deal with any integration overhead. Including updates to said code could be tricky, but I’m convinced that the changes could be made small enough to remain manageable.
2) A concurrent python interpreter.

A truly concurrent python interpreter would allow the language to be used in a realtime dsp engine while simultaneously being used for the user interface. This has been brought up before and is a point of contention for the python project since the GIL is really really tough to remove. I suppose if I were able to provide a patch with a build option to at least provide a concurrently capable interpreter, the language could be used in very specialized multithreaded cases. This is a pretty specialized requirement though, so maybe that’s good enough? We just wanted the language itself, and I could have easily patched the sip and PyQt extensions to accommodate the change.

September 4th, 2008 | Uncategorized | 3 Comments

3 Comments

  1. Troy Melhase September 10, 2008 at 1:54 am

    I’m still trying to get my head around why you need to have the DSP code concurrent along with the GUI code.

    Can’t the DSP portion be written as an extension module, and written in such a way as to be non-blocking?

  2. Patricio September 10, 2008 at 4:34 am

    Hey, look at that, I got a comment! I’ll try to break this down as well as possible, so bear with me while I repeat parts you already understand.

    PROBLEM
    ======

    Here’s the short answer: The DSP code has to run unhindered by the gui code because in order to run reliably the DSP thread isn’t allowed to make blocking calls, and you have to acquire the GIL (which is a blocking call) to make any SINGLE call to the CPython API. In a sequencer app like Ableton Live, each instance of our instrument plugin is created in a new track which gets its own thread. That means we have multiple audio threads obtaining serialized access to the CPython API, which breaks the rule. This causes intermittent CPU spikes and glitching when running at a very low latency with two or more tracks.

    Here’s the long answer: The problem is that CPython is widely written using global singleton-like variables. One of these global singletons is the global interpreter lock, which is nothing more than a regular old critical section object. The reason Python can never run concurrent threads is that every single meaningful call to the CPython API is written without regard to threads, assuming that the calling code has already acquired that one lock. As with any code that uses critical sections, access to it is serialized, which means that only one thread can run it at any given time. The point here is that on a machine with a million processors running an app with a million threads pointing to one critical section, the effective processing power is that of a single processor.
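
    The serialization effect can be sketched without CPython at all. The following is a minimal model, not interpreter code: g_lock is a plain std::mutex standing in for the GIL, and peak_threads_inside_lock is a hypothetical helper that reports the largest number of threads ever observed inside the locked region at the same time.

```cpp
#include <cassert>
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

std::mutex g_lock;  // stand-in for the single global interpreter lock

// Runs thread_count workers that each enter the locked region `iterations`
// times, and returns the peak number of threads seen inside it at once.
int peak_threads_inside_lock(int thread_count, int iterations) {
    std::atomic<int> inside{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> hold(g_lock);  // "acquire the GIL"
                int now = ++inside;
                // Record a new peak if this is the most threads seen so far.
                int seen = peak.load();
                while (now > seen && !peak.compare_exchange_weak(seen, now)) {
                }
                --inside;
            }  // lock released here
        });
    }
    for (auto& w : workers) w.join();
    return peak.load();
}
```

    However many workers are started, the peak stays at 1, which is the "million threads, one processor" situation in miniature. Removing the lock_guard line lets the peak rise above 1 on a multi-core machine.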

    As a side note, there are some C extensions and CPython code that release the GIL while making system calls or other blocking operations that do not require the python API. An example of this is calling read() to read data from a file descriptor. When read() returns, the calling code will re-acquire the GIL so that it can continue making more CPython calls. Also, the interpreter will automatically release the GIL at a pre-determined interval to allow some other threads to run, but this doesn’t give you concurrency since the lock is still in the way.

    None of this is that big of a deal for most applications, but it is a total show stopper when you need to make CPython calls from an audio thread and *any* other thread at the same time. This means audio/gui, or audio/audio, or both.

    Consider the following code from an audio thread:

    while (1) {
        // Deliberately wrong: the GIL is never acquired around the API call.
        PyObject *pyValueObj = PyDict_GetItem(pyDictObj, pyKeyObj);
    }

    And the following similarly inane code from some gui thread, another audio thread, or whatever:

    while (1) {
        // Same mistake on a second thread, racing with the loop above.
        PyObject *pyValueObj2 = PyDict_GetItem(pyDictObj2, pyKeyObj2);
    }

    The application will eventually crash if you run those two threads at the same time because the loop doesn’t acquire and release the GIL during each iteration. The reason this is a show stopper for us is that you are allowed ZERO blocking calls in an audio thread. The threads are allowed zero shared resources, and only have direct unhindered access to non-shared resources. The function of sharing data to manipulate an audio thread is very complicated and doesn’t fall within the scope of this description.

    PROPOSED SOLUTION
    =============

    The shot-in-the-foot design decision that was made way-back-when was to use a single global variable to store the GIL, and subsequently there is another one for the current interpreter. Instead, each interpreter should store its own GIL so that you could make calls and run scripts on each interpreter without contending for one global lock. All you’d have to do is make sure that you never screwed with objects that existed on an interpreter from the wrong thread. The mechanics of this get complicated, but it’s totally doable. The problem is that all the bloody extension modules are written with zero regard to threads as well.
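
    A toy model of the per-interpreter idea, under loud assumptions: Interpreter here is a hypothetical struct that owns its own std::mutex, not a CPython type, and audio_calls_while_gui_locked is an invented helper. It only shows that work on one interpreter can finish while another interpreter's lock is held.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an interpreter that owns its own lock
// instead of sharing one global lock with every other interpreter.
struct Interpreter {
    std::mutex lock;
    long callsServed = 0;

    void call() {
        std::lock_guard<std::mutex> hold(lock);  // per-interpreter lock only
        ++callsServed;
    }
};

// Holds the gui interpreter's lock for the whole run, then lets a second
// thread make `calls` calls on a separate audio interpreter.
long audio_calls_while_gui_locked(int calls) {
    Interpreter gui;
    Interpreter audio;

    std::lock_guard<std::mutex> guiBusy(gui.lock);  // gui interpreter is busy

    std::thread audioThread([&] {
        for (int i = 0; i < calls; ++i) audio.call();
    });
    // join() returns only because the audio thread never waits on gui.lock.
    // With one shared lock, this would deadlock instead.
    audioThread.join();
    return audio.callsServed;
}
```

    The catch described above still applies: this only works as long as no thread touches objects that belong to the other interpreter.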

    The benefit is that if you wanted to embed the most awesomest language ever into your audio app, you could do it and say “screw it” to all the funky extension modules.

    If you want more info, look at this section and its respective owner section:

    http://docs.python.org/api/threads.html

  3. Anonymous September 14, 2008 at 4:38 pm

    I recommend taking a look at CMake in the future. It is designed to address the issues described in Lessons 1 and 2 by providing a cross-platform tool to generate a configurable build environment. For example, CMake can generate a wide variety of Makefiles or project files for IDEs such as XCode and MS Visual Studio. You can define behavior to include or exclude certain features or debugging based on command line or environment variables, build platform, library availability, etc. CMake can also easily handle the kind of complex build you describe in Lesson 2, which requires multiple projects, using the add_subdirectory command.
