As the sibling comment suggests, multi-threading through Cython isn't as smooth as it could be, but it doesn't seem too bad. I've used it in a rather rudimentary way to accelerate key computations for some NLP models that I've been playing with recently.
As an example in [1], you can check out the function at lines 293-314, which calls the function at 224-256 (or lines 259-291 depending on the machine architecture...). The actual dispatcher that initializes the threads is in [2]. I guess the key points are the use of "nogil" in the function declaration on lines 224-229, and the use of "with nogil" when that function is called on line 312. Note that the code is complicated a bit by my use of BLAS functions accessed via function pointers provided by SciPy.
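The dispatch pattern boils down to: split the work into chunks, launch a Python thread per chunk, and have each thread call into a worker that releases the GIL so the threads actually run in parallel. Here's a minimal pure-Python sketch of that dispatcher (all names here are hypothetical, not from the linked code; in the real version the worker is a compiled Cython "cdef ... nogil" function rather than a NumPy stand-in):

```python
import threading
import numpy as np

def dispatch(worker, data, n_threads=4):
    """Split the rows of `data` into chunks and run `worker` on each
    chunk in its own thread. With plain Python workers the GIL
    serializes them; true parallelism requires a worker that releases
    the GIL, e.g. a Cython function declared `nogil` and called
    inside a `with nogil:` block."""
    chunks = np.array_split(np.arange(data.shape[0]), n_threads)
    out = np.zeros(data.shape[0])
    threads = [threading.Thread(target=worker, args=(data, idx, out))
               for idx in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

def row_norms(data, idx, out):
    # Stand-in worker: computes the L2 norm of each assigned row.
    # In the Cython version this loop body would run under `with nogil:`,
    # calling BLAS routines through the function pointers SciPy exposes
    # (scipy.linalg.cython_blas), since nogil code can't touch Python objects.
    out[idx] = np.sqrt((data[idx] ** 2).sum(axis=1))
```

Each thread writes to a disjoint slice of `out`, so no locking is needed; that disjoint-chunk partitioning is what makes the nogil pattern simple to get right.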
[1]: http://github.com/Philip-Bachman/NN-Python/blob/master/nlp/C...
[2]: http://github.com/Philip-Bachman/NN-Python/blob/master/nlp/C...