
The article doesn’t say multi-CPU scaling isn’t necessary. It says that threads are usually the wrong answer anyway.

There are great process-based ways to scale out – look no further than Erlang to see that it’s true.



I use Python for my day job. I've experimented extensively with Java and Scala. I must say this: Threads are AWESOME. Threading is the most flexible model of concurrency because you can build programs that EFFICIENTLY implement "alternative concurrency models" like message passing and STM on top of threads. Threading is supported natively by every OS, and it's fast.
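To illustrate that point, a message-passing style can be sketched on top of threads with nothing but the standard library (`queue.Queue` is thread-safe); the names below are illustrative:

```python
import queue
import threading

# A minimal message-passing sketch built on threads: the worker and the
# main thread communicate only through thread-safe queues, never through
# shared mutable state.
inbox = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        msg = inbox.get()
        if msg is None:        # sentinel message: shut down
            break
        results.put(msg * 2)

t = threading.Thread(target=worker)
t.start()
for n in range(3):
    inbox.put(n)
inbox.put(None)                # tell the worker to stop
t.join()

print(sorted(results.queue))   # [0, 2, 4]
```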

Erlang is hyped as the ideal model for concurrency, but in practice is a niche product that's primarily useful for programs that are almost pure IO - chat servers, routing components like proxy servers and packet switches.

The Erlang model does NOT apply to python, anyway, since Python processes are nothing like Erlang processes. Unlike Erlang processes, Python processes are very heavyweight and message passing between them is costly.


If you’re using python, the performance gap between processes with message passing and threads with locking is the last of your problems, believe me.

The big difference is that processes are much more robust and testable. The cases where threads are really needed are fringe cases and – while it’s a pity – Python doesn’t seem the right language if you don’t want to go the Jython/IronPython way.

The bigger problem is that people tend to reach for threads by default, although only a few are able to write bug-free threaded code. That's obsolete but still prevalent performance wisdom, reinforced by the fact that threads were really popular in the Java world.


This is a really short-sighted perspective.

Proper support of threading inherently allows more performance and flexibility than multiprocessing. On top of them, you can build powerful, Pythonic abstractions like concurrent.futures.ThreadPoolExecutor.map and STM, and on top of those, even more powerful abstractions that help the developer avoid concurrency bugs.
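As a quick sketch of the `concurrent.futures.ThreadPoolExecutor.map` API mentioned above (the worker function here is just a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_length(word):
    # Stand-in for an I/O-bound task (e.g. an HTTP request);
    # threads shine when workers spend most of their time waiting.
    return len(word)

words = ["gil", "threads", "processes"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() distributes the calls across the pool and preserves input order.
    lengths = list(pool.map(fetch_length, words))

print(lengths)  # [3, 7, 9]
```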

I'm really excited for PyPy. That is a project full of people who are not afraid to quickly iterate on powerful ideas that can make Python the high-performance language it deserves to be, instead of resorting to calling MT programming a "fringe case" and to ad hominem attacks.


I wish you’d read the article before making your accusations.


I have. There's an ad hominem attack in the middle of it. Otherwise it's a great article.

I understand that everyone here is acting in good faith and wants Python to be better, and the article otherwise contains lots of great information presented in a reasonable manner. You bring up lots of good points too. But other statements like the ones I mentioned are overly broad or brash.


I think it’s just attrition from explaining the GIL problem over and over again. Nick is one of the major core developers and is probably just fed up with the topic. So this bit of snark is all the pay he’ll ever get for his work on CPython.


I really liked the article. But look at how much of this thread is spent talking about that bit of snark. I think that mixing the snark in with all of the rational reasons for Python 3's existence and not working on the GIL made some people less receptive to rational arguments. In other words, I don't think it was worth it.


"Pythonic abstractions like concurrent.futures.ThreadPoolExecutor.map"

Although it might be great stuff, the word 'Pythonic' is pretty funny next to a long Java-style name like that.


If you’re using python, the performance gap between processes with message passing and threads with locking is the last of your problems, believe me.

What would be the first of my problems?


Not sure if you’re trying to troll me by taking it out of the obvious performance context, but anyway: the performance penalty of using an un-JIT-ed scripting language?


Not trolling you, just wondering what you thought was more important from a performance perspective. Re-writing working software in another language is not always feasible due to real-world time constraints, and in that context the performance difference between message-passing and threads would be the first of my problems. Basically, I just don't see the argument of "use a more appropriate language than Python" as a useful counter to criticism of the GIL. The whole point of criticizing Python (in my case anyway), is to hopefully nudge the language to suiting my needs more closely.


The counter is not (even while some people try to turn it that way): THREADS SUCK, WE WON'T ADD THEM BECAUSE WE DON'T LIKE THEM. It is: given the circumstances (which Nick outlines verbosely), a removal of the GIL is not pragmatic.

And this is the last time I’ve written this; I feel like a street organ. >:(

And all I was saying in this thread is that the performance gap between threads and processes isn’t that big of a deal, if you run non-native code anyway. The multiprocessing module is pretty cool.
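For what it's worth, the multiprocessing module does make the process-based version look almost like the threaded one; a minimal sketch (the worker function is illustrative):

```python
from multiprocessing import Pool

def cube(n):
    # Runs in a separate worker process; the argument and the return
    # value are pickled and sent across the process boundary.
    return n ** 3

if __name__ == "__main__":
    # The __main__ guard matters: on platforms that spawn rather than
    # fork, worker processes re-import this module.
    with Pool(processes=2) as pool:
        print(pool.map(cube, range(5)))  # [0, 1, 8, 27, 64]
```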


I'm not intentionally trying to make you repeat yourself. And I know that removal of the GIL is not pragmatic at the moment (or maybe ever), but that doesn't mean it wouldn't be valuable. The GIL wasn't much of a problem ten years ago because not many personal computers had multiple cores. Today it's become a bit of a pain for me personally, and it will only become more painful as core count increases while single-core performance remains largely stagnant.

And all I was saying in this thread is that the performance gap between threads and processes isn’t that big of a deal, if you run non-native code anyway. The multiprocessing module is pretty cool.

This is a line that I hear over and over again, but I strongly disagree with. It's not always easy to predict where your performance bottlenecks will be until you actually start implementing it in some language. If I've chosen Python for a project and find I need more cores, I'm stuck with either re-implementing critical sections of code in C extensions or other languages, or using multiprocessing. And multiprocessing is not that great because it splits the memory space across processes and communication between them is extremely expensive. And there are many caveats to this which cause enormous headaches (eg., you can't fork your process while having an active CUDA context, not all Python objects are serializable, pickling is slow, marshaling doesn't work well for all data types, you must finish dequeuing large objects from a multiprocessing.Queue before joining the source process, etc.).

Yes, I could get a 10-100x speedup by re-writing everything in C. But most of the time, I would be very happy with a 6-12x performance gain from just using threads in a shared memory space.
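The serialization caveat above is easy to demonstrate: anything that crosses a process boundary must survive a pickle round trip, and not everything does (a sketch, not an exhaustive list of failure modes):

```python
import pickle

# Plain data round-trips fine, which is why simple multiprocessing
# workloads "just work"...
data = {"xs": [1, 2, 3]}
assert pickle.loads(pickle.dumps(data)) == data

# ...but many objects don't: lambdas, open files, CUDA contexts, etc.
# With multiprocessing, this failure surfaces at queue/pool boundaries.
try:
    pickle.dumps(lambda x: x + 1)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("not picklable:", type(exc).__name__)
```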


I assume that for whatever reason, it is absolutely impossible for you to use any other concurrency model.

Did you try Jython?


The specific project I'm vaguely referring to here is described in a little more detail as 1) here: http://news.ycombinator.com/item?id=4178070

No, I didn't try Jython. The choice of CPython was made before I took over the project, and there are also a dozen or so dependencies which I don't think are compatible with Jython.


If you are trying to exploit parallelism, any kind of data sharing (other than pure read-only sharing) costs you.


Yes, but with threading, the costs are minimized. You don't have to convert your Python objects to bytes, send them over a network, wait, retrieve bytes, and convert back to Python objects every time you need to read or write some shared data.
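Concretely, threads can mutate a single in-memory object under a lock, with no serialization step at all (a toy sketch):

```python
import threading

counts = {"n": 0}            # one shared in-memory object
lock = threading.Lock()

def work():
    for _ in range(10_000):
        with lock:           # locking, yes, but no pickling and no IPC
            counts["n"] += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts["n"])  # 40000
```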


SMP Erlang DOES use threads; it then uses green processes and a custom scheduler to schedule those processes fairly and preemptively across the threads. It doesn't use one thread per process, but threads are a key component of how it works.

It is absolutely nothing like spawning OS-level processes. They are micro-processes, green processes that live inside the Erlang VM.



