I agree that Stackless is better. Threads are very heavyweight. The problem is that the underlying C interpreter (CPython) only runs Python code on one core at a time, because of the global interpreter lock. If Stackless can work with multiple interpreters, then it may be the greatest thing since sliced bread. I haven't been following it closely enough to say for sure.
All you have to do is build a mechanism that distributes the lightweight threads between interpreter processes, and then spawn one interpreter for each core. For many applications, that's all you need.
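To make the interpreter-per-core idea concrete, here is a rough sketch using only the standard library's `subprocess` module. It is not Stackless code and does not migrate running tasklets; it only shows independent work items being dealt out round-robin to separate interpreter processes. The names `WORKER_SOURCE` and `run_across_interpreters`, and the sum-of-squares workload, are made up for illustration.

```python
import json
import os
import subprocess
import sys

# Program run by each worker interpreter (hypothetical workload):
# read a JSON list of integers from stdin, do CPU-bound work on each,
# and write the results back to stdout as JSON.
WORKER_SOURCE = """
import json, sys
numbers = json.load(sys.stdin)
json.dump([sum(i * i for i in range(n)) for n in numbers], sys.stdout)
"""


def run_across_interpreters(numbers, workers=None):
    """Split `numbers` across `workers` interpreter processes (default: one per core)."""
    workers = workers or os.cpu_count() or 1

    # Round-robin distribution: worker k gets items k, k + workers, k + 2*workers, ...
    chunks = [numbers[k::workers] for k in range(workers)]

    # Start every interpreter first so they all run at the same time.
    procs = []
    for chunk in chunks:
        proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        proc.stdin.write(json.dumps(chunk))
        proc.stdin.close()  # EOF tells the worker its input is complete
        procs.append(proc)

    # Collect the results and put them back in the original order.
    results = [None] * len(numbers)
    for k, proc in enumerate(procs):
        output = proc.stdout.read()
        proc.stdout.close()
        if proc.wait() != 0:
            raise RuntimeError(f"worker {k} exited with an error")
        results[k::workers] = json.loads(output)
    return results


if __name__ == "__main__":
    print(run_across_interpreters([10, 100, 1000, 10000]))
```

Each worker is a full interpreter with its own lock, so the chunks really do run in parallel on separate cores. The cost is that everything passed between processes has to be serialized, which is why this works best when the work items are mostly independent.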