That "2 nanoseconds for objc_msgSend" number is misleading for several reasons. For one, that's the timing you'll get for a fully correctly predicted, BTB hit, method cache hit, which does not always describe the real world: Obj-C method cache misses are several orders of magnitude slower. But the bigger problem is that the dynamic nature of Objective-C makes optimization harder. Inlining can result in multi-factor performance increases by converting intraprocedural optimizations to interprocedural ones, not to mention all the actual IPO that a modern compiler for a more static language can easily do. Apple clearly cares about this a lot, as shown by all the work they've done on SIL-level optimizations.
Another way to look at it: I would probably get similar (or better) numbers by measuring a JavaScript method call through a warm monomorphic inline cache, with the type check correctly predicted by the CPU's branch predictor. Does that mean JS is as fast as C++?