The complexity of generational garbage collection vs. the speed of manual collec...

rwmj · on Dec 27, 2011

Unlikely to happen. Reference counting has poor interaction with the CPU memory hierarchy: It makes all objects larger, making the working set correspondingly larger, meaning your data cache is less effective. It makes code larger because of the extra ref counting twiddling code. Larger code means a less effective instruction cache, as well as being slower because of all the extra operations performed. Reference counts must be frequently modified, and they are scattered all over memory (next to each object), so you get poorer locality and more atomic writes which has all sorts of negative implications for cache-coherent SMP architectures.

So while you may not notice the penalty of reference counting since it is spread out over every operation, it's unlikely that it is better than proper GC for most applications. Maybe just for ones which are extremely sensitive to latency, yet don't mind running much slower overall (hard real time? aeronautics and space?).

As usual you'd need to measure it.

aaronblohowiak · on Dec 28, 2011

Gc pauses suck the "magic" out of interactive programs. The great thing about obj-c is that it I trivial to have a c array of vanilla c structs, so your performance sensitive code is still slim and hot. (size and cache locality.)

My iOS audio programming mixes c for audio processing and obj-c for ui and control quite effectively.

rwmj · on Dec 28, 2011

I didn't say that GC was free ... malloc has overhead too.

For interactive programs, I would recommend two things to avoid (or hide) the pauses. Firstly schedule calls which run the GC when it won't be noticed: between frames in an arcade-style game, or just before accepting user input in a turn-based game. Secondly, size the minor heap large enough so that all computation between the scheduled GC runs can happen without invoking a minor sweep (but not too large that the minor heap is wasting memory -- some experimentation and tuning required).

I've written a couple of interactive games in OCaml that used this strategy and avoided visible GC pauses.

to3m · on Dec 29, 2011

Last I looked, it was a bit hard to follow, but iOS seemed to be storing reference counts separately from the objects. I think you could do (and maybe iOS does?) this on a per-thread basis, to reduce the need for locks.

ScottBurson · on Dec 27, 2011

There's no free lunch.

While GC requires tuning for the best performance, (pure) reference counting imposes a different cost: because it can't collect circular structures, the programmer has to break cycles herself. This clutters the application code. So you can't simply take a program written to run on a GC platform and move it to a reference counting platform unmodified.

Alternatively, you could use both: do most of your reclamation with reference counting, but throw in an occasional GC to collect circular structures. CPython, the most commonly used Python implementation, does this. I doubt it is the best thing for performance, but CPython isn't very fast anyway.

rayiner · on Dec 27, 2011

It really depends on the task. For an embedded device with a UI, where memory is at a premium, reference counting is probably the way to go.

But for pure throughput, especially on a multiprocessor machine, you really want to go with tracing garbage collector. In a multiprocessor system you have to keep caches coherent, and writes to the reference count have the effect of invalidating remote cache lines, increasing coherency traffic, etc.

eru · on Dec 27, 2011

Reference counting is slower than proper garbage collection, since it adds an overhead for each access, instead of just during the collection. (There are ways to make reference counting faster, but they are no longer quite so simple.)

benhoyt · on Dec 27, 2011

It's not quite true to say reference counting adds overhead "for each access" -- for example, you can easily have a loop which accesses an object but doesn't modify the reference count. Reference counting adds overhead each time someone "expresses an ownership interest" in an object (wording from http://developer.apple.com/library/ios/#documentation/genera...).

(That's not to say your main point is false. I don't actually know which is faster in general.)

aardvark179 · on Dec 27, 2011

This is true, but has to be thought through very carefully in multithreaded environments where another thread might remove an object from a collection, causing it to be deleted while you're still looking at it. Safe looping then requires locks or cloning the collection (which is suddenly more expensive because it will hit all the contents' ref counts).

In summary, safe and efficient multithreaded code is never easy.

nradov · on Dec 27, 2011

Reference counting as in Objective C may be a happy medium for now, but that won't last. As the underlying hardware and VMs in mobile devices continue to improve, the need for developers to spend extra time on memory management will be seen as more and more of an unnecessary burden.

aaronblohowiak · on Dec 28, 2011

With ARC, the only extra time is spent avoiding circular data structures, which is quite easy.