Smuggling Data in Pointers

jbeda · on Nov 14, 2011

I implemented this in the core data structures for IE5. We were very memory constrained as we had to run on Win95 machines with something like 12MB of RAM. I was counting bits.

The base level of the tree wasn't really a node DOM but rather a stream of document events (start element, text, end element, etc). We put these in a binary tree that was balanced as a splay tree. To drive memory usage down I reduced the {parent, left, right} pointers down to {child, next, left_child_bit, next_is_parent_bit}. 'child' pointed to the the first child. I knew if it was the left or the right child based on the left_child_bit. 'next' pointed either to the sibling or the parent, depending on the 'next_is_parent_bit'. However, to really realize the savings, I needed to hide these bits in the pointers themselves.

Other tricks that we used to save memory:

1) Put rare data into a hash table indexed off of the 'this' pointer of the object. Have a bit to indicate if it was there or not. We called these lookaside pointers.

2) Embed objects in other objects to save pointers back and forth. The embedded object would do math on its this pointer (based on bits for which embedded member it is) to get the this pointer of the parent object.

I'm not sure, but I suspect that one of the reasons I think that current IE is slower now than it should be is that a lot of these optimizations are no longer necessary and cause more problems than they solve.

gravitronic · on Nov 14, 2011

Kudos to you for such a widely released implementation of somewhat insane coding.

Crashes back in Windows 95 days suddenly make a lot more sense to me...

jbeda · on Nov 14, 2011

Actually, by the time it shipped, 80% of the crashes were due to insane reentrancy issues due to the componentized nature of IE.

For example, when tearing down a page, we would go ahead and release the laster of the references to the ActiveMovie COM objects. That component was using a separate thread and wanted to shut it down cleanly during clean up. It used windows messaging to communicate to the thread and had to run a message loop. That message loop would also dispatch messages to our top level window. If the wrong message came in at the wrong time in this situation we would have code dealing with a semi-destructed data structure and it would crash. I dealt with a lot of these issues by being very defensive when calling out to anything. Things like lots of null checks, saving references to be released until the end, etc. Of course there was a perf cost for each of these that we monitored closely.

I don't think that any of these were the root of security problems. Those were more due to the impedance mismatches at the level of the shell/browser host and the URL deliver services.

ot · on Nov 14, 2011

This technique is commonly called "tagged pointer"

http://en.wikipedia.org/wiki/Tagged_pointer

They are commonly used in interpreter implementations to distinguish between pointers and integers (if the LSB is 0 then the value is a pointer, otherwise (v >> 1) is an integer value)

sb · on Nov 14, 2011

Just for the record and to provide additional references, the technique of reusing a pointer for integer representation is rather old, for example fixnums refer to the same technique in Lisp implementation, IIRC, in Smalltalk this refers to SmallIntegers.

batterseapower · on Nov 14, 2011

The Glasgow Haskell Compiler uses tagged pointers to implement lazy evaluation. Suspended computations are created as heap-allocated "thunks", and the runtime will tag any pointers to a thunk that is known to have already been evaluated, which can save one indirection when you need to "force" the thunk to get a value.

Furthermore, for thunks of algebraic data types with only a few alternatives (e.g. booleans, the Maybe type) the tag will encode which of the alternatives the pointer actually points to. So if you are case scrutinising a boolean you can decide which branch to jump to by inspecting the pointer only with no memory access.

This last trick doesn't work if the number of alternatives is > 3 (for 2 byte alignment) because there aren't enough bits available.

xenomachina · on Nov 14, 2011

Some software for the Motorola 68000 did something a bit like this (though probably more for space savings/convenience than atomicity): the 68k had 32-bit pointers but the top 8 bits were ignored so some programs would store extra data there. You didn't even need to mask off the bits to dereference! I remember the Amiga ROM Kernel Reference Manuals actually warned against doing this, as later CPUs in the 68k family did pay attention to the top byte in a pointer.

algolicious · on Nov 14, 2011

Andy Hertzfeld used this same trick on the original Macintosh, in a move he called his "worst blunder": http://folklore.org/StoryView.py?project=Macintosh&story...

yason · on Nov 14, 2011

Surprisingly, the original AmigaBASIC written by Microsoft (yes) used this very technique and thus failed catastrophically on later AmigaOS versions where the memory bus was no longer 24 bits.

Further, on the Amiga, any sensible pointers were generally 4-aligned anyway (you couldn't read words or longs from odd addresses and I think reading longs from 2-aligned addresses was slower, too, so nobody wanted to do that) so, theoretically, if you were to abuse this to the maximum effect you could've stashed 10 bits of excess payload data into an address: 8 bits in the bits 23-31 and 2 bits in 0-1 of an address. Any pointer was internally just a 32-bit address register value so you certainly had bits to spare!

xenomachina · on Nov 14, 2011

Ha! I didn't know AmigaBASIC did that.

I almost mentioned the fact that pointers were pretty much always aligned on the 68k, but I never really head of anyone using those bits to store anything. The top-8 had the "nice" property that the hardware did the masking for you.

I do remember that certain parts of the Amiga OS (particularly in the file system, I believe) used "BPTRs" which were essentially pointers shifted down 2 bits -- in other words, the word index instead of the individual byte index. Normal pointers were called "APTRs" by the code that used BPTRs.

yason · on Nov 16, 2011

Yes, the APTR / BPTR divergence had something to do with AmigaDOS (the disk operating subsystem) was originally written with BCPL. While it was later rewritten in C, however, the ABI had to remain compatible and thus the pointer-mangling continued. I don't remember if they had any use for the extra two bits or whether the addressing scheme was coined particularly just to access a 4x large address space.

regularfry · on Nov 14, 2011

This is precisely how Ruby's object ids work. An object id is an 8-bit aligned pointer to the object's location in memory. The low 3 bits are used to identify booleans, fixnums, nil and symbols.

I'm sure this was borrowed from a lisp implementation.

arethuza · on Nov 14, 2011

Lisp machines typically had direct hardware support for these tags - Symbolics machines used 36 bit words, 32 bits of data and 4 bits of tags:

http://en.wikipedia.org/wiki/Symbolics

dfox · on Nov 14, 2011

Also, SPARC has direct hardware support for this (for the cases that are relevant for JITed code, ie. arithmetics). But sadly, on current implementations it is often somehow emulated and thus slow.

rayiner · on Nov 14, 2011

It's because there isn't much point. Most Lisps use a tagging scheme like:

00 = Integer 01 = Pointer

With 00 as the integer tag, integer addition and subtraction can be done directly without masking off the tags. Multiplication and division need the tags masked, but are slow anyway. The tag can also be stripped off for free on any architecture that offers BASE + OFFSET addressing (just subtract 1 from the desired constant offset).

_delirium · on Nov 14, 2011

Apparently inspired by Lisp as well; this old '91 Usenet post by a Lucid employee claims that SPARC added the tagged-arithmetic support specifically for Lucid Common Lisp: http://compilers.iecc.com/comparch/article/91-04-082

kaushiks · on Nov 14, 2011

This is also a common technique used by modern Javascript implementations. A much less common and slightly more interesting approach (to me) is NaN boxing - that packs tags into unused NaN bit ranges of doubles that are encoded per the IEEE-754 format.

Chakra uses the bottom most bit to distinguish SMIs from pointers.

ralph · on Nov 15, 2011

LuaJIT uses NaN-tagging too.

apaprocki · on Nov 14, 2011

While this tagged pointer construct is pretty common but makes some people wretch, realize that the entire web is built upon this same concept, only bigger. The nan/nun-boxing that is used by various dynamic engines (e.g. All JS engines) relies on the fact that pointers on x86-64 hardware only use 48-bits of memory address space. They use the remaining 16 bits in the double for flags pertaining to what type of object it is. This has a nasty side effect of preventing these engines from running on any 64-bit architecture that does not have this memory address space restriction (e.g. Sparc, POWER). One either has to force the environment to only allocate space out of the lower 48-bits or come up with some other kind of encoding scheme to indicate if the address is a "far" address and should be added to the 48-bit base.

agazso · on Nov 14, 2011

In the Linux kernel, the internal Red-Black tree (rbtree) implementation uses this trick to save memory.

Quoting LWN:

"The anticipatory, deadline, and CFQ I/O schedulers all employ rbtrees to track requests; the packet CD/DVD driver does the same. The high-resolution timer code uses an rbtree to organize outstanding timer requests. The ext3 filesystem tracks directory entries in a red-black tree. Virtual memory areas (VMAs) are tracked with red-black trees, as are epoll file descriptors, cryptographic keys, and network packets in the "hierarchical token bucket" scheduler."

http://lwn.net/Articles/184495/

rvkennedy · on Nov 14, 2011

If you're serious about smuggling data in pointers, there's probably a lot of mileage on the other end of the pointer if you're writing 64-bit code. While 32 bits can address 4Gb, you only need another 8 bits (say) to get a terabyte, leaving maybe 24 bits to cram with whatever you feel like.

olliej · on Nov 14, 2011

Using tags in the high bits is exactly what caused all those problems for software written in the 80s and early 90s. The assumption that your software will have been updated before anyone has such "ludicrous" amounts of memory is not safe. Low bits seem to be generally safe as a) you can easily control pointer alignment in your own code, and b) most system's alignment requirements are increasing not decreasing. Some architectures can't even access unaligned pointers.

Even excluding large ram increases, presumably the current (effective) 48bit address space limitation on x86_64 is likely to go away, and then ASLR is going to make it hard to guarantee that your high bits are clear.

nitrogen · on Nov 14, 2011

It's probably not a good idea to rely on this. Though x86-64 is (currently?) limited to a 48-bit virtual address space, it's designed so that the upper 17 bits must be all 0 or all 1, and those bits may be used in the future.

https://secure.wikimedia.org/wikipedia/en/wiki/X86-64#Virtua...

rvkennedy · on Nov 14, 2011

A good idea? I think it's nuts! I strongly recommend that nobody ever do this.

vidarh · on Nov 14, 2011

It'd be ok to do for a JIT, assuming the JIT knows the systems it targets can deal with it and/or has a fallback strategy for systems that don't.

But yes, not for ahead of time compiled code...

dubya · on Nov 14, 2011

The enormity of 64-bit pointers is something Don Knuth has complained about because it wastes a lot of memory for programs that don't need that much address space. And it actually matters for some of the stuff he writes. The old Motorola had an addressing mode that used 16 bit relative pointers. Does Intel have anything similar for x64?

dubya · on Nov 14, 2011

Replying to myself :( The mode was called PC relative addressing, and resulted in smaller, faster code on the 68000, but things had to be written in 32k chunks. I seem to recall that it was required for the original Macintosh before they wrote a proper program loader. Anyhow, some of Knuth's code (hiss BDD package at least) has hacks to reduce the size of pointers.

BruceJillis · on Nov 14, 2011

Came here to make a remark about Knuth. Indeed, in TACP vol 1, fascicle 1 (the new MMIX design) he talks about being able to address memory in chunks larger than 8 bits and specifically remarks about the unused bits: "we will find precious use for them later". :)

dmpk2k · on Nov 14, 2011

The JVM takes advantage of this: http://wikis.sun.com/display/HotSpotInternals/CompressedOops

derleth · on Nov 14, 2011

Macintosh programmers used to do this. Used to.

http://en.wikipedia.org/wiki/Mac_OS_memory_management#32-bit...

exDM69 · on Nov 14, 2011

This is a really dirty hack to do and it only works if your addresses are always aligned to a 2^n byte boundary. That said, there are plenty of cases where this dirty hack is perfect for the job.

Atomic lock free data structures is one example. Tagging the pointers can be used notify that a node is going to be deleted in order to avoid the ABA problem (as mentioned in the article). Tagging pointers is absolutely essential for more complex lock free data structures, like doubly linked lists.

This can also be used in a garbage collector to add a few bits of metadata. In a GC you can (and probably should) decide to use memory objects with proper alignment, which leaves you with a few bits to store data in. A perfect solution to implement something like Lisp cons cells.

anon1385 · on Nov 14, 2011

OS X Lion uses tagged pointers for some CF objects: http://objectivistc.tumblr.com/post/7872364181/tagged-pointe...

Search for OBJC_IS_TAGGED_PTR in http://opensource.apple.com/source/objc4/objc4-493.9/runtime...