I have built one of these map rendering systems on EC2. We decided to go with 100% on-the-fly rendering plus two caching layers (an HTTP cache on top and an application-specific one in between [TileCache]). There were dozens of layers (now there are thousands), so pre-rendering all of them was not feasible. Since I left that gig, the current team has added some processes to "warm" the caches before new data goes live. That only takes a few minutes, though, and covers nowhere near 2% of the tilespace.
From what I remember, the biggest performance trick was figuring out how to properly pack worldwide street data (geometry, categorization, and labels) plus indexes into RAM on a single machine without using a custom file format. It involved stripping out every little unnecessary byte and sorting all of the streets geographically to improve access locality. I believe this got down to ~15GB of shapefiles.
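Not our exact scheme, but the locality idea looks roughly like this: sort records by a space-filling-curve key built from each street's centroid so nearby streets end up adjacent on disk and in RAM. A minimal sketch (the record layout here is hypothetical):

    # Sketch: sort street records by a Z-order (Morton) key on their centroid
    # so geographically close features sit close together on disk / in RAM.
    # Record layout is hypothetical: (centroid_lon, centroid_lat, payload).
    def morton_key(lon, lat, bits=16):
        # Quantize lon/lat to fixed-width ints, then interleave their bits.
        x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
        y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i)
            key |= ((y >> i) & 1) << (2 * i + 1)
        return key

    def sort_streets(records):
        return sorted(records, key=lambda r: morton_key(r[0], r[1]))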
This comment will probably get buried and no-one cares, but...
OSM, although a great force for good in the mapping world, is not in its current state a viable competitor to Google Maps and other paid datasets.
You need to start driving around, using LIDAR, and create a massive parcel / address database to really do it right.
You need photos, from the street and from the air. For OSM to really fly, we need to see open source hardware combining photography, GPS, and LIDAR that folks can strap to their cars or fly from RC planes, balloons, or kites.
Geocoding needs to actually work, worldwide. That's incredibly hard. It's so much more than the visual maps.
Just pointing this all out, since everyone seems to gloss over the fact that Google Maps has a massive monopoly in this area right now.
So in other words, OSM has done a pretty good job of mapping the _streets_ of the world, and now it's time to shift the focus to recording the individual _address points_ along those streets?
AFAIK, the tags to use for individual address points have been agreed upon, and in some parts of the world (e.g. Germany, where the maps are essentially finished) this address point data is already useful. Is this, in fact, the case?
As with everything in OSM, it depends on which country you are talking about. For example, I believe in (at least some areas of) France the outline of every building has been imported from a high-quality government source with address data, yet the roads still have to be drawn in from satellite photos or GPS traces.
The US is the home base of Google (amongst other big-name tech companies), but its OSM data is amongst the worst. This is ironic, as much of the open data used to map the rest of the world was provided by the US government, e.g. NASA radar topography and Landsat photos.
But for the lower level data the best source (often the definitive source e.g. for administrative boundaries that aren't physically present on the ground) is government data, so the quantity and quality of data varies as you cross state (and sometimes county) lines.
Interesting stuff. This is essentially a coding/compression problem. My advisor at UCLA helped pioneer the computer vision analogue of this. His work tackled encoding textures (areas of little information) with Markov random fields and areas of high information (edges, features) with graphs.
Not sure if I understand why real-time tile rendering on servers doesn't work.
Google clearly does not pre-render tiles, and it looks like it works fine for them. A request is made, data collected, relevant tiles rendered, returned to the client side. Yes, I know, Google has $billions in computing resources, but does it really take that much server power to render tiles? (even for 1,000s of requests/second?)
Is it a matter of data transfer or processing capacity? A 2048x1536 screen would need to load 48 tiles at a time. Google's tiles for a city average about 14KB/tile, so 672KB per screen. 5,000 of those a second is about 3.4GB/s. (I'm a front-end guy, so this is a little out of my league.)
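For what it's worth, the back-of-envelope math checks out (all numbers are the ones quoted above, not measurements):

    # Back-of-envelope math using the numbers quoted above (not measurements).
    tile_px = 256
    tiles_per_screen = (2048 // tile_px) * (1536 // tile_px)   # 8 * 6 = 48
    kb_per_screen = tiles_per_screen * 14                      # 672 KB
    gb_per_second = kb_per_screen * 5000 / 1e6                 # ~3.4 GB/s
    print(tiles_per_screen, kb_per_screen, round(gb_per_second, 1))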
The problem, generally, is that rendering the tiles is computationally expensive. You have to wade through each feature (thing that could be represented in the tile) and decide if it intersects the tile, then decide if/how it will be rendered, then render and composite it with every other visible feature.
Doing all of that work quickly is possible, it just isn't simple. Also, for most people the data involved changes relatively infrequently. Why have the server render the same tiles repeatedly when you could just cache the result and reduce it to a simple file hosting problem?
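To make that concrete, here's a rough sketch of the naive filter, draw, and cache path for one tile (Python + Pillow; features are boiled down to bare point lists, and none of this is any particular renderer's actual code):

    # Rough sketch of the naive path: for one tile, filter features that
    # intersect it, draw them, and cache the PNG so the next request is
    # just a file read. Real renderers do far more styling/compositing.
    import math, os
    from PIL import Image, ImageDraw

    def tile_bounds(z, x, y):
        # lon/lat bounds of a standard slippy-map tile
        n = 2.0 ** z
        lon_min = x / n * 360.0 - 180.0
        lon_max = (x + 1) / n * 360.0 - 180.0
        lat_max = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
        lat_min = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
        return lon_min, lat_min, lon_max, lat_max

    def render_tile(z, x, y, features, cache_dir="tile-cache"):
        path = os.path.join(cache_dir, str(z), str(x), "%d.png" % y)
        if os.path.exists(path):            # cache hit: simple file hosting
            return path
        lon_min, lat_min, lon_max, lat_max = tile_bounds(z, x, y)
        img = Image.new("RGB", (256, 256), "white")
        draw = ImageDraw.Draw(img)
        for pts in features:                # the expensive part: every feature
            if not any(lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
                       for lon, lat in pts):
                continue                    # crude intersection test
            px = [((lon - lon_min) / (lon_max - lon_min) * 255,
                   (lat_max - lat) / (lat_max - lat_min) * 255)
                  for lon, lat in pts]
            draw.line(px, fill="black")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        img.save(path)
        return path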
Also, your 2048x1536 screen is likely loading more than 48 tiles. It's common to request tiles around the current viewport and (less commonly) above/below the current zoom, to ensure they're present before they're needed. To see this in action, see how fast/far you have to scroll a Google map before you can "catch it" with tiles that aren't loaded.
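A quick illustration of how a prefetch ring inflates the count (the 1-tile buffer is just an assumption for the example, not a measured value):

    # How a prefetch ring around the viewport inflates the tile count.
    def tiles_for_viewport(width_px, height_px, buffer_tiles=1, tile_px=256):
        cols = -(-width_px // tile_px) + 2 * buffer_tiles   # ceil(width/256) + ring
        rows = -(-height_px // tile_px) + 2 * buffer_tiles
        return cols * rows

    print(tiles_for_viewport(2048, 1536, buffer_tiles=0))   # 48 visible
    print(tiles_for_viewport(2048, 1536, buffer_tiles=1))   # 80 with a 1-tile ring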
Google pre-renders every map tile for every zoom level. When a request comes in, all they have to do is combine it with a dynamic image header (the color palette is generated on the fly to support custom map styles) and ship it out.
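I don't know Google's actual format, but the general trick works with any paletted image: keep the expensive pixel data pre-rendered and swap in a new palette per request. A purely illustrative sketch with Pillow (assuming the stored tile is already a paletted PNG):

    # Illustrative only, not Google's pipeline: restyle a pre-rendered
    # paletted ("P" mode) tile by swapping its palette instead of re-rendering.
    from PIL import Image

    def restyle_tile(tile_path, palette_rgb):
        # palette_rgb: flat list of 256*3 ints [r0, g0, b0, r1, g1, b1, ...]
        img = Image.open(tile_path)     # assumed to already be a paletted PNG
        img.putpalette(palette_rgb)     # cheap compared to re-rendering the tile
        return img

    # e.g. a blue-tinted ramp for a hypothetical "night mode" style:
    palette = []
    for i in range(256):
        palette += [i // 2, i // 2, i]
    # restyle_tile("tile.png", palette).save("tile-restyled.png")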
Hmm, I'm curious how much this is overstating the effectiveness of the optimizations in order to teach about them. With this approach, (it seems like) you would still have to render the highest zoom level first, which already takes 3/4 of the render time anyway. There are lots of other optimizations you can do (and they probably are doing) there, but they aren't related to the tree/reference-based ones mentioned here.
The presentation also seems to overstate the redundancy found in the land tiles. You would get savings from water tiles at all zoom levels, which would be enormous, but (looking at http://mapbox.com/maps) even if humans only cover 1% of the land, our infrastructure is well enough distributed that it, inland water, and the other details they've included would preclude redundancy at all but the highest zoom levels (although, in this case, the highest zoom level taking up 3/4 of the tiles saves the most).
With that in mind, I'm wondering about the claimed rendering time of 4 days. That fits nicely with the story told, but with the 32 render servers mentioned at the end, it would seem to be 128 CPU-days (though I'm not sure about the previous infrastructure they were comparing it to), which is actually close to the count mentioned early on for a super-optimized query and render process. This is all just supposition, so I don't want to sound too sure of myself, but the storage savings seem to be the big win here (60% from water plus redundancy at the highest zoom levels), while I would guess you would save considerably less in processing (15% from water plus minor redundancy on land, absent other optimizations, e.g. run-length-based ones).
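(For anyone wondering where the 3/4 figure comes from: tile counts quadruple per zoom level, so the deepest level dominates the pyramid.)

    # Why the deepest zoom level is ~3/4 of the whole tile pyramid:
    # level z has 4**z tiles, so compare the top level to everything above it.
    max_zoom = 16                       # arbitrary example depth
    top = 4 ** max_zoom
    rest = sum(4 ** z for z in range(max_zoom))
    print(top / (top + rest))           # ~0.75, approaching exactly 3/4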
Try zooming in on Russia, Alaska, Brazil, or Egypt and you find a lot of empty tiles. Still, if you can render the whole thing in 4 days, I suspect caching what people actually look at would be good enough, as people probably hit max zoom on Manhattan 1000x as often as they do on some random village in the Amazon. The advantage is that you can just invalidate tiles as you get new information, versus trying to render the whole thing every time you get new street data for Ohio.
True, and countries that are empty except for random river squiggles can save a good bit of storage (at the highest zooms so those river squiggles are isolated), but, again, if you have to start with the lowest level first, the described approach isn't saving a whole lot in processing time, even with big empty space.
One thing I forgot to say in my top post, though, is the presentation mentions compositing layers together on the fly, and that is one of the key tools they use, but the presentation drops that thread. I'm curious if originally it had more on that front and what they do with that.
Regarding starting from the highest zoom level, I believe it's the other way around. If, for example, at z2 a tile is all water, then it is also all water at all higher zoom levels, which can then be skipped.
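In code terms, that pruning is a simple top-down walk of the tile quadtree. A sketch, where is_all_water and render are hypothetical callables:

    # Sketch of the top-down pruning: if a tile is entirely water, every
    # descendant tile is too, so the whole subtree can point at one shared
    # blue tile instead of being rendered. is_all_water() is a hypothetical
    # predicate (e.g. a query against a water polygon layer).
    def walk(z, x, y, max_zoom, is_all_water, render):
        if is_all_water(z, x, y):
            return                      # prune: reuse the canonical water tile
        render(z, x, y)
        if z == max_zoom:
            return
        for dx in (0, 1):
            for dy in (0, 1):
                walk(z + 1, 2 * x + dx, 2 * y + dy, max_zoom, is_all_water, render)

    # walk(0, 0, 0, 16, is_all_water, render) visits only non-water subtrees.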
My guess is that, even client-side, vector rendering (I assume you're talking about SVG or something similar) is rather CPU-intensive and definitely doesn't do well on mobile devices. In a crowded area like NYC, you'd get slowed to a halt. Mobile would likely be unusable. So maybe it's a user-experience trade-off.
The author writes, "I found myself wishing Word had a simple, built-in button for 'cut it out and never again do that thing you just did'". Then I look up at the Clippy image and it has a check box with "Don't show me this tip again". Hmmmm.
That right brain / left brain combo strikes me as just about the fundamental quality to look for in a startup founder or team.
Nice work.