Ah interesting. Is your keyword-document map (aka term dict) too big to keep in memory permanently? My understanding is that at Google they just keep it in memory on every replica.
Edit: I should specify they shard the corpus by document so there isn't a replica with the entire term dict on it.
Could plausibly fit in RAM, it's only like ~100 GB in total. We'll see; I'll probably keep it mmapped at first to see what happens. It isn't the target of very many queries (relatively speaking) anyway, so either way is probably fine.
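A minimal sketch of what "mmap it and see what happens" could look like in Java (assuming JDK 22+ with the java.lang.foreign API; the MmapTermDict name, the fixed 16-byte hash-to-offset record layout, and the binary-search lookup are illustrative assumptions, not the actual on-disk format). The point is that the file is mapped read-only and searched in place, so the OS page cache decides how much of it effectively lives in RAM:

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical on-disk layout: fixed-width 16-byte records of
 * (term hash, posting-list offset), sorted by term hash.
 * The file is mmapped read-only; frequently hit pages stay in the
 * page cache, so hot terms behave much like an in-RAM dictionary.
 */
class MmapTermDict {
    private static final long RECORD = 16;
    private final MemorySegment dict;
    private final long records;

    MmapTermDict(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Arena.global() keeps the mapping alive for the process lifetime.
            dict = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), Arena.global());
            records = ch.size() / RECORD;
        }
    }

    /** Binary search for a term hash; returns its posting-list offset, or -1 if absent. */
    long lookup(long termHash) {
        long lo = 0, hi = records - 1;
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1;
            long hash = dict.get(ValueLayout.JAVA_LONG, mid * RECORD);
            if (hash < termHash)      lo = mid + 1;
            else if (hash > termHash) hi = mid - 1;
            else return dict.get(ValueLayout.JAVA_LONG, mid * RECORD + 8);
        }
        return -1;
    }
}
```

Whether that's good enough mostly comes down to how often the lookups miss the page cache, which is exactly the "see what happens" part.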
No, I mean that for every search query, keywords are mapped to trees of documents, and there are dozens if not hundreds of lookups into those trees in order to intersect the document lists.
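To illustrate the intersection step (the class and method names, and the assumption that each keyword's document list is a sorted array of doc ids, are mine for the sketch, not the actual implementation), here is the classic approach: intersect pairwise with a two-pointer merge, starting from the shortest list so the candidate set shrinks as early as possible.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch: intersect per-keyword document lists, assumed to be sorted doc ids. */
class PostingIntersection {

    static long[] intersectAll(long[][] postings) {
        // Shortest list first keeps the intermediate result small.
        long[][] byLength = postings.clone();
        Arrays.sort(byLength, Comparator.comparingInt(p -> p.length));
        long[] result = byLength[0];
        for (int i = 1; i < byLength.length && result.length > 0; i++) {
            result = intersect(result, byLength[i]);
        }
        return result;
    }

    /** Two-pointer merge intersection of two sorted arrays. */
    private static long[] intersect(long[] a, long[] b) {
        long[] out = new long[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out[n++] = a[i]; i++; j++; }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        long[][] postings = {
            {1, 4, 9, 12, 30},   // docs containing keyword one
            {4, 9, 30, 77},      // docs containing keyword two
            {2, 4, 9, 30, 100},  // docs containing keyword three
        };
        System.out.println(Arrays.toString(intersectAll(postings))); // [4, 9, 30]
    }
}
```

Each keyword adds roughly one linear pass over the current candidate set, which is where the dozens-to-hundreds of lookups per query come from.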