As I've said (https://news.ycombinator.com/item?id=43836353),
try working outside the software industry for someone who needs some software. I had a great time in the 90s working on digital audio gear.
I also had a great time in the 90s and early 00s doing various software and tech gigs for non-software, mom & pop type outfits that just needed a bit of scripting, a web app, some network setup, etc... often found through mailing lists, usenet, user groups, friends of friends.
I've tried to go back to that a few times, but it's actually pretty hard to do these days. After a few decades of trillions of dollars of investment, pretty much every tiny niche has become a company / app with dozens of developers, or available as an online customizable SaaS, or something that can be vibe coded.
The trick is, it's not all in memory - it's a memory-mapped file.
If you look at the cache (with `fincore` or similar) you'll see that the binary search only loads the pages it examines, a number roughly logarithmic in the file size.
And a text file is the most useful general format - easy to write, easy to process with standard tools.
I've used this in the past on data sets of hundreds of millions of lines, maybe billions.
It's also true that you could use a memory-mapped indexed file for faster searches - I've used sqlite for this.
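A minimal Python sketch of the memory-mapped binary search described above, assuming a file of newline-terminated lines sorted bytewise. The names `open_sorted` and `find_line` are made up for illustration; this is not any particular tool's implementation.

```python
import mmap


def open_sorted(path):
    """Map a sorted text file read-only; pages are faulted in only when touched."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def find_line(mm, key):
    """Return the first line starting with `key` (bytes), or None.

    Assumes lines are sorted bytewise. `lo` and `hi` always sit on a
    line start (or end of file), so each probe touches only the page(s)
    holding the line around the midpoint.
    """
    lo, hi = 0, len(mm)
    while lo < hi:
        mid = (lo + hi) // 2
        # Back up from mid to the start of the line containing it.
        nl = mm.rfind(b"\n", lo, mid)
        start = lo if nl < 0 else nl + 1
        end = mm.find(b"\n", start)
        if end < 0:
            end = len(mm)
        if mm[start:end] < key:
            lo = min(end + 1, len(mm))  # answer is after this line
        else:
            hi = start                  # this line or an earlier one
    if lo >= len(mm):
        return None
    end = mm.find(b"\n", lo)
    if end < 0:
        end = len(mm)
    line = mm[lo:end]
    return line if line.startswith(key) else None
```

Usage would look something like `find_line(open_sorted("data.txt"), b"somekey\t")`, where `data.txt` is a placeholder for a file already sorted with `LC_ALL=C sort`.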
What this could really use is a compression format that compresses a variable amount of text into fixed-size blocks. With that, it could binary-search compressed text.
RocksDB actually does something somewhat similar with its prefix compression. It prefix-compresses the text and then "resets" the prefix compression every N records, storing a mapping of reset point -> offset so you can skip across compressed records. It's pretty neat.
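A rough Python sketch of the idea, not RocksDB's actual on-disk format: each key is stored as (shared prefix length, suffix), every Nth key is stored in full, and a list of those restart points lets a lookup binary-search first and then decode at most N records. `encode`, `lookup`, and the interval of 4 are hypothetical names and values chosen for illustration.

```python
def shared_prefix_len(a, b):
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode(keys, interval=4):
    """Prefix-compress sorted keys.

    Returns (entries, restarts): entries are (shared_len, suffix) pairs,
    restarts are the indices of entries stored in full (shared_len == 0).
    """
    entries, restarts, prev = [], [], ""
    for i, key in enumerate(keys):
        if i % interval == 0:
            shared = 0              # reset: store the whole key
            restarts.append(i)
        else:
            shared = shared_prefix_len(prev, key)
        entries.append((shared, key[shared:]))
        prev = key
    return entries, restarts


def lookup(entries, restarts, key):
    """Return True if `key` is present."""
    # Binary search over restart points, whose keys are readable as-is.
    lo, hi = 0, len(restarts)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[restarts[mid]][1] <= key:
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return False                # key sorts before the first record
    start = restarts[lo - 1]
    end = restarts[lo] if lo < len(restarts) else len(entries)
    # Decode forward from the restart point, at most `interval` records.
    prev = ""
    for shared, suffix in entries[start:end]:
        cur = prev[:shared] + suffix
        if cur == key:
            return True
        if cur > key:
            return False
        prev = cur
    return False
```

In a real file the restart list would hold byte offsets rather than list indices, but the search pattern is the same: jump between full keys, then scan a short compressed run.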
Many years ago I had a customer complaint about undefined data changing value in Fortran 77. It turned out that the compiler never allocated storage for uninitialized variables, so they were aliased to something else.
The compiler was changed to allocate storage for any referenced variables.