I'm working on a video / post on how to solve the 1 billion row challenge (https://github.com/gunnarmorling/1brc) and get a competitively fast result while keeping the code readable and maintainable.
So far I'm within spitting distance of the winning entries without using any unsafe code or bit twiddling tricks or custom JVMs or anything like that, and having all the concerns nicely separated and modularized.
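For reference, the naive baseline that the tuned entries compete against looks something like this - a minimal sketch, not my actual entry (the file name and the Station;12.3 line format come from the challenge spec; everything else here is illustrative):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.stream.Stream;

    public class NaiveBaseline {
        // Per-station running aggregate: min, max, sum, count.
        static final class Stats {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum;
            long count;

            synchronized void add(double v) {
                min = Math.min(min, v);
                max = Math.max(max, v);
                sum += v;
                count++;
            }

            @Override public synchronized String toString() {
                // min/mean/max, one decimal, per the challenge's output format
                return String.format("%.1f/%.1f/%.1f", min, sum / count, max);
            }
        }

        public static void main(String[] args) throws IOException {
            Map<String, Stats> byStation = new ConcurrentHashMap<>();
            try (Stream<String> lines = Files.lines(Path.of("measurements.txt"))) {
                lines.parallel().forEach(line -> {
                    int sep = line.indexOf(';');             // input format: Station;12.3
                    String station = line.substring(0, sep);
                    double value = Double.parseDouble(line.substring(sep + 1));
                    byStation.computeIfAbsent(station, k -> new Stats()).add(value);
                });
            }
            System.out.println(new TreeMap<>(byStation));    // alphabetical output
        }
    }

The competitive versions mostly win by avoiding the per-line String and boxed-double allocations and parsing bytes directly, but the overall shape - parallel scan, per-station aggregates, sorted output - stays the same.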
Counterthoughts: a) These skills fit on a double-sided sheet of paper (e.g. the Claude Code best practices doc), and b) what these skills are has been changing so rapidly that even the best-practices docs fall out of date super quick.
For example, managing the context window has become less of a problem with the increased context windows in newer models, and tools like the auto-resummarization / context-window refresh in Claude Code mean you might be just fine without doing anything yourself.
All this to say: the idea that you're left significantly behind if you aren't training yourself on this feels bogus (I say this as a person who /does/ use these tools daily). It should take any programmer no more than a few hours to learn these skills from scratch with the help of a doc, meaning any employee you hire should be able to pick them up no problem. I'm not sure it makes sense as a hiring filter. Perhaps in the future this will change. But right now these tools are built more like user-friendly appliances - more like a cellphone or a toaster than a technology you have to wrap your head around, like a compiler or a database.
In some contexts, dictionary encoding (which is approximately what you're suggesting) can actually work great - for example, for common values, or for nulls (a common kind of common value). It's just less efficient to try to do it with /every/ block. You have to make it "worth it", which comes down to how frequently the value occurs. Shorter values give you a worse compression ratio on one hand, but on the other hand they're often likelier to show up in the data, which makes up for it, to a point.
There are other similar lightweight encoding schemes, like RLE, delta, and frame-of-reference encoding, each good for a different data distribution.
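Toy sketches of a few of these, to show how simple the core ideas are (the code and names are mine, not from any particular engine; frame-of-reference is close enough to delta that I've left it out):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class LightweightEncodings {

        // Dictionary encoding: replace each distinct value with a small integer
        // code. Worth it only when values repeat often enough that the codes
        // plus the dictionary take less space than the raw values.
        static int[] dictionaryEncode(List<String> values, Map<String, Integer> dictOut) {
            int[] codes = new int[values.size()];
            for (int i = 0; i < values.size(); i++) {
                codes[i] = dictOut.computeIfAbsent(values.get(i), k -> dictOut.size());
            }
            return codes;
        }

        // Run-length encoding: collapse runs of equal adjacent values into
        // (value, runLength) pairs. Great for sorted or low-cardinality
        // columns, useless for noisy ones.
        static List<int[]> runLengthEncode(int[] values) {
            List<int[]> runs = new ArrayList<>();
            int i = 0;
            while (i < values.length) {
                int j = i;
                while (j < values.length && values[j] == values[i]) j++;
                runs.add(new int[] { values[i], j - i });
                i = j;
            }
            return runs;
        }

        // Delta encoding: keep the first value, then store consecutive
        // differences. Near-sorted data yields tiny deltas that bit-pack
        // or compress well downstream.
        static int[] deltaEncode(int[] values) {
            int[] out = new int[values.length];
            for (int i = 1; i < values.length; i++) out[i] = values[i] - values[i - 1];
            if (values.length > 0) out[0] = values[0];
            return out;
        }

        public static void main(String[] args) {
            Map<String, Integer> dict = new HashMap<>();
            System.out.println(Arrays.toString(
                    dictionaryEncode(List.of("US", "US", "DE", "US", "FR"), dict))
                    + " dict=" + dict);
            runLengthEncode(new int[] { 7, 7, 7, 3, 3 })
                    .forEach(run -> System.out.println("value=" + run[0] + " len=" + run[1]));
            System.out.println(Arrays.toString(deltaEncode(new int[] { 100, 101, 103, 106 })));
        }
    }

The whole game is matching the scheme to the block's distribution: dictionary for repetitive values, RLE for runs, delta for sorted-ish sequences.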
This is my time to shine - I know the cause of this mistake. Like the article mentions, international trade is specified using the HS (Harmonized System) encoding mechanism.
Now, the level at which product-group data is most frequently and easily available is the 4-digit level, which is quite broad. If you look at code 3002 in the HS classification system (of which there are many versions, but we'll ignore that for now), you'll find a category, succinctly named:
> "Human blood; animal blood prepared for therapeutic, prophylactic or diagnostic uses; antisera, other blood fractions and immunological products, whether or not modified or obtained by means of biotechnological processes; vaccines, toxins, cultures of micro-organisms (excluding yeasts) and similar products; cell cultures, whether or not modified:"
People new to trade data, especially programmers with some hubris, tend to think this is way too long a category name to fit in a title or dropdown, so they chop it at the semicolon and call it good, resulting in "Human Blood" or similar. Better data sources tend to shorten these based on the real-world share of the subcategories, e.g. see here "Serums and vaccines":
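In code, the mistake is roughly this (a hypothetical repro, not pulled from anyone's actual pipeline; the heading text is abridged):

    public class HsLabelChop {
        public static void main(String[] args) {
            // Heading 3002's full text, abridged with "..." here for brevity.
            String heading = "Human blood; animal blood prepared for therapeutic, "
                    + "prophylactic or diagnostic uses; antisera, other blood fractions ...";
            // The naive shortening: chop at the first semicolon and call it good.
            String label = heading.substring(0, heading.indexOf(';'));
            System.out.println(label); // prints: Human blood
        }
    }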
God I loved that lobby and the Art Deco + Mexican combination art style. I found a high-res version of that mural as a wallpaper at some point but am coming up short for the link right now.
It's more that the "negative nancies" became necessary nancies. Back when Amazon sold books, they were a considerable player, but otherwise: big whoop. Now they threaten to dominate logistics AND hosting, and are expanding their grip and stamping out competition in other markets. Google is pretty much synonymous with the web. Meta owns a big chunk of messaging and social media. Computers used to not matter much, but now we're glued to one.
It costs even more to be reckless today.
Re: "whitey on the moon" - I'm not sure the space program would be my first target there but I think it makes a more poetic contrast and forces people to pay attention by targeting a beloved cultural narrative. Cyberpunk - by my reckoning a bit later - has been preaching a very similar message of massive inequality in the presence of incredible technology and wealth disparity and power concentration. And yet that doesn't draw the same ire. I guess in that case it's easier to dismiss the core message because robot limbs and cool neon lights are too much of a distraction.
You might enjoy the excellent Articles of Interest podcast - an episode of it covers this exact phenomenon, and there are many other great episodes on similar subjects in clothing and fashion.
Especially for a first-time-in-all-of-humanity type of mission, launched half a century ago, which yielded brand-new data on faraway objects we'd never had before - and considering it's still going and reporting data - it's arguably a bargain-basement price for such a thing.
This kind of post is what brings me back to this website :-)
I'm the guy with the enthusiastic thread earlier in this post. I'd love to sit down and chat with you for an hour on Zoom and hear all about those times, and then we could post the video on here - I think people would appreciate it.
I have absolutely zero experience in interviewing people, nor do I have a media channel of any kind, but I promise I'd do my best to ask interesting questions. If that sounds interesting, shoot me an email (you can find it in my profile).
Excited to share soon!