tybit · on Jan 13, 2023

Tangential to the author’s point, but it’s funny to note that many new SQL databases (e.g. CockroachDB, TiDB, MyRocks) are written on top of RocksDB, a “NoSQL” key-value store.

jakewins · on Jan 13, 2023

I mean, so are the relational databases. Rocks provides similar storage primitives - trees - as most databases implement internally.

Postgres is “just” a SQL engine running on top of a key-value heap and trees pointing into it.

Neo4j is the same, key-value record store plus trees.

Rocks and its peers have made that into a dependable library, making new db dev that much faster :)

mattashii · on Jan 13, 2023

> Postgres is “just” a SQL engine running on top of a key-value heap and trees pointing into it.

I wouldn't call heaps a key-value data structure, even if you could argue that the location of your data is an implicit key. And the trees are strictly optional - you could build your database with only hash and BRIN indexes.

jakewins · on Jan 13, 2023

Yeah agree, I had the same thought after I wrote that comment “wait dumbass, you don’t get to pick the key in the heap, it’s not the same”.

But still, you get what I’m getting at: in the end, the relational dbs, internally, build on top of abstractions that are things like trees, key-value structures, heaps, logs etc.

zffr · on Jan 13, 2023

It’s my understanding that row-based relational databases are basically key-value stores that map from row ID to column values. The “magic” of SQL-based relational databases is how the KVS is queried, and the consistency guarantees they provide.

Part of the consistency guarantees is having a reliable storage engine. That’s the value RocksDB provides.

The rest of the “SQL stuff” can be built on top of this.
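To make that concrete, here is a toy sketch of the idea: a dict from row ID to column values stands in for the storage engine, and a secondary index plus a `select` method stand in for the “SQL stuff” layered on top. Every name here (`ToyTable`, `create_index`, `select`) is made up for illustration; this is not how RocksDB or any real database lays out its data, and it ignores durability and transactions entirely.

```python
# Toy sketch only: a "table" as a key-value store from row ID to column values,
# with an optional secondary index pointing back into it. All names are hypothetical.

class ToyTable:
    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}        # row ID -> tuple of column values (the "KVS")
        self.next_row_id = 1
        self.indexes = {}     # column name -> {value -> set of row IDs}

    def create_index(self, column):
        """Build a secondary index over rows that already exist."""
        pos = self.columns.index(column)
        index = {}
        for row_id, values in self.rows.items():
            index.setdefault(values[pos], set()).add(row_id)
        self.indexes[column] = index

    def insert(self, **values):
        """Store one row under a fresh row ID and keep indexes in sync."""
        row = tuple(values[c] for c in self.columns)
        row_id = self.next_row_id
        self.next_row_id += 1
        self.rows[row_id] = row
        for column, index in self.indexes.items():
            index.setdefault(values[column], set()).add(row_id)
        return row_id

    def select(self, column, value):
        """Roughly: SELECT * FROM table WHERE column = value."""
        if column in self.indexes:
            # Index lookup: jump straight to the matching row IDs.
            row_ids = sorted(self.indexes[column].get(value, ()))
        else:
            # No index: fall back to scanning every key in the store.
            pos = self.columns.index(column)
            row_ids = [rid for rid, row in self.rows.items() if row[pos] == value]
        return [dict(zip(self.columns, self.rows[rid])) for rid in row_ids]
```

The query layer only ever calls get/put/scan on `rows`, which is why that part can be swapped for a library such as RocksDB while the planner and consistency logic live above it.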

yourMadness · on Jan 13, 2023

It seems fairly well shown that the "SQL stuff" can be built on top of it on the server side.

Building the "SQL stuff" on the client side seems less well proven to me.

tybit · on Nov 27, 2022

When it comes to cache invalidation, worse performance isn’t the primary concern in most cases; correctness is.

tybit · on Nov 25, 2022

It’s usually a safe bet that the state will preference companies over both employees and taxation unfortunately.

tybit · on May 10, 2022

I think this architecture would be really powerful paired with the actor model to shard databases to nodes.

tybit · on March 15, 2022

At big tech companies I’ve seen and heard about, the answer is crypto shredding. Encrypt all PII at rest with a per user data key. GDPR deletion requests can then delete the data key. This isn’t perfect, but it’s a step in the right direction IMO. Unfortunately I don’t see it being feasible for a typical company anytime soon.
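A rough sketch of the pattern, to show the moving parts: each user gets their own data key, records are stored only as ciphertext, and a deletion request removes the key rather than hunting down every copy of the data. Everything here is an assumption for illustration: `ShreddableStore` and its methods are hypothetical names, the key table stands in for a real key management service, and the cipher is a deliberately toy SHA-256 keystream that must not be used for real data (a real system would use an authenticated cipher from a vetted library).

```python
import hashlib
import secrets


def _toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # TOY cipher for illustration only: XOR with a SHA-256 counter keystream.
    # Applying it twice with the same key and nonce returns the original bytes.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ShreddableStore:
    """Hypothetical store: PII encrypted at rest under a per-user data key."""

    def __init__(self):
        self.data_keys = {}    # user ID -> data key (stand-in for a KMS)
        self.records = {}      # user ID -> list of (nonce, ciphertext)
        self.shredded = set()  # users whose key has been destroyed

    def put(self, user_id: str, plaintext: str) -> None:
        if user_id in self.shredded:
            raise KeyError(f"data key for {user_id} was shredded")
        if user_id not in self.data_keys:
            self.data_keys[user_id] = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        ciphertext = _toy_xor_cipher(self.data_keys[user_id], nonce, plaintext.encode())
        self.records.setdefault(user_id, []).append((nonce, ciphertext))

    def get(self, user_id: str) -> list:
        if user_id not in self.data_keys:
            raise KeyError(f"no data key for {user_id}")
        key = self.data_keys[user_id]
        return [_toy_xor_cipher(key, n, c).decode() for n, c in self.records.get(user_id, [])]

    def forget(self, user_id: str) -> None:
        # Deletion request: destroy the key. Ciphertext (including any copies
        # in backups or replicas) stays behind but can no longer be decrypted.
        self.data_keys.pop(user_id, None)
        self.shredded.add(user_id)
```

Note that `forget` never touches `records`. That is the appeal, and also the limitation: anything stored outside the encrypted payload, such as IDs and join keys, is untouched.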

salawat · on March 15, 2022

Still keeps foreign keys, and the key management can be a nightmare. Basically, you're talking per-customer encryption keys... Even then, you still might get something if you have enough other data to cross-ref/compare against/you're just looking for something to confirm/parallel construct from.

tybit · on Jan 20, 2022

For anyone else expecting this to be a paper given the domain name, it’s not. It’s a non-technical interview with a couple of the original paper’s authors. Not bad, just not as exciting as I imagine a paper would be detailing what they’ve learnt, from a distributed systems perspective etc., operating Dynamo then DynamoDB for so long now.

mjb · on Jan 20, 2022

We don't have a paper on DynamoDB's internals (yet?), but here's a talk you might find interesting from one of the folks who built and ran DDB for a long time: https://www.youtube.com/watch?v=yvBR71D0nAQ

And Doug Terry talking through the details of how DynamoDB's transaction protocol works: https://www.usenix.org/conference/fast19/presentation/terry

If we did publish more about the internals of DDB, what would you be looking to learn? Architecture? Operational experience? Developer experience? There's a lot of material we could share, and it's useful to hear where people would like us to focus.

pow_pp_-1_v · on Jan 20, 2022

All of it - architecture, operational experience, best practices etc.

ldrndll · on Jan 20, 2022

Just want to second this. All of the above sounds really interesting to me!

uvdn7 · on Jan 20, 2022

https://brooker.co.za/blog/2022/01/19/predictability.html This might be something you are looking for.

tybit · on Jan 18, 2022

This would be an interesting article to flesh out. I.e., is there evidence that MySQL is more reliable in those ways?

I always prefer reliability over features even though I’m a product engineer so if he’s right it’d be good to know. Either way, I’m stuck with the MySQL that the infrastructure engineers at work have provided us with.

jpgvm · on Jan 18, 2022

It's not more reliable on those dimensions but it is "easier". For most cases you can enable statement based replication in MySQL and put it into semi-sync mode and it will mostly do what you want.

PostgreSQL on the other hand is less of a fully finished database w.r.t. replication and more like a set of incredibly powerful tools that let you build the scale-out, partitioned, replicated, CDC-friendly/etc. database cluster of your dreams. You can mix and match nodes set up with physical replication vs logical replication, you can use partitioning and table inheritance to increase scalability and hide from your application the fact that the database is actually distributed across many nodes/clusters, etc.

I think there is probably some lag/delta here where PostgreSQL doesn't have an easy enough/simple enough story for basic replication modes outside of basic physical replication.

tybit · on Jan 6, 2022

I think there’s a good argument that async is decent for performance-critical languages, e.g. C++ and Rust, and for languages looking to model effects, e.g. Haskell and arguably Rust. I don’t see a good reason for it in mainstream languages like Java, JavaScript and C#.

I think Java’s approach with Loom is going to be a big win over C# there, as someone that just wants to get stuff done and is a fan of both.

merb · on Jan 6, 2022

I'm not sure if Loom will be successful. It does not fit the ecosystem. A lot of Java async code would need a lot of rewriting, which can either disrupt the ecosystem or split it, at least on the library level.

Matthias247 · on Jan 7, 2022

I actually feel it would be successful because it exactly fits the ecosystem. A lot of Java code is classical threaded code. E.g. the majority of Java servlet code, and older web frameworks. Those would all immediately benefit from Loom in terms of resource utilization and scalability.

Of course Java also has some frameworks like Netty and things built on top of it - which won't benefit. But I feel like even though those are great from a performance point of view, they are actually more niche in the overall Java world.

tybit · on Dec 26, 2021

I realise that Twitter is using Mesos, but for those of us on Kubernetes, does Guaranteed QoS solve this? https://kubernetes.io/docs/tasks/configure-pod-container/qua...

bboreham · on Dec 26, 2021

If you also use the CPU Manager feature and request an integer number of cores, yes. Then for example if you request 3 cores your process will be pinned onto 3 specific cores and nothing else will be scheduled onto those cores, and CFS will not throttle your process.

https://kubernetes.io/docs/tasks/administer-cluster/cpu-mana...
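As a rough illustration of the condition described above, here is a small check over a pod spec expressed as a plain dict. It assumes, per my reading of the Kubernetes docs, that a pod is in the Guaranteed class when every container has CPU and memory limits with requests equal to them (omitted requests default to the limits), and that exclusive cores additionally need a whole-number CPU request. The function names are hypothetical, memory quantities are compared as raw strings for simplicity, and the check says nothing about whether the node's kubelet actually runs the static CPU Manager policy.

```python
def cpu_millis(quantity: str) -> int:
    """Parse a Kubernetes CPU quantity such as "3", "0.5" or "1500m" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def container_is_pinnable(container: dict) -> bool:
    """True if this container has requests == limits and a whole number of CPUs."""
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})

    # Guaranteed QoS needs both CPU and memory limits on every container.
    if "cpu" not in limits or "memory" not in limits:
        return False

    # Requests that are left out default to the limits.
    cpu_request = cpu_millis(requests.get("cpu", limits["cpu"]))
    cpu_limit = cpu_millis(limits["cpu"])
    if cpu_request != cpu_limit:
        return False
    # Simplification: memory is compared as a string, so "1Gi" != "1024Mi" here.
    if requests.get("memory", limits["memory"]) != limits["memory"]:
        return False

    # Exclusive cores are only handed out for integer CPU counts.
    return cpu_limit >= 1000 and cpu_limit % 1000 == 0


def pod_gets_exclusive_cores(pod_spec: dict) -> bool:
    """True if every container in the (hypothetical) pod spec dict qualifies."""
    containers = pod_spec.get("containers", [])
    return bool(containers) and all(container_is_pinnable(c) for c in containers)
```

For example, a container requesting and limited to `"3"` CPUs passes, while one at `"1500m"` is still Guaranteed but stays in the shared pool, so CFS throttling can still apply to it.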

mac-chaffee · on Dec 26, 2021

QoS classes are only used "to make decisions about scheduling and evicting Pods." It still uses the Completely Fair Scheduler, which is where the problem came from (as far as I understand).

KptMarchewa · on Dec 26, 2021

I think they are not using Mesos now.

https://dzone.com/articles/what-can-we-learn-from-twitters-m...

tybit · on Dec 11, 2021

It was very clear from their post that they were criticising STS from the perspective of an engineer in AWS within a different team.

hericium · on Dec 11, 2021

I assumed in good faith that this is someone knowing internals as a larger customer, not an AWS person shit-talking other AWS teams.

Got curious only after a downvote hence late edit. My bad.

ignoramous · on Dec 11, 2021

> ...an AWS person shit-talking other AWS teams [in public].

I remember a time when this would be an instant reprimand... Either amzn engs are bolder these days, or amzn hr is trying really hard for amzn to be "world's best employer", or both.

filoleg · on Dec 11, 2021

Gotta deanonymize the user to reprimand them. Maybe I am wrong here, but I don’t see it as something an Amazon HR employee would actually waste their time on (exceptions apply for confidential info leaks and other blatantly illegal stuff, of course). Especially given that it might as well be impossible, unless the user incriminated themselves with identifiable info.

WaxProlix · on Dec 11, 2021

It's true that I shouldn't have posted it, was mostly just in a grumpy mood. It's still considered very bad form. I'm not actually there anymore, but the idea stands.