Yeah I think you’re correct: Git’s internal data structure is a grow-only set CRDT (well, graph) of commit objects. Grow-only sets of hashed contents are one of the simplest CRDTs out there; the network just implements set union.
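A minimal sketch of a grow-only set of hashed contents, where merging two replicas really is just set union. This is a toy model under stated assumptions: `put`, `merge` and `Store` are hypothetical names, and real git hashes a typed header plus the content rather than raw bytes.

```python
import hashlib

# Hypothetical toy model: content hash -> content. Entries are only ever added.
# (Real git hashes "<type> <size>\0" + content; this just hashes raw bytes.)
Store = dict[str, bytes]


def put(store: Store, data: bytes) -> Store:
    """Return a new store with `data` added under its content hash."""
    key = hashlib.sha1(data).hexdigest()
    return {**store, key: data}


def merge(a: Store, b: Store) -> Store:
    """Set union. The same key implies the same content (ignoring hash
    collisions), so it never matters which side's value wins."""
    return {**a, **b}
```

Because the merge is a union keyed by content, it is commutative, associative and idempotent: `merge(a, b) == merge(b, a)` and `merge(a, merge(a, b)) == merge(a, b)`, so replicas converge no matter how objects are exchanged.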
If git only had commit and fetch, it would be a CRDT (though not a very useful one). But git also needs to merge commits and branches together, and it does that with a diff-and-patch style three-way merge, which doesn’t satisfy the CRDT rules: it’s not idempotent, associative or automatic, and in many cases it isn’t even deterministic.
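A toy sketch of the "not automatic" point, assuming a single value instead of lines of text (`three_way_merge` and `Conflict` are hypothetical names, not git's code). Git merges two sides against their common ancestor, and when both sides changed the same thing differently there is no rule to apply, so it stops and hands the conflict to a human. A CRDT merge, by contrast, is a total function of two replica states that always produces an answer.

```python
class Conflict(Exception):
    """Raised when no automatic resolution exists (hypothetical)."""


def three_way_merge(base, ours, theirs):
    """Merge one value against a common ancestor, in the spirit of
    (but far simpler than) git's line-based three-way merge."""
    if ours == theirs:
        return ours      # both sides agree, or neither changed it
    if ours == base:
        return theirs    # only they changed it
    if theirs == base:
        return ours      # only we changed it
    # Both sides changed it differently: no rule decides, a human must.
    raise Conflict(f"ours={ours!r} theirs={theirs!r}")
```

For example, `three_way_merge("a", "b", "a")` returns `"b"`, while `three_way_merge("a", "b", "c")` raises `Conflict` instead of converging.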
The content-addressed database for commit objects (but not their names): possibly, yes. It has trivial merge semantics: just keep everything and deduplicate.
But that's equally true of every content-addressed system. And entirely untrue for every named object in git, which are a critical component of git being git, instead of an opaque blobstore with zero semantics.
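To make the "named objects" point concrete, here is a hypothetical sketch of a branch name (a ref) that two replicas both move. The names `is_ancestor` and `update_ref` are made up; the fast-forward-only rule loosely mirrors how a plain `git push` rejects non-fast-forward updates. When the two new values have diverged there is no merge function for the name itself; the update is simply refused.

```python
def is_ancestor(parents: dict[str, list[str]], old: str, new: str) -> bool:
    """True if commit `old` is reachable from `new` (or is `new`)."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def update_ref(parents: dict[str, list[str]], current: str, proposed: str) -> str:
    """Move a branch name only as a fast-forward; otherwise refuse."""
    if is_ancestor(parents, current, proposed):
        return proposed
    raise ValueError(f"non-fast-forward: {current} -> {proposed}")
```

With `parents = {"c1": [], "c2": ["c1"], "c3": ["c1"]}`, moving a branch from `"c1"` to `"c2"` succeeds, but moving it from `"c2"` to the sibling `"c3"` raises `ValueError`.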
Yes, I agree. I still think it's wrong to call Git a CRDT, because Git is a lot more than its content-addressable store. All that other stuff on top, you know, the parts you use to track and merge branches? That stuff isn't a CRDT.
Maybe it's like asking if Wikipedia is a relational database. I assume Wikipedia is implemented on top of a relational database. But the resulting Wikipedia website? No, not a relational database.
Thank you, I appreciate you taking the time to agree and offer additional nuance. I agree with your nuance. That is the nuance that makes Git not a CRDT.
I can also appreciate that academics need a time and place to be academic about their definitions (e.g. papers and in-depth answers).
> What are some real world apps using CRDTs that have really good experiences?
I also feel academics could do more to appreciate that questions like this one are non-academic. Many people could just as easily read the question above as: "I'm building a multi-player application. I want it to be a really awesome experience but I don't know if I'll get the concurrency right. I've heard concurrency is hard. I've heard good things about CRDTs tho, should I be using them?".
We should absolutely mention truly pure CRDT-based applications/frameworks in response. But leaving out similar-but-nuancedly-different applications/frameworks hurts 1. less experienced readers' ability to intuitively understand what a CRDT is, and 2. their sense of how it might help them create the next great multi-player application. It also hurts someone who has spent the last N nights desperately trying to make their pure CRDT application/framework fun and valuable, with no luck. Mentioning those alternatives lets them loosen up, be a bit more creative, and strengthen their application/framework, because the universe of things "asymptotically approaching a CRDT, but definitely without concurrency bugs/issues" (which doesn't have a nice academic name or explicit ruleset yet) is so much larger than "strictly a CRDT".
Meanwhile, if this were a paper or a lecture, I could have said "... are examples asymptotically approaching a CRDT; so don't worry, just because something is not strictly a CRDT doesn't mean you're gonna have bugs in your concurrency implementation". But "in passing", on a comment thread that should have taken 30 seconds of all our time, who wants to write out all of that nuance in every comment? In fairness, I even tried to differentiate academic vs non-academic.
If you're still reading this thread (especially anyone that has been downvoting me) I hope you can appreciate this context.
Yeah, the question asking about real-world use cases is a non-academic question. But that still doesn't mean Git is a CRDT.
Maybe it's like someone asking "What are some other uses for JavaScript?" and someone answering "Unity games!". There might be a way to use JavaScript when making Unity games, but most Unity games aren't made with any JavaScript. "Oh, well, C# is sort of close to JavaScript though?" Yes, but it's still not JavaScript. I don't think calling C# "mostly JavaScript" is a useful or helpful idea. To the expert it's wrong, and to the novice it's confusing.
We don't draw a line from "actually really JavaScript" to Haskell or something and say C# is 80% JavaScript. "JavaScript" as a term just means JavaScript, not the space of things similar to JavaScript.
It's the same with CRDTs. "CRDT" doesn't mean "eventually consistent distributed data store"; it just refers to the 'pure' thing. You know what else meets the definition of a CRDT, but is very dissimilar from a database? An integer, sent between peers, and merged using the MAX() function. You could also argue that the unit type is a (degenerate) CRDT, just like how a straight line is sort of a triangle with a side length of 0.
That's how I see the claim that "Git is a CRDT, well, sort of". To experts it's wrong, and to everyone else it's kinda misleading. Git isn't a CRDT in the same way C# isn't JavaScript. (And a web browser, V8, a JavaScript program, Node.js: none of these things are JavaScript either.)
Now, Git's content-addressable blob store is a CRDT, so it might be fair to say Git is "using a CRDT internally". But I hope you can see how that claim is kinda confused too. Git's branches and commit merging don't obey the CRDT merge rules, and adding them on top makes the whole thing no longer a CRDT. It's the same as adding a lump to the side of a triangle: it's not a triangle anymore.
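The max-integer example written out, as a minimal sketch (the function names are made up for illustration, and it assumes non-negative integers with 0 as the starting state). The merge is commutative, associative and idempotent, which is exactly what lets replicas converge regardless of delivery order or duplicated messages.

```python
from functools import reduce
from itertools import permutations


def merge(a: int, b: int) -> int:
    """A max register: the whole CRDT is one integer merged with max()."""
    return max(a, b)


def converges(updates: list[int]) -> bool:
    """Check that every delivery order, with one update delivered twice,
    ends in the same state when starting from 0."""
    results = set()
    for order in permutations(updates + updates[:1]):
        results.add(reduce(merge, order, 0))
    return len(results) == 1
```

`converges([3, 7, 5])` returns `True`: every order ends at 7. It is a perfectly valid CRDT and, on its own, not much of a database.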
I like the max integer example because it’s clearly a CRDT and also not particularly useful on its own. I get the impression that people think CRDTs are some sort of magical thing that gives you bug-free sync, but it’s pretty easy to explain why naively using a single max integer to implement a distributed counter would be buggy. CRDTs are just tools, and like any tool they can be misused!
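A minimal sketch of that bug, alongside the standard fix, the grow-only counter (G-Counter), swapped in here for illustration; all function names are hypothetical. Two replicas each increment once, and max() collapses the two increments into one. The G-Counter keeps one slot per replica, merges by taking the max per slot, and reports the sum, so no increment is lost.

```python
# Naive "counter": each replica increments its own integer, merged with max().
def naive_increment(value: int) -> int:
    return value + 1


def naive_merge(a: int, b: int) -> int:
    return max(a, b)


# G-Counter: one slot per replica id, per-slot max on merge, value is the sum.
def g_increment(counts: dict[str, int], replica: str) -> dict[str, int]:
    return {**counts, replica: counts.get(replica, 0) + 1}


def g_merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def g_value(counts: dict[str, int]) -> int:
    return sum(counts.values())
```

With the naive version, `naive_merge(naive_increment(0), naive_increment(0))` is `1` even though two increments happened. With the G-Counter, `g_value(g_merge(g_increment({}, "A"), g_increment({}, "B")))` is `2`. Both are CRDTs; only one of them is a correct counter.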