For the concurrent queue at the bottom, the issue is with the ABA problem where you're guaranteed to execute the atomic compare-and-swap correctly, but you aren't guaranteed that the same pointer value means the queue is in the same state: if you have a queue `123` and start a dequeue, you load current head = 1, next = 2, and then can be scheduled out for another thread to execute a series of operations resulting in a queue of `145`. When your thread resumes, the compare-and-swap of 1->2 works fine...but now the head of your queue is the freed 2 node. Woops! Concurrent primitives like RCUs or hazard pointers (or garbage collection!) are needed in order to safely free data under most lockfree data structures.
Yeah, this stuff is hard even before you bring memory ordering into the picture. What's worse, checks such as thread sanitizer will not catch it, since it's not a simple data race. Proving functional invariants about multithreaded code requires a different class of tools.
Funnily enough, ARM has another difference here on top of just having a non-TSO memory model: LL/SC atomics solve the ABA problem, because the word holding the queue head has been written to and the store-conditional fails, even though the contents of the memory will be the same at the end. Which makes sense once you say it and some docs about LL/SC will mention that, but also reading various lockfree data structure papers I've basically never seen talked about (probably because LL/SC progress guarantees are kinda scuffed)
the issue with LL/SC is that it is hard to expose to higher level languages than assembler. What you can do within an LL/SC section without causing it to spuriously fail is very much architecture dependent and you need full control of the load and stores within it. Exposing it to compiler optimizations won't work reliably.
So in practice LL/SC, in higher level languages, is used to implement CAS, XCHG and other atomic primitives which don't allow taking advantage of the ABA resistance. As an additional downside, you get a weak CAS that you always need to call in an loop.
Are there tools out there that can prove semantic invariants in multi-threaded code? I don't understand how there can be automated tools around it at all because how would that even be possible?
There are model checkers such as nidhugg (C++), dscheck (ocaml). They take a test case and reach all possible terminal states by trying different interleavings.
Crucially, they don’t have to try all interleavings to reach all terminal states, making the enumeration quite fast.
Rust won't solve the ABA problem, no. You'd be in unsafe Rust if you were writing something that could encounter the ABA problem.
You wondered out loud how it was even possible to do that kind of analysis, and that's where my mind went. Evidently people think it's a bad take. That's as deep as it goes.
The ABA problem is a false-positive execution of a CAS speculation on a shared memory location.
It is very easy to create an ABA problem in safe Rust. Data race free sequential consistency, which Rust has, is almost completely orthogonal to the ABA problem.
This is an area of active PLT research, we haven't come anywhere close to addressing the problem in the general case.
I suspect we'll be seeing all kinds of bugs caused by a generation of programmers thinking everything has guard rails in Rust because "safety", so they can turn their brain off and not think. In reality, those promises of safety largely disappear when threads, files, signals, and networks are involved.
At the end of the day, your programs run on computers which exist in the physical world. The abstractions are mostly isomorphic, but it's at the margins where the abstractions aren't isomorphic that all the interesting things happen.
> The ABA problem is a false-positive execution of a CAS speculation on a shared memory location.
In safe Rust, if I have a mutable reference to Foo, and Foo contains a shared reference to Bar, then no other thread has a mutable reference to Foo or Bar. So no other thread will make a CAS on my reference to Bar, or drop Bar and then allocate something at the same memory address, etc.
You could have some higher level ABA problem I suppose, where you acquire a lock, read a value, give up the lock, and then make spurious assumptions about what happens while you've let the lock go. But that's obviously not what we're talking about if we're talking about CAS. (ETA: or if these were application level references, eg indices into a list.)
If we're going to implement a lockfree data structure, we're going to need unsafe Rust to hand-roll interior mutability. Because we're going to be sharing mutable state. Which isn't allowed in safe Rust.
I guess? I've only ever heard about the ABA problem in reference to pointers, eg in the context of lockfree queues. Maybe that's my ignorance. (Which is why I addressed shared references in my comment.)
Yes, if you don't hold a lock on a value, or exert some kind of control at the API level (eg making it monotonic so your CAS will work), you can't make assumptions about it. I think you'll find that Rust developers understand that concept about as well as any other community of concurrent developers.
But yes, granted, the semantic information about these integers isn't represented in Rust's type system, and won't be caught by it's static analysis.