C++20 apparently has a fix for it with std::jthread, though, which requests a stop and joins automatically in its destructor.
Despite all the lessons available from Java, .NET, Erlang, TBB, and the Concurrency Runtime, ISO C++ still did not manage to get a proper concurrency story, and it is full of traps like the one you mention.
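Another one is std::async, which might actually run synchronously: with the default launch policy (std::launch::async | std::launch::deferred) the implementation may choose deferred execution, in which case the task only runs on the calling thread when get() or wait() is called.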
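A small sketch of the difference, assuming a C++11-or-later standard library. It compares the thread id seen inside the task with the caller's. The helper name ran_on_other_thread is hypothetical. Passing the default policy is left out on purpose, because which of the two behaviours you get is up to the implementation.

```cpp
#include <future>
#include <thread>

// Hypothetical helper: returns true if std::async ran the task on a
// different thread than the one that called it.
bool ran_on_other_thread(std::launch policy) {
    const auto caller = std::this_thread::get_id();
    auto fut = std::async(policy, [] { return std::this_thread::get_id(); });
    // With std::launch::deferred nothing has run yet; get() executes the
    // task right here, on the calling thread.
    return fut.get() != caller;
}
```

Passing std::launch::async explicitly is the usual way to rule out the deferred case.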
A related question, if anyone knows good answers here:
Which programming languages have de facto thread implementations that are not wrappers around pthreads? I think Go has its own thread implementation, or am I mistaken?
Java does not specify the actual threading model, so you can get green threads (user space) or red threads (kernel threads).
The upcoming Project Loom intends to make green threads (called virtual threads in Loom) the default, while still letting you ask for kernel threads, since that is what most JVM implementations have converged on.
GHC Haskell's runtime has a "default" lightweight thread system (forkIO) that schedules logical threads onto the available operating system threads and parallelises them across the available CPUs.
> A newly spawned Erlang process uses 309 words of memory in the non-SMP emulator without HiPE support. (SMP support and HiPE support both add to this size.)
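And a word is the native register size, so 4 or 8 bytes these days. That puts a new process at roughly 1.2 KB (309 × 4 bytes) on 32-bit systems or 2.5 KB (309 × 8 bytes) on 64-bit ones: fairly small, but not 64 bytes small.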