More

ot · 2026-01-14T00:53:05 1768351985

You can do even faster, about 8ns (almost an additional 10x improvement) by using software perf events: PERF_COUNT_SW_TASK_CLOCK is thread CPU time, it can be read through a shared page (so no syscall, see perf_event_mmap_page), and then you add the delta since the last context switch with a single rdtsc call within a seqlock.

This is not well documented unfortunately, and I'm not aware of open-source implementations of this.

EDIT: Or maybe not, I'm not sure if PERF_COUNT_SW_TASK_CLOCK allows to select only user time. The kernel can definitely do it, but I don't know if the wiring is there. However this definitely works for overall thread CPU time.

jerrinot · 2026-01-14T01:02:35 1768352555

That's a brilliant trick. The setup overhead and permission requirements for perf_event might be heavy for arbitrary threads, but for long-lived threads it looks pretty awesome! Thanks for sharing!

ot · 2026-01-14T01:05:05 1768352705

Yes you need some lazy setup in thread-local state to use this. And short-lived threads should be avoided anyway :)

catlifeonmars · 2026-01-14T07:39:51 1768376391

I guess if you need the concurrency/throughput you should use a userspace green thread implementation. I’m guessing most implementations of green threads multiplex onto long running os threads anyway

jerrinot · 2026-01-14T09:01:50 1768381310

In a system with green threads, you typically want the CPU time of the fiber or tasklet rather than the carrier thread. In that case, you have to ask the scheduler, not the kernel.

nly · 2026-01-14T09:04:54 1768381494

Why do you need a seqlock? To make sure you're not context switched out between the read of the page value and the rdtsc?

Presumably you mean you just double check the page value after the rdtsc to make sure it hasn't changed and retry if it has?

Tbh I thought clock_gettime was a vdso based virtual syscall anyway

ot · 2026-01-15T17:08:03 1768496883

> Presumably you mean you just double check the page value after the rdtsc to make sure it hasn't changed and retry if it has?

Yes, that's exactly what a seqlock (reader) is.

mgaunard · 2026-01-14T14:40:42 1768401642

clock_gettime is not doing a syscall, it's using vdso.

jerrinot · 2026-01-14T15:07:50 1768403270

clock_gettime() goes through the vDSO shim, but whether it avoids a syscall depends on the clock ID and (in some cases) the clock source. For thread-specific CPU user time, the vDSO shim cannot resolve the request in user space and must transit into the kernel. In this specific case, there is absolutely a syscall.

ot · 2026-01-14T00:48:18 1768351698

If you look below the vDSO frame, there is still a syscall. I think that the vDSO implementation is missing a fast path for this particular clock id (it could be implemented though).

jerrinot · 2026-01-14T00:49:37 1768351777

Exactly this.

ot · 2026-01-06T17:01:00 1767718860

That's probably true for small primitive types, but if your objects are expensive to move (like a large struct) it might be beneficial to minimize swaps.

praptak · 2026-01-06T19:20:20 1767727220

Yeah, it might be interesting to run some profiling of both algorithms and see how they perform dependent on the size of the blocks being swapped (which doesn't even have to be equal to the size of the object in the array).

ot · 2026-01-05T16:38:45 1767631125

The query is incorrect, it will return any posting that contains the words "vision" and "pro", not necessarily consecutive.

It looks like phrasal search is supported, searching "vision pro" in quotes only returns 212 results worldwide

https://jobs.apple.com/en-us/search?search=%22vision+pro%22&...

Spot-checked a few and they all seem to be Vision Pro related.

--

EDIT: Actually even this is not accurate, as it matches postings with sentences like

> Fundamental to the success of iPhone, iPad, Apple Watch, Apple TV, Vision Pro, and Mac ...

but not specific to Vision Pro.

However we can filter on products and services for Vision Pro and visionOS, and it gives 106 results:

https://jobs.apple.com/en-us/search?search=%22vision+pro%22&...

Croftengea · 2026-01-05T16:41:40 1767631300

Even quoted search is not representative because some ads just mention Vision Pro as one of the many products Apple is known for:

> "The Manufacturing Design team enables the mass production of Apple's entire product line from iPhones, iPads and MacBooks to the Mac Pro, AppleTV, Apple Watch and Vision Pro."

ot · 2026-01-05T16:42:48 1767631368

Yeah, was just about to edit the comment :)

coreyh14444 · 2026-01-05T16:40:09 1767631209

212 is still a pretty high number.

g947o · 2026-01-05T16:43:45 1767631425

212 is still a large number compared to Meta's Reality Labs which is downsizing.

ot · 2025-12-03T11:16:33 1764760593

> can avoid or defer a lot of the expected memory allocations of async operations

Is this true in realistic use cases or only in minimal demos? From what I've seen, as soon as your code is complex enough that you need two compilation units, you need some higher level async abstraction, like coroutines.

And as soon as you have coroutines, you need to type-erase both the senders and the scheduler, so you have at least couple of allocations per continuation.

menaerus · 2025-12-03T12:16:51 1764764211

I can't comment on this particular implementation but few years back I played around with a similar idea, so not quite 1-to-1 mapping, but my conclusion derived from the experiments was the same - it is allocation-heavy. Code was built around similar principles, with type-erasure on top of future-promises (no coroutines back then in C++), and work-stealing queues. Code was quite lean, although not as feature-packed as folly, and there was nothing obvious to optimize apart from lock-contention, and dynamic allocations. I did couple of optimizations on both of those ends back then and it seemed to confirm the hypothesis of heavy allocations.

gpderetta · 2025-12-03T11:31:59 1764761519

I have only played very little with it and I don't have anything in production yet, so I can't say.

I did manage to co_await senders without additional allocations though (but the code I wrote is write-only so I need to re-understand what I did 6 months ago again).

I recall that I was able to allocate the operation_state as a coroutine-local, and the scheduler is node based, so the operation_state can just be appended to the scheduler queue.

You still do need to allocate the coroutine itself, and I haven't played with coroutine allocators yet.

ot · 2025-12-02T15:49:34 1764690574

Is it so hard for people to understand that Europe is a continent, EU is a federation of European countries, and the two are not the same?

usrnm · 2025-12-02T15:54:41 1764690881

Europe isn't even a continent and has no real definition (none that would make any sense, anyway), so the whole thing is confusing by design

rc1 · 2025-12-03T19:50:36 1764791436

If Europe isn’t a continent, on what continent are the EU member states sitting on?

rkomorn · 2025-12-03T19:52:57 1764791577

Eurasia is the widely accepted answer.

denysvitali · 2025-12-02T17:17:38 1764695858

I honestly think it is. The amount of people who thinks Europe and EU are the same thing is really concerning.

And no, it's not only americans. I keep hearing this thing from people living in Europe as well (or better, in the EU). I also very often hear phrases like "Switzerland is not in Europe" to indicate that the country is not part of the European Union.

MadDemon · 2025-12-02T18:23:50 1764699830

Switzerland has such close ties to the EU that I would consider them half in.

lostmsu · 2025-12-02T17:08:33 1764695313

Isn't London on an island, mr. Pedantic?

TulliusCicero · 2025-12-02T17:39:04 1764697144

So I guess Japan isn't Asian then?

layer8 · 2025-12-03T01:42:16 1764726136

While Japan is part of Asia, and Asia is a continent, Japan is also separated from the Asian continent: https://en.wikipedia.org/wiki/Geography_of_Japan#Location

lostmsu · 2025-12-03T02:56:05 1764730565

What's more interesting is that the comment you are replying to mistakenly asked me instead of asking the parent.

GaggiX · 2025-12-02T15:51:24 1764690684

I think you missed the joke

ot · 2025-10-28T15:34:25 1761665665

> Since nobody had figured out any downsides to PCG's yet, everyone shrugged and said "might as well just go with that then", and that is where, as of 2019, the art currently stands. The problem is solved, and life is good.

I wonder who "everyone" was, I'm not aware of many high-profile projects adopting PCG as a default. As of 2025, several high-profile runtimes (including all the major browsers) use xorshift variants [1]

Is there a list of users of PCG?

[1] See Adoption section in https://prng.di.unimi.it/

vlovich123 · 2025-10-28T15:37:54 1761665874

It kind of doesn’t matter if there are users - there are people still stupidly using Mersenne Twister. The point is that PCG is better than xorshift and related in that family. That other high profile applications haven’t switched is besides the point that PCG is objectively better:

> O'Neill proposes testing PRNGs by applying statistical tests to their reduced-size variants and determining the minimum number of internal state bits required to pass.[7] TestU01's BigCrush examines enough data to detect a period of 235, so even an ideal generator requires 36 bits of state to pass it. Some very poor generators can pass if given a large enough state;[8] passing despite a small state is a measure of an algorithm's quality, and shows how large a safety margin exists between that lower limit and the state size used in practical applications. PCG-RXS-M-XS (with 32-bit output) passes BigCrush with 36 bits of state (the minimum possible), PCG-XSH-RR (pcg32() above) requires 39, and PCG-XSH-RS (pcg32_fast() above) requires 49 bits of state. For comparison, xorshift*, one of the best of the alternatives, requires 40 bits of state,[5]: 19 and Mersenne twister fails despite 19937 bits of state.[9]

ot · 2025-11-01T13:37:59 1762004279

> It kind of doesn’t matter if there are users [...] The point is that PCG is better

No that's not the point that the article makes and that I'm questioning, it says "everyone shrugged" which implies consensus, and I'm asking for evidence of that consensus, not of the objective quality of the two generators.

Also I don't think that that paragraph is even close to demonstrating "objectively better": the author of PCG pointed out one arbitrary metric, minimum state size, where PCG beats old variants of xorshift* on a statistical test suite, and in the meantime much better variants have come out. That metric is meaningless since everyone uses much bigger state anyway.

RNGs are a tricky subject, there isn't a singular measure of quality, statistical tests are necessary but not sufficient. The best testament to RNG quality is wide adoption, which builds confidence that there aren't undiscovered failure modes.

camel-cdr · 2025-11-02T16:24:05 1762100645

Agreed, you can get minimum state size by just using a linear counter and a sufficiently good hash function.

The metric is quite useless, what matters way more IMO is the quality of scaled down variants.

adgjlsfhk1 · 2025-10-28T18:26:52 1761676012

IMO there's plenty of reason to use Xoshiro over PCG. the quality differences between the best xoshiro and pcg differences are minimal (especially because most languages use a 256 bit state since it makes it easier to split/jump without worrying about duplicate streams), and Xoshiro generators tend to be easier to SIMD for when you need lots of random numbers.

vlovich123 · 2025-10-28T21:52:29 1761688349

There are SIMD versions of PCG and most variants you find online aren’t SIMD.

WarrenWeckesser · 2025-10-28T16:14:29 1761668069

The default bit generator in NumPy is PCG-64 [1].

[1] https://numpy.org/doc/stable/reference/random/bit_generators...

aj_hackman · 2025-10-28T17:54:29 1761674069

Much like my beloved comb sort, I use xorshift because the implementation is small and it's Good Enough. God's Own 100 SLOC PRNG would have to be near-perfect and take three clock cycles to contemplate switching.

camel-cdr · 2025-10-28T16:42:37 1761669757

> nobody had figured out any downsides to PCG's yet

BTW, people have broken PCG already: https://hal.science/hal-02700791/file/main.pdf

It takes up to 20000 CPU hours to break the seed from 512 output bits with an unknown state, increment and multiplier. (the multiplier is usually fixed constant)

tptacek · 2025-10-28T17:10:00 1761671400

What does it mean to "break" PCG? It's not a secure random number generator.

camel-cdr · 2025-10-28T17:15:47 1761671747

Seed recovery. It's not meant to be cryptographically secure, but previously nobody had reversed it.

Showing that reversal takes that many CPU hours shows how good the PRNG quality is.

mahemm · 2025-10-28T18:23:09 1761675789

To me this is completely unrelated to the quality of the PRNG, because security is explicitly a non-goal of the design. A general-purpose non-cryptographically secure PRNG is evaluated primarily on speed and uniformity of output. Any other qualities can certainly be interesting, but they're orthogonal to (how I would evaluate) quality.

tptacek · 2025-10-28T18:27:46 1761676066

Right: put differently, why would you bother to select among the insecure RNGs an RNG whose "seed" was "harder" to recover? What beneficial property would that provide your system?

avadodin · 2025-10-28T19:12:00 1761678720

CSPRNGs have all of the desirable properties for the output.

All else being equal, I don't think it is possible for a trivially reversible generator to have better statistical properties than a generator whose output behaves more like a CSPRNG.

It can definitely be good enough and or faster, though.

tptacek · 2025-10-28T20:02:50 1761681770

Right, I think defaulting to a CSPRNG is a pretty sane decision, and you'd know if you had need of a non-CSPRNG RNG. But what does that say about the choice between PCG and xorshiro?

avadodin · 2025-10-29T00:25:10 1761697510

Defaulting to a CSPRNG pre-seeded with system randomness is not a bad choice per se(especially given many users don't know they need one) but current ones are much slower than the RNGs we are discussing.

If you're going to provide a non-CS one for general simulation purposes, you probably want the one that is the closest to indistinguishable from random data as you can without compromising performance, though.

Some people will have more than enough with a traditional LCG(MC isn't even using RNGs anymore) but others may be using more of the output in semantically relevant ways where it won't work.

If Xoshiro's state can be trivially recovered from a short span of the output, there is a local bias right there that PractRand lets through but that your application could accidentally uncover.

The choice is: Are the performance gains enough to justify that risk?

tptacek · 2025-10-29T00:39:45 1761698385

Why does it matter if the state can be trivially recovered? What does that have to do with the applications in which these generators are actually used? If the word "risk" applies to your situation, you can't use either xorshiro or PCG.

avadodin · 2025-10-29T01:07:46 1761700066

This is too deep to reply but if a bit is dependent on the value of a bit a couple bytes back then it is not acting randomly.

It's not about security.

I hope you can agree that if every time there is a treasure chest to the left of a door, a pink rabbit spawns on the top left of the room, that's not acting very random-like.

I'm not taking a position on the perceived added value of PCG over Xoshiro.

mahemm · 2025-10-29T15:41:09 1761752469

The property you're talking about (next bit unpredictability) is important for a CSPRNG, but it doesn't matter at all for a PRNG. A PRNG just needs to be fast and have a uniform output. LCGs, for instance, do not have next bit unpredictability and are a perfectly fine class of PRNG.

avadodin · 2025-10-29T21:41:53 1761774113

The paper that triggered this thread "breaking" PCG sees it as potentially in the same class of issues as using RANDU.

> our results […] do mean that [PCG']s output has detectable properties. Whether these properties may affect the result of Monte-Carlo numerical simulations is another matter entirely.

Again this is on PCG which required a breaking effort.

The short version of Xorshift as originally presented by Marsaglia outputting its whole state for example is bound to have behaviors like my room-generation example emerging fairly easily. Particularly, with low hamming-weight states.

I doubt Xoshiro's output is that bad but if presented as trivial to recover vs PCG, that to me indicates potential issues when using the output for simulation.

charlieyu1 · 2025-10-28T21:54:18 1761688458

Any recoverability sounds very bad.

Why shouldn’t I just use eg sha512 on the previous hash and drop half the bits?

throwaway150 · 2025-10-28T22:06:22 1761689182

> Any recoverability sounds very bad.

PRNGs are not meant to be cryptographically secure. If you don't want recoverability by all means use SHA512 or a proper CSPRNG.

But saying PRNGs are bad because there is recoverability is like saying salt is bad because it isn't sweet. PRNGs are not meant for non-recoverability and salt isn't meant to be sweet.

tptacek · 2025-10-28T22:01:42 1761688902

It's not bad because "preventing seed recovery" isn't the job of an insecure RNG. If you care about seed recovery, you must use a secure generator. There aren't degrees of security here; PCG is insecure, and (say) the LRNG or CTR-DRBG are not.

whyever · 2025-10-29T19:55:07 1761767707

I agree, but https://www.pcg-random.org/ still advertizes PCG as "challenging" to predict, and critizises other RNGs as predictable and insecure.

tptacek · 2025-10-29T19:57:15 1761767835

Right, that's a problem, because nobody that cares about this should be using PCG.

avadodin · 2025-10-30T08:39:22 1761813562

> Predictable — after 624 outputs, we can completely predict its output.

> we recovers[sic] all the secret information using 512 consecutive output bytes

oof

WalterGillman · 2025-10-29T13:24:39 1761744279

I wonder how much it would take to break mine.

https://github.com/waltergillman/xorshift_sbox

I have not been blessed by an education so I can't be eloquent and write proofs and papers and stuff but it passes PractRand for 4GB with only 32 bits of state.

Not very fast on modern computers, I will concede.

ot · 2025-10-22T13:18:27 1761139107

But then you could add another level of slower (but still faster than RAM) and larger cache. So it is after all the CPU caches, but the first of all the memory caches. A more mathematically correct name would be L_omega.

ot · 2025-09-16T22:02:38 1758060158

Reminds me of the old quote

> everyone only uses 20% of C++, the problem is that everyone uses a different 20%

codr7 · 2025-09-17T06:53:49 1758092029

I suspect that percentage has shrunk quite a bit lately.

ot · 2025-09-12T21:23:55 1757712235

> It was so easy once we saw it that there was no reason to keep the placemat for notes, and we left it behind. Or maybe we did bring it back to the lab; I'm not sure. But it's gone now.

https://commandcenter.blogspot.com/2020/01/utf-8-turned-20-y...