Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I tend to avoid UUIDv7 and use UUIDv4 because I don't want to leak the creation times of everything.

Now this doesn't work if you actually have enough data that the randomness of the UUIDv4 keys is a practical database performance issue, but I think you really have to think long and hard about every single use of identifiers in your application before concluding that v7 is the solution. Maybe v7 works well for some things (e.g identifiers for resources where creation times are visible to all with access to the resource) but not others (such as users or orgs which are publicly visible but without publicly visible creation times).





I'm also not a huge fan of leaking server-side information; I suspect UUIDv7 could still be used in statistical analysis of the keyspace (in a similar fashion to the german tank problem for integer IDs). Also, leaking data about user activity times (from your other comment) is a *really* good point that I hadn't considered.

I've read people suggest using a UUIDv7 as the primary key and a UUIDv4 as a user-visible one as a remedy.

My first thought when reading the suggestion was, "well but you'll still need an index on the v4 IDs, so what does this actually get you?" But the answer is that it makes joins less expensive; you only require the index once, when constructing the query from the user-supplied data, and everything else operates with the better-for-performance v7 IDs.

To be clear, in a practical sense, this is a bit of a micro-optimization; as far as I understand it, this really only helps you by improving the data locality of temporally-related items. So, for example, if you had an "order items" table, containing rows of a bunch of items in an order, it would speed up retrieval times because you wouldn't need to do as many index traversals to access all of the items in a particular order. But on, say, a users table (where you're unlikely to be querying for two different users who happen to have been created at approximately the same time), it's not going to help you much. Of course the exact same critique is applicable to integer IDs in those situations.

Although, come to think of it, another advantage of a user-visible v4 with v7 Pk is that you could use a different index type on the v4 ID. Specifically, I would think that a hash index for the user-visible v4 might be a halfway-decent way to go.

I'm still not sure either way if I like the idea, but it's certainly not the craziest thing I've ever heard.


I think a bigger benefit from doing that would be that inserts would be cheaper. Instead of an expensive insert into the middle of an index for every table that needs an index on that key, you can do a cheaper insert at the end of the index for all of them except for the one that uses uuid4.

But if you are doing that, why not just use an incrementing integer instead of a uuidv7?


Certainly for many applications, the autoint approach would be fine.

The benefit of uuid in this case is that it allows horizontally scalable app servers to construct PKs on their own without risk of collisions. In addition to just reducing database load by doing the ID generation on the app server (admittedly usually a minor benefit), this can be useful either to simplify insert queries that span multiple tables with FK relationships (potentially saving some round trips in the process) or in very niche situations where you have circular dependencies in non-nullable FKs (with the constraint deferred until the end of the transaction).


> I tend to avoid UUIDv7 and use UUIDv4 because I don't want to leak the creation times of everything.

See perhaps "UUIDv47 — UUIDv7-in / UUIDv4-out (SipHash‑masked timestamp)":

* https://github.com/stateless-me/uuidv47

* Sept 2025: https://news.ycombinator.com/item?id=45275973


If that kind of stuff is on the able you can also use boring 64bit integer keys and encrypt those (e.g. [1]). Which in the end is just a better thought out version of what the article author did.

UUIDv47 might have a space if you need keys generated on multiple backend servers without synchronization. But it feels very niche to me.

1: https://wiki.postgresql.org/wiki/XTEA_(crypt_64_bits)


Out of curiosity, why is it an issue if you leak creation time?

The issue will be very context specific. In other words to (reasonably) answer the question, we'd have to judge each application individually.

For one example, say you were making voting-booth software. You really don't want a (hidden) timestamp attached to each vote (much less an incrementing id) because that would break voter confidentiality.

More generally, it's more a underlying principle of data management. Not leaking ancillary data is easier to justify than "sure we leak the date and time of the record creation, but we can't think of a reason why that matters."

Personally I think the biggest issue are "clever" programmers who treat the uuid as data and start displaying the date and time. This leads to complications ("that which is displayed, the customer wants to change"). It's only a matter of time before someone declares the date "wrong" and it must be "fixed". Not to mention time zone or daylight savings conversions.


Well you're leaking user data. I'm sure you can imagine situations where "the defendant created an account on this site on this date" could come up. And the user could have created that account not knowing that the creation date is public, because it's not listed anywhere in the publicly viewable part of the profile other than the UUID in the URL.

Discord is doing fine.

Hacker news is also doing fine, even though I can just click your profile and see you joined in october 2024. It doesn't matter for every use case.

But there are cases where it matters. Using UUIDv7 for identifiers means you need to carefully consider the security and privacy implications every time you create a new table identified by a UUID, and you'll possibly end up with some tables where you use v4 and some where you use v7. Worst case, you'll end up with painful migrations from v7 to v4 as security review identifies timestamped identifiers as a security concern.


Pretty much every social media app has a "Member since X" visible on public profiles. I don't think it's an issue.

Who said I was talking about social media?

Well where else do users have public profiles?

The whole point though is that the ID itself leaks info, even if the profile is not public. There are many cases where you reference an object as a foreign key, even if you can't see the entire record of that foreign key.

I can't think of any.

Sending a friend request is an obvious example.

Ok but in that case, why not use the user ID as the primary key?

If your system (pseudo-) random number generator (RNG) is compromised to derive a portion of its entropy from things that are knowable by knowing the time when the function ran, then the search space for cracking keys created around the same time can be shrunken considerably.

This doesn’t even rely on your system’s built-in RNG being low quality. It could be audited and known to avoid such issues but you could have a compromised compiler or OS that injects a doctored RNG.


E.g, if your service users have timestamp as part of the key and this data is visible to other users, you would know when that account was created. This could be an issue.

There was a HN comment about competitors tracking how many new signups are happening and increasing the discounts/sales push based on that. Something like this.

In a business I once worked for, one of the users of the online ordering system represented over 50% of the business' income, something you wouldn't necessarily want them to know.

However, because the online ordering system assigned order numbers sequentially, it would have been trivial for that company to determine how important their business was.

For example, over the course of a month, they could order something at the start of the month and something at the end of the month. That would give them the total number of orders in that period. They already know how many orders they have placed during the month, so company_orders / total_orders = percentage_of_business

It doesn't even have to be accurate, just an approximation. I don't know if they figured out that they could do that but it wouldn't surprise me if they had.


This is also something that depends heavily on regulations. In my home country, invoice numbers have to be sequential by law, although you can restart the numbering every year.

Yes, even if it's not a legal requirement it's definitely best practice to have sequential invoice numbers. I thought about this at the time but these numbers aren't invoice numbers, only order numbers.

A global sequence, or sequence-per-account? I suspect the latter?

A sequence per "series", where a series can be a fiscal year, a department or category, etc. But I am not sure if you can have one series per customer, I only find conflicting information.

You can have more details here, in the section "Complete invoice":

https://sede.agenciatributaria.gob.es/Sede/en_gb/iva/factura...

https://www.boe.es/buscar/act.php?id=BOE-A-2012-14696#a6 (Spanish only)


That's happening everywhere. You can order industrial parts from a Fortune 500 and check some of the numbers on it too, if they're not careful about it.

Apart from all the other answers here: an external entity knowing the relative creation time for two different accounts, or just that the two accounts were created close in time to each other can represent a meaningful information leak.

Depends on the data. If you use a primary key in data about a person that shouldn't include their age (e.g. to remove age-based discrimination) then you are leaking an imperfect proxy to their age.

So the UUID could be used as an imperfect indicator of a records created time?

UUIDv7 but not UUIDv4.

I suppose timing attacks become an issue too.

UUIDv7 still have a lot of random bits. Most attacks around creating lots of ids are foiled by that

Admins, early users, founders, CEOs etc etc would have althe lowest creation time...

You shouldn't generally use PKs as public identifiers, least of all UUIDs, which are pretty user hostile.

I really don't see the issue with having a UUID in a URL.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: