That's the creation date of that GUID, though. It doesn't say anything about the entity in question. For example, you might be born in 1987 and yet only get a social security number in 2007 for whatever reason.

So, the fact that there is a date in the UUIDv7 does not extend any meaning or significance to the record outside of the database. To infer such a relationship where none exists is the error.


You can argue that, but then what is its purpose? Why should anyone care about the creation date of a by-design completely arbitrary thing?

I bet people will extract that date and use it, and it's hard to imagine use which wouldn't be abuse. To take the example of a PN/SSN and the usual gender bit: do you really want anyone to be able to tell that you got a new ID at that time? What could you suspect if a person born in 1987 got a new PN/SSN around 2022?

Leaks like that, bypassing whatever access control you have in your database, are just one reason to use truly random IDs. But it's a pretty good reason on its own.
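To make the leak concrete: in a UUIDv7 the first 48 bits are a big-endian Unix timestamp in milliseconds (per RFC 9562), so anyone who can see the ID can read when it was minted, with no database access at all. A minimal Python sketch, assuming a hand-rolled generator; `make_uuid7` and `leaked_creation_time` are hypothetical helpers, used because a v7 generator may not exist in the standard library of the Python version you run:

```python
import datetime
import os
import uuid


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: build a UUIDv7 following the RFC 9562 bit layout."""
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (
        (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit Unix timestamp in milliseconds
        | 0x7 << 76                        # version 7
        | rand_a << 64
        | 0b10 << 62                       # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)


def leaked_creation_time(u: uuid.UUID) -> datetime.datetime:
    """Read the minting time straight out of the ID's top 48 bits."""
    unix_ms = u.int >> 80
    return datetime.datetime.fromtimestamp(unix_ms / 1000, tz=datetime.timezone.utc)


# Illustrative date: an ID issued on 2022-03-15 at noon UTC.
issued = datetime.datetime(2022, 3, 15, 12, 0, tzinfo=datetime.timezone.utc)
record_id = make_uuid7(int(issued.timestamp() * 1000))
print(leaked_creation_time(record_id))  # 2022-03-15 12:00:00+00:00
```

Nothing about the record itself is needed; the ID alone is enough.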


> What could you suspect if a person born in 1987 got a new PN/SSN around 2022?

Thank you for spelling it out for me. For other readers: it leaks information that the person is likely not a natural-born citizen. The assumption doesn't have to be a hundred percent accurate; there is a way to make that assumption and possibly hold it against you.

And there are probably a million ways that a record's creation date could be held against you. If they don't put it in writing, how will you prove they discriminated against you?

Thinking... I don't have a good answer to this. If data exists, people will extract meaning from it whether rightly or not.


To quote the great Mr Sparrow:

> The only rules that really matter are these: what a man can do and what a man can't do.

When evaluating security matters, it's better to strip off the moral valence entirely ("rightly") and only consider what is possible given the data available.

Another potentially concerning implication besides citizenship status: a person whose ID was changed when they were put in a witness protection program.


> You can argue that, but then what is its purpose? Why should anyone care about the creation date of a by-design completely arbitrary thing?

Pretty sure sorting and filtering them by date/time range in a database is the purpose.
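As a sketch of that use: because the timestamp occupies the most significant bits, comparing UUIDv7 values as 128-bit integers orders them by creation millisecond, and a time window becomes a pair of ID bounds. This assumes every ID really is a v7 produced by a reasonably accurate clock; `make_uuid7` and `ids_created_between` are hypothetical helpers written for illustration. IDs minted within the same millisecond are ordered by their random bits, not by creation order.

```python
import os
import uuid


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: build a UUIDv7 following the RFC 9562 bit layout."""
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (
        (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit Unix timestamp in milliseconds
        | 0x7 << 76                        # version 7
        | rand_a << 64
        | 0b10 << 62                       # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)


def ids_created_between(ids, start_ms: int, end_ms: int):
    """Return the IDs whose embedded timestamp falls in [start_ms, end_ms), sorted."""
    lower = uuid.UUID(int=start_ms << 80)  # smallest possible ID at start_ms
    upper = uuid.UUID(int=end_ms << 80)    # exclusive bound: smallest possible ID at end_ms
    return sorted(u for u in ids if lower <= u < upper)


base = 1_700_000_000_000
ids = [make_uuid7(base + offset) for offset in (3000, 1000, 2000, 4000)]
window = ids_created_between(ids, base + 1500, base + 3500)
print([u.int >> 80 for u in window])  # [1700000002000, 1700000003000]
```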


If you need sorting and filtering by date, just add a timestamp to your table instead of misusing an Id column for that.
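A minimal sketch of that alternative using Python's built-in `sqlite3`: the primary key stays a random UUIDv4, and creation time lives in its own indexed column that can be queried, sorted, and access-controlled independently of the key. The table, column, and index names are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, created_at TEXT NOT NULL)")
# A dedicated index keeps time-range queries cheap without putting time into the key.
conn.execute("CREATE INDEX records_created_at ON records (created_at)")

rows = [(str(uuid.uuid4()), f"2024-01-{day:02d}T00:00:00Z") for day in range(1, 11)]
conn.executemany("INSERT INTO records (id, created_at) VALUES (?, ?)", rows)

# Fixed-width ISO 8601 UTC strings sort chronologically, so a text range works here.
in_range = conn.execute(
    "SELECT id, created_at FROM records "
    "WHERE created_at >= ? AND created_at < ? ORDER BY created_at",
    ("2024-01-03T00:00:00Z", "2024-01-06T00:00:00Z"),
).fetchall()
print([created for _, created in in_range])
# ['2024-01-03T00:00:00Z', '2024-01-04T00:00:00Z', '2024-01-05T00:00:00Z']
```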

That happens, in general. The benefit comes when it’s time to look up by UUID only; the prefix is an index to its disk block location.

> the prefix is an index to its disk block location

What? This is definitely not the case and can’t be because B-tree nodes change while UUIDs do not.


I didn’t mean that literally, but it's no longer editable. It was supposed to have “like” etc. in there.

But UUIDv7 doesn’t change that at all. It doesn’t matter what flavor of UUID you choose. The ID is always “like” an index to a block in that you traverse the tree to find the node. What UUIDv7 does is improve some performance characteristics when creating new entries and potentially for caching.
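One way to see the insert-side difference: model the index's key order as a sorted Python list and check where each new key lands. Time-ordered v7 keys minted after the existing data all land at the right-hand edge, while v4 keys land at random positions, which in a real B-tree means touching pages scattered across the whole index. This is a toy model under stated assumptions: a sorted list only stands in for leaf order, page splits and caching are not simulated, and `make_uuid7` and `count_appends` are hypothetical helpers.

```python
import bisect
import os
import uuid


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: build a UUIDv7 following the RFC 9562 bit layout."""
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (
        (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit Unix timestamp in milliseconds
        | 0x7 << 76                        # version 7
        | rand_a << 64
        | 0b10 << 62                       # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)


def count_appends(index, new_keys) -> int:
    """Insert keys into a sorted list, counting how many land at the very end."""
    appends = 0
    for key in new_keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            appends += 1
        index.insert(pos, key)
    return appends


base = 1_700_000_000_000
n = 1000
# Existing rows, then new rows minted later (one per millisecond, so timestamps differ).
v7_index = sorted(make_uuid7(base + i) for i in range(n))
v7_appends = count_appends(v7_index, [make_uuid7(base + n + i) for i in range(n)])

v4_index = sorted(uuid.uuid4() for _ in range(n))
v4_appends = count_appends(v4_index, [uuid.uuid4() for _ in range(n)])

print(f"v7: {v7_appends}/{n} appended at the end; v4: {v4_appends}/{n}")
```

Lookups still walk the tree either way; the gain is in where writes go.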

> just

It is easy to have strong opinions about things you are sheltered from the consequences of.


Exactly, be explicit, don't shoehorn multiple purposes into a single column that's supposed to be a largely meaningless unique identifier.

That is absolutely not the purpose. The specific purpose of uuidv7 is to optimize for B-Tree characteristics, not so you can craft queries based on the IDs being sequential.

This assumption that you can query across IDs is exactly what is being cautioned against. As soon as you do that, you are taking a dependency on an implementation detail. The contract is that you get a UUID, not that you get 48 bits of timestamp. There are 8 different UUID types and even v7 has more than one variant.
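If code does reach into the ID anyway, the dependency is at least visible when it checks the version first. A sketch, assuming the RFC 9562 layout (`embedded_time_ms` and `make_uuid7` are hypothetical helpers): anything that is not a v7 UUID yields `None` instead of a bogus timestamp, and even for v7 the value is only as trustworthy as the clock that generated it.

```python
import os
import uuid
from typing import Optional


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: build a UUIDv7 following the RFC 9562 bit layout."""
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (
        (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit Unix timestamp in milliseconds
        | 0x7 << 76                        # version 7
        | rand_a << 64
        | 0b10 << 62                       # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)


def embedded_time_ms(u: uuid.UUID) -> Optional[int]:
    """Return the Unix-ms timestamp of a UUIDv7, or None for any other kind of UUID."""
    # uuid.UUID.version is None unless the variant bits mark an RFC-layout UUID.
    if u.version != 7:
        return None
    return u.int >> 80


print(embedded_time_ms(make_uuid7(1_700_000_000_000)))  # 1700000000000
print(embedded_time_ms(uuid.uuid4()))                    # None
print(embedded_time_ms(uuid.UUID(int=0)))                # None (the nil UUID)
```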


B-trees too, but also bucketing for formats like Delta Lake or Iceberg, where having IDs that cluster will reduce the number of files you need to update.

> You can argue that, but then what is its purpose?

The purpose is to reduce randomness while still preserving the probability of uniqueness. UUIDv4s come with performance issues when used to bucket data for updates, such as when they're used as primary keys in a database.

A database like MySQL or PostgreSQL has sequential IDs and you’d use those instead, but if you’re writing something like Iceberg tables using Trino/Spark/etc., then being able to generate unique IDs (without using a data store) that tend to be clustered together is useful.
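A rough illustration of the clustering effect, assuming a table whose data files each cover a contiguous range of the sorted ID space (a simplification; real Delta Lake or Iceberg layouts depend on the table's partitioning and sort order). Updating the 500 most recently created rows touches only the last few files with time-ordered IDs, but nearly every file with random ones. `make_uuid7` and `files_touched` are hypothetical helpers.

```python
import bisect
import os
import uuid


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: build a UUIDv7 following the RFC 9562 bit layout."""
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (
        (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit Unix timestamp in milliseconds
        | 0x7 << 76                        # version 7
        | rand_a << 64
        | 0b10 << 62                       # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)


def files_touched(all_ids, updated_ids, rows_per_file: int) -> int:
    """Count files to rewrite if each file holds a consecutive run of the sorted IDs."""
    ordered = sorted(all_ids)
    return len({bisect.bisect_left(ordered, u) // rows_per_file for u in updated_ids})


base = 1_700_000_000_000
# 10,000 rows, one created per second, split into files of 100 rows each.
v7_ids = [make_uuid7(base + i * 1000) for i in range(10_000)]
v4_ids = [uuid.uuid4() for _ in range(10_000)]

# Update the 500 most recently created rows.
print(files_touched(v7_ids, v7_ids[-500:], 100))  # 5
print(files_touched(v4_ids, v4_ids[-500:], 100))
```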


I would argue that this is one of very few situations where leaking the ID's creation timestamp, to someone who already has the ID, is a possible concern at all.

And when working with very large datasets, there are very significant downsides to large, completely random IDs (which is of course what the OP is about).


The time component either has meaning and it should be in its own column, or it doesn't have meaning and it is unnecessary and shouldn't be there at all.

I'm not a normalization fanatic, but we're only talking about 1NF here.