Show HN: Prefixed, dual-token, base58 encoded API Keys (github.com/seamapi)
100 points by seveibar on May 10, 2022 | 54 comments


I like it. I'm not sure that "short token" is the right term for that - perhaps "token id". Also the only occurrence of "it's" in the README should be "its": https://brians.wsu.edu/2016/05/19/its-its/

Edit: Actually, the short token is more than just a token ID - it's also required, right? Perhaps replace:

   When we receive an incoming request, we search our database for hash(long_token)
with:

   When we receive an incoming request, we search our database for hash(long_token) and short_token. A token can be blocklisted by its short_token.
So I think I prefer "short token" to "token id" but perhaps there is a better name for it. If it did get renamed it would probably make sense to rename "long token" as well. I'll defer to experts on this.


Fixed, thank you :) - and great suggestion!

edit based on your edit: Added your recommendation to the README, thank you!


My suggestion was "blocklist", which looks a lot like "blacklist" because only one letter is different. Perhaps "deny list" would be better. It's what Apple is using. https://developer.apple.com/news/?id=1o9zxsxl


my mistake! fixed :)


One more suggestion - replace

> A token can be blocklisted by its short token.

With:

> An API Key can be blocklisted by its short token.


Author here: I was curious what HN would think of this token style; we're currently using it for getseam.com. I think it has a lot of advantages over other token styles (e.g. double-click to select, standard alphabet, compatible with secret scanning). Although this is a TypeScript library, we originally designed this API key for use in Ruby on Rails, and the pattern should be fairly portable.


Thanks for sharing! As somebody who has been bitten before by copy-pasting a key, I appreciate that as the headlining benefit :)

Thoughts from an ergonomic perspective:

- Base58 makes easy-to-handle values, which is nice

- The naming of shortToken, longToken, and token is confusing to me, because it makes it sound like all of these values are tokens/secrets of some sort, rather than components of the token. Not to bikeshed, but what jumps to my mind is more like "prefix_id_secret", which helps make it clear what role each one plays.

- I don't think keyPrefix should have a default, I think your function should raise an error if it's not provided. Otherwise more than one user of your library is going to end up with "mycompany" keys in the wild.

- You probably ought to validate that keyPrefix does not contain an underscore and raise an exception if it does, otherwise that way lies pain for people trying to handle these keys.

Thoughts from a security perspective:

- You should probably use crypto.timingSafeEqual(buffer, buffer) [1] instead of comparing strings (see the sketch after this list)

- If you're trying to store the secret's hash, isn't something like bcrypt/scrypt preferred over raw sha256 these days?

[1] https://nodejs.org/api/crypto.html#cryptotimingsafeequala-b
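
A minimal sketch of that comparison in Node, assuming the stored value is a SHA-256 hex digest as in the README (the helper name is hypothetical):

    import { createHash, timingSafeEqual } from "node:crypto"

    // Compare a presented long token against a stored SHA-256 hex digest
    // without short-circuiting on the first differing byte.
    function longTokenMatches(longToken: string, storedHashHex: string): boolean {
      const presented = createHash("sha256").update(longToken).digest()
      const stored = Buffer.from(storedHashHex, "hex")
      // timingSafeEqual throws if the buffer lengths differ, so guard first.
      if (presented.length !== stored.length) return false
      return timingSafeEqual(presented, stored)
    }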


> If you're trying to store the secret's hash, isn't something like bcrypt/scrypt preferred over raw sha256 these days?

Not for this type of use case. If a secret is long and generated from a cryptographically secure source (e.g. /dev/urandom), then any cryptographic hash function is fine, as brute-forcing the secret itself is not feasible. A single pass of SHA-256 is fine, and presumably you'd be doing many of these operations each second in a high-throughput application.

“Slow” hash functions like bcrypt or scrypt are for user provided secrets that might not have a large amount of entropy. It’s fine for something like user authentication but would be way too slow and pointless for API keys.


> If a secret is long

It seems this one, at 24 base-58 characters, is only roughly twice as long as it needs to be, right? How short is too short?


Appreciate the thoughts on the naming/exceptions and thanks for taking a look at the implementation! Definitely making some adjustments tonight.

I think your thought on bcrypt/scrypt vs SHA256 is super interesting here. The long token is treated a lot like a password, so we should treat it similarly and use slow hashing. However, unlike a password, an API key is repeatedly used for authentication instead of being exchanged for a session token. I don't think this meaningfully changes anything, so I think you're right that bcrypt/scrypt would be a better choice!

Edit: Also see koomla's answer!


What meaningfully changes the requirements is that it takes many more attempts to brute force API keys because they're longer.

I'd consider using a weaker bcrypt/scrypt/argon2 for API keys than would be used for login. Perhaps one that takes a hundredth or a thousandth of the time.

It could be unnecessary though.

Here's the main scenario: someone snagged a hashed long token from a database backup and wants to get the unhashed long token so they can use it to access something behind the API. They can do all the brute forcing they want and the server owner will never know about it. There are about a 1 with 42 zeroes' worth of potential long tokens to try (58 ** '51FwqftsmMDHHbJAMEXXHCgG'.length). Seems unlikely even though it's very very cheap to hash a potential long token. The tokens this is using are pretty short, but still long enough to make cracking them infeasible.
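
Back-of-the-envelope numbers for that (my arithmetic, not from the library):

    // 24 random base58 characters:
    const bits = 24 * Math.log2(58)   // ≈ 140.6 bits of entropy
    const tokens = Math.pow(58, 24)   // ≈ 2.1e42 possible long tokens
    console.log(bits.toFixed(1), tokens.toExponential(1))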

If you used argon2 maybe you could cut mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG down to mycompany_BRTRKFsL_51FwqftsmMDH. Shorter token!

I would perhaps make the long token 58 digits long just because it would have more than a googol (10^100) possible values but still be shorter than 80 characters with the prefix and short token, and maybe swap the SHA hashing for a low cost (memory and CPU) argon2.

If you wanted to have really short API keys you could get creative with argon2. This is relevant for magic links.

https://startdebugging.net/2013/10/counting-up-to-one-trilli...


Couple comments upon looking at the actual code:

- You can promisify randomBytes once and reuse it, rather than promisifying it twice on every invocation (see the sketch after this list)

- There shouldn't be a default value for the company name, or people will end up using it.

- The company name isn’t validated so it could contain underscores which would cause issues with the short token parsing as it assumes it’s the second “chunk”

- The equals comparison of the hashes for the secrets is not timing safe. It's not as bad as if they were plain text, but it does short-circuit due to how string equals works. Use the actual built-in timing-safe equal on the Buffer hash (not the stringified hex).
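
A sketch of the first three points above (the helper names are hypothetical, not the library's actual API); the timing-safe comparison is sketched further up the thread:

    import { randomBytes } from "node:crypto"
    import { promisify } from "node:util"

    // Promisify once at module scope instead of on every invocation.
    const randomBytesAsync = promisify(randomBytes)

    // No default prefix, and no underscores that would break the
    // "_"-delimited parsing of prefix/shortToken/longToken.
    function assertValidKeyPrefix(keyPrefix?: string): asserts keyPrefix is string {
      if (!keyPrefix) throw new Error("keyPrefix is required")
      if (keyPrefix.includes("_")) throw new Error("keyPrefix must not contain underscores")
    }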


thanks for taking a look! I've created an issue on the repo and should be able to address these tonight :)


On iPhone the long-press-to-select doesn’t cross the underscore boundary, so you end up just selecting part of the token. Maybe you can do without those underscores?


**shakes fist at sky/apple**


Can you tell us more about the prefix? It's not very clear to me. Is it an issuer ID or a user ID? And if it's the latter and user provided, how does it help with scanning?


It's the issuer's company or product name. GitHub, for example, prefixes its tokens with "ghp_", which makes it easy to scan for tokens uploaded to repos, or even across the web if they wanted.

GitHub has secret scanning features built in now; with a prefix of your product or company name, you could easily create a regex to find when someone has uploaded an API key for your application to a GitHub repo, then revoke the token or email them.
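
For illustration, a scanning pattern for keys of this shape could look like this (assuming the 8- and 24-character token lengths from the README example):

    // The base58 character class omits 0, O, I and l.
    const apiKeyPattern = /mycompany_[1-9A-HJ-NP-Za-km-z]{8}_[1-9A-HJ-NP-Za-km-z]{24}/g

    const fileContents = "const key = 'mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG'"
    console.log(fileContents.match(apiKeyPattern)) // -> the leaked key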


Don't use base58. Even Bitcoin, which originated, or at least popularized, base58, has moved away from it. It's a pain to encode and decode, with a naive implementation requiring a bignum library, and it doesn't save you much in the end over base32.


But you don't need to encode or decode anything here? Tokens are just random strings?


How do you think the token is generated if not through an encoding process?

Another disadvantage I forgot to mention earlier: base58 is variable length, which is a foot gun that will bite you eventually.


Using base58 here strikes me as simply unnecessary. You could just use the base58 alphabet and directly generate random strings in that alphabet, easily and cheaply. The only thing you lose is the ability to decode into a slightly shorter binary string for storage - almost certainly not worth it.
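
A sketch of that approach (crypto.randomInt keeps the per-character draw unbiased):

    import { randomInt } from "node:crypto"

    // Generate a random string directly in the base58 alphabet; nothing
    // ever needs decoding, so no bignum math is involved.
    const BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

    const randomBase58 = (length: number): string =>
      Array.from({ length }, () => BASE58[randomInt(BASE58.length)]).join("")

    console.log(randomBase58(24)) // e.g. a fixed-length long token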


If you are prefixing it, why not prefix it with a full domain name? mycompany_com_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG is not much longer than mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG and allows reporting leaked secrets (for example using the GitHub scheme [1]) without having to look up the URL in some sort of registry, e.g. POST https://mycompany.com/.well-known/report-leaked-secrets

[1]: https://docs.github.com/en/developers/overview/secret-scanni...

edit: See also the discussion from last time: https://news.ycombinator.com/item?id=28296864
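
A hypothetical reporter for that scheme, just to show that no registry lookup is needed (the endpoint and payload here are made up):

    // The reporting host is recoverable from the key itself.
    async function reportLeakedKey(key: string): Promise<void> {
      const [name, tld] = key.split("_") // ["mycompany", "com", ...]
      await fetch(`https://${name}.${tld}/.well-known/report-leaked-secrets`, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ token: key }),
      })
    }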


I think this is an excellent idea! However, we didn't like the aesthetic of the version that included the domain. I'll admit it's not great reasoning.


I don't follow what the security property is of involving sha256 against the b58 encoded value? If it serves as a checksum against tpyoss, wouldn't one wish to include the shortToken also? Otherwise, it's just as much attacker controlled content as the b58 version is, as best I can tell

As an implementation observation, I am also on a life-long campaign to rid the world of `split(...)[-1]` type manipulations, because they lack the context of a more rigorous "parsing" style. In this specific case, it allows attackers to smuggle almost arbitrary characters between the shortToken and longToken:

    checkAPIKey("alpha_BRTRKFsL_and this one time at band camp\r\nContent-Length: 0\r\n_51FwqftsmMDHHbJAMEXXHCgG",
    "d70d981d87b449c107327c2a2afbf00d4b58070d6ba571aac35d7ea3e7c79f37")


Also for your consideration, the code currently has duplication:

    export const extractLongTokenHash = (token: string) =>
      hashLongToken(extractLongToken(token))
    // ...snip...
      longTokenHash: hashLongToken(extractLongToken(token)),
    // ...snip...
    ) => hashLongToken(extractLongToken(token)) === expectedLongTokenHash
and in my experience it's all too easy to update one and forget to update the others


Thanks, yes I agree the ".split(_)[-1]" is ugly and there should be validation on the key prior to operations on it!

Making an issue :)


Hey, cool idea. I like it. I made a Go implementation for fun/practice - https://github.com/joemiller/prefixed-api-key


I created a similar project recently! Main difference is that I use base64 to encode unique Snowflake IDs, which are timestamped: https://github.com/hopinc/pika :)


In your example, the token contains the timestamp.

With prefixed-api-key, the hash of each token is stored in the database, so the timestamp can easily be added there.

Most of the time I don't see the utility in the client having the timestamp, outside of a scenario where you have a third party validate on their own (e.g. JWT w/ RSA keys). The best way to see if a token has expired is to try to use it.


Yeah, pika is more of an ID system, vs. prefixed-api-key, which seems to be oriented around just API keys, which makes sense. However, an advantage of timestamps in IDs is that the timestamp bits contribute to uniqueness; in other ID systems those bits are usually just random, which feels like a waste IMO. Also, knowing when a resource was created is very helpful when debugging.
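
For example, recovering the creation time from a Twitter-style Snowflake (a sketch assuming the common 41-bit millisecond timestamp in the high bits; the epoch is illustrative and pika's actual layout may differ):

    // Layout assumed: [timestamp | worker | sequence], timestamp << 22.
    const SNOWFLAKE_EPOCH = 1420070400000n // example custom epoch (2015-01-01)

    const snowflakeCreatedAt = (id: bigint): Date =>
      new Date(Number((id >> 22n) + SNOWFLAKE_EPOCH))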


I've used MongoDB quite a bit and I don't like that it has the timestamp in the id. That's unnecessary information.

With UUIDs I prefer UUID4 to UUID1 most of the time.

I prefer to only include the relevant information and for API keys the client doesn't usually need the time the key was created.


Of course, if you don't like the ids that MongoDB generates by default, you can always supply your own. The only constraint on the _id field is that it be unique, as we automatically apply a unique index. Effectively, if you have a unique key for your data, you should use _id for it, as it saves you an index. (We always index _id.)

(I work for MongoDB).


I don't agree. Your examples (UUID1, UUID4) are much longer strings and contain no useful information. UUID1 contains the device's MAC address, and UUID4 is just random bits, vs. a functional ID system like pika or Snowflakes, which makes use of those random bits by embedding a timestamp - something that might actually come in useful for some.


> we search our database for short_token and hash(long_token).

Sounds like some salt should be used here. Search the DB for short_token, then use the discovered salt to check the "password".


Nice work! May I inquire as to the design decision behind storing a hash of the long token server-side for verification of the long token, versus digitally signing the short token plus a nonce and including this signature in the API key (instead of a long token), as some other API key schemes do?

Both are valid approaches of course, I'm just interested to hear your thoughts on the relative tradeoffs.


The signature approach is interesting because it couples the short token to the API key, so they're a pair rather than totally independent. Introducing a signature changes the math for creating the shortest possible API key with at least 128 bits of entropy (that aren't stored server-side). But it's worth looking into!
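
For comparison, a sketch of the signature-style construction (hypothetical, not how prefixed-api-key works):

    import { createHmac } from "node:crypto"

    // One server-side signing secret replaces the per-key hash lookup.
    const SIGNING_SECRET = process.env.API_KEY_SIGNING_SECRET ?? "dev-only-secret"

    // key = prefix_shortToken_nonce_signature, verifiable without a DB read.
    function signShortToken(shortToken: string, nonce: string): string {
      return createHmac("sha256", SIGNING_SECRET)
        .update(`${shortToken}.${nonce}`)
        .digest("base64url") // could be truncated to shorten the key
    }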


  // Store the key.longTokenHash and key.shortToken in your database and give
  // api.token to your customer.
I'm pretty sure that should be "key.token" instead.


thank you!


I wasn’t too sure based on the docs:

What parts of the tokens does the server store in the database (everything but the raw long token, I think?)

What is sent to the consumer, and what do they need to send with the request?


Yep you've got it! The consumer gets the full api key containing the prefix, short token and long token, e.g. "mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG". The database stores the short token, and a hash of the long token.
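
In code form (a sketch of that split, using the README's example values):

    import { createHash } from "node:crypto"

    const fullKey = "mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG" // given to the customer

    const [keyPrefix, shortToken, longToken] = fullKey.split("_")
    const databaseRecord = {
      shortToken, // safe to display and index
      longTokenHash: createHash("sha256").update(longToken).digest("hex"), // raw long token is never stored
    }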


Why base 58? I've never even heard of base 58.


from the RFC:

> Base58 is designed with a number of usability characteristics in mind that Base64 does not consider. First, similar looking letters are omitted such as 0 (zero), O (capital o), I (capital i) and l (lower case L). Doing so eliminates the possibility of a human being mistaking similar characters for the wrong character. Second, the non-alphanumeric characters + (plus), = (equals), and / (slash) are omitted to make it possible to use Base58 values in all modern file systems and URL schemes without the need for further system-specific encoding schemes. Third, by using only alphanumeric characters, easy double-click or double tap selection is possible in modern computer interfaces. Fourth, social messaging systems do not line break on alphanumeric strings making it easier to e-mail or message Base58 values when debugging systems.


The linked README links a draft RFC [1], the introduction section of which contains an explanation.

[1] https://datatracker.ietf.org/doc/html/draft-msporny-base58#s...


Wow, how did I never hear about this? I only read the first 2 paragraphs under Introduction, but it addresses all the qualms I have with base64 and its many variants.


The RFC doesn't mention an important drawback of base58, which is that the computational cost of encoding or decoding it is quadratic in the length of the input -- unlike base64, which is linear-time.

I assume the performance is acceptable when you're dealing with very small API tokens, but it would be totally unsuitable as a replacement for base64 in the context of, say, email attachments. Even using it for something like a PKI certificate would probably be asking for trouble.


I'm confused by this. Maybe I misunderstand the algorithm given in the draft linked above, but that doesn't look O(n^2) at all. Intuitively it also doesn't make sense, as quadratic somehow implies that each character depends on all others (so for each of n chars you'd have to do something with all other n chars -> n^2).


It’s basically a naive algorithm of converting a number in base 256 (each byte is a digit) to base 58. The algorithm is super-linear and worst case quadratic due to all the carrying. Maybe average case quadratic too? Not sure.


Oops, I think my previous comment was inaccurate, and it's too late for me to edit it. It's only decoding that's quadratic, not encoding. Sorry for the confusion.

It's not really obvious (IMO) from the wording of the specification, but step 2 of the decoding algorithm is actually a nested loop.


Still not seeing it. As it's an encoding, not a compression, it cannot produce a large amount of output. At most I guess ceil(log2(256)/log2(58)). So encoding will produce a slightly larger output, while decoding will produce a slightly shorter output. Still all linear.


The output size is linear, but since each byte of output requires O(n) work to compute, the total time to decode an encoded string is O(n^2). Or to put it another way, the decoding throughput (in bytes per second) is inversely proportional to the total length of the string.


Odd. Certainly different to how I thought it works (equivalent to base64, just with divmod instead of bit-shifting). Thanks for fixing my intuition. Related: https://cs.stackexchange.com/q/21736
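
For reference, the naive decode looks something like this (a sketch; leading-zero handling omitted) - the inner loop over the accumulator is what makes it quadratic:

    const ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

    function base58Decode(input: string): Uint8Array {
      const bytes = [0] // little-endian bignum accumulator
      for (const ch of input) {                  // n characters...
        let carry = ALPHABET.indexOf(ch)
        if (carry < 0) throw new Error("invalid base58 character")
        for (let i = 0; i < bytes.length; i++) { // ...each touching O(n) bytes
          carry += bytes[i] * 58
          bytes[i] = carry & 0xff
          carry >>= 8
        }
        while (carry > 0) {
          bytes.push(carry & 0xff)
          carry >>= 8
        }
      }
      return Uint8Array.from(bytes.reverse())
    }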


I usually just use base62 (ASCII letters/numbers). Visual inspection/comparison of a long random value seems unlikely, and when you get into 10+ random gibberish characters, you're just as likely to gloss over a transpositional error


Base62 is a great alternative! The libraries/alphabet around base58 are fairly common thanks to cryptocurrencies, but base62 libraries are generally available and easy to produce if needed.


Pretty sure that’s what they use for bitcoin addresses.


Very cool.

I use something similar, but with a fixed-length prefix and uppercase.

Basically a copy of Stripe tokens.



