Show HN: Prefixed, dual-token, base58 encoded API Keys (github.com/seamapi)
100 points by seveibar on May 10, 2022 | 54 comments


I like it. I'm not sure that "short token" is the right term for that - perhaps "token id". Also the only occurrence of "it's" in the README should be "its": https://brians.wsu.edu/2016/05/19/its-its/

Edit: Actually, the short token is more than just a token ID - it's also required, right? Perhaps replace:

   When we receive an incoming request, we search our database for hash(long_token)
with:

   When we receive an incoming request, we search our database for hash(long_token) and short_token. A token can be blocklisted by its short_token.
So I think I prefer "short token" to "token id" but perhaps there is a better name for it. If it did get renamed it would probably make sense to rename "long token" as well. I'll defer to experts on this.


Fixed, thank you :) - and great suggestion!

edit based on your edit: Added your recommendation to the README, thank you!


My suggestion was "blocklist", which looks a lot like "blacklist" because only one letter is different. Perhaps "deny list" would be better. It's what Apple is using. https://developer.apple.com/news/?id=1o9zxsxl


my mistake! fixed :)


One more suggestion - replace

> A token can be blocklisted by its short token.

With:

> An API Key can be blocklisted by its short token.


Author here: I was curious what HN would think of this token style; we're currently using it for getseam.com. I think it has a lot of advantages over other token styles (e.g. double-click to select, standard alphabet, compatible with secret scanning). Although this is a TypeScript library, we originally designed this API key for use in Ruby on Rails, and the pattern should be fairly portable.


Thanks for sharing! As somebody who has been bitten before by copy-pasting a key, I appreciate that as the headlining benefit :)

Thoughts from an ergonomic perspective:

- Base58 makes easy-to-handle values, which is nice

- The naming of shortToken, longToken, and token is confusing to me, because it makes it sound like all of these values are tokens/secrets of some sort, rather than components of the token. Not to bikeshed, but what jumps to my mind is more like "prefix_id_secret", which helps make it clear what role each one plays.

- I don't think keyPrefix should have a default, I think your function should raise an error if it's not provided. Otherwise more than one user of your library is going to end up with "mycompany" keys in the wild.

- You probably ought to validate that keyPrefix does not contain an underscore and raise an exception if it does, otherwise that way lies pain for people trying to handle these keys.

Thoughts from a security perspective:

- You should probably use crypto.timingSafeEqual(buffer, buffer) [1] instead of comparing strings (see the sketch after this list)

- If you're trying to store the secret's hash, isn't something like bcrypt/scrypt preferred over raw sha256 these days?

[1] https://nodejs.org/api/crypto.html#cryptotimingsafeequala-b
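
A minimal sketch of that comparison in Node, assuming the stored value is a SHA-256 hex digest as in the README (the helper name is hypothetical):

    import { createHash, timingSafeEqual } from "node:crypto"

    // Compare a presented long token against a stored SHA-256 hex digest
    // without short-circuiting on the first differing byte.
    function longTokenMatches(longToken: string, storedHashHex: string): boolean {
      const presented = createHash("sha256").update(longToken).digest()
      const stored = Buffer.from(storedHashHex, "hex")
      // timingSafeEqual throws if the buffer lengths differ, so guard first.
      if (presented.length !== stored.length) return false
      return timingSafeEqual(presented, stored)
    }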


> If you're trying to store the secret's hash, isn't something like bcrypt/scrypt preferred over raw sha256 these days?

Not for this type of use case. If a secret is long and generated from a cryptographically secure source (e.g. /dev/urandom), then any cryptographic hash function is fine, as brute-forcing the secret itself is not feasible. A single pass of SHA-256 is fine, and presumably you'd be doing many of these operations each second in a high-throughput application.

“Slow” hash functions like bcrypt or scrypt are for user provided secrets that might not have a large amount of entropy. It’s fine for something like user authentication but would be way too slow and pointless for API keys.


> If a secret is long

It seems this one, at 24 base-58 characters, is only roughly twice as long as it needs to be, right? How short is too short?


Appreciate the thoughts on the naming/exceptions and thanks for taking a look at the implementation! Definitely making some adjustments tonight.

I think your thought on bcrypt/scrypt vs SHA256 is super interesting here. The long token is treated a lot like a password, so we should treat it similarly and use slow hashing. However, unlike a password, an API key is repeatedly used for authentication instead of being exchanged for a session token. I don't think this meaningfully changes anything, so I think you're right that bcrypt/scrypt would be a better choice!

Edit: Also see koomla's answer!


What meaningfully changes the requirements is that it takes many more attempts to brute force API keys because they're longer.

I'd consider using a weaker bcrypt/scrypt/argon2 for API keys than would be used for login. Perhaps one that takes a hundredth or a thousandth of the time.

It could be unnecessary though.

Here's the main scenario: someone snagged a hashed long token from a database backup and wants to get the unhashed long token so they can use it to access something behind the API. They can do all the brute forcing they want and the server owner will never know about it. There are about a 1 with 42 zeroes' worth of potential long tokens to try (58 ** '51FwqftsmMDHHbJAMEXXHCgG'.length). Seems unlikely even though it's very very cheap to hash a potential long token. The tokens this is using are pretty short, but still long enough to make cracking them infeasible.
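
Back-of-the-envelope numbers for that (my arithmetic, not from the library):

    // 24 random base58 characters:
    const bits = 24 * Math.log2(58)   // ≈ 140.6 bits of entropy
    const tokens = Math.pow(58, 24)   // ≈ 2.1e42 possible long tokens
    console.log(bits.toFixed(1), tokens.toExponential(1))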

If you used argon2 maybe you could cut mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG down to mycompany_BRTRKFsL_51FwqftsmMDH. Shorter token!

I would perhaps make the long token 58 digits long just because it would have more than a googol (10^100) possible values but still be shorter than 80 characters with the prefix and short token, and maybe swap the SHA hashing for a low cost (memory and CPU) argon2.

If you wanted to have really short API keys you could get creative with argon2. This is relevant for magic links.

https://startdebugging.net/2013/10/counting-up-to-one-trilli...


Couple comments upon looking at the actual code:

- You can promisify randomBytes once and reuse it, rather than promisifying it twice on every invocation (see the sketch after this list)

- There shouldn't be a default value for the company name, or people will end up using it.

- The company name isn’t validated so it could contain underscores which would cause issues with the short token parsing as it assumes it’s the second “chunk”

- The equals comparison of the hashes for the secrets is not timing safe. It's not as bad as if they were plain text, but it does short-circuit due to how string equals works. Use the actual built-in timing-safe equal on the Buffer hash (not the stringified hex).
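
A sketch of the first three points above (the helper names are hypothetical, not the library's actual API); the timing-safe comparison is sketched further up the thread:

    import { randomBytes } from "node:crypto"
    import { promisify } from "node:util"

    // Promisify once at module scope instead of on every invocation.
    const randomBytesAsync = promisify(randomBytes)

    // No default prefix, and no underscores that would break the
    // "_"-delimited parsing of prefix/shortToken/longToken.
    function assertValidKeyPrefix(keyPrefix?: string): asserts keyPrefix is string {
      if (!keyPrefix) throw new Error("keyPrefix is required")
      if (keyPrefix.includes("_")) throw new Error("keyPrefix must not contain underscores")
    }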


thanks for taking a look! I've created an issue on the repo and should be able to address these tonight :)


On iPhone the long-press-to-select doesn’t cross the underscore boundary, so you end up just selecting part of the token. Maybe you can do without those underscores?


**shakes fist at sky/apple**


Can you tell us more about the prefix? It's not very clear to me. Is it an issuer ID or a user ID? And if it's the latter and user provided, how does it help with scanning?


It's the issuer's company or product name. GitHub, for example, prefixes its tokens with "ghp_", which makes it easy to scan for tokens uploaded to repos, or even across the web if they wanted.

GitHub has secret scanning features built in now; with a prefix of your product or company name, you could easily create a regex to find when someone has uploaded an API key for your application to a GitHub repo, then revoke the token or email them.
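
For illustration, a scanning pattern for keys of this shape could look like this (assuming the 8- and 24-character token lengths from the README example):

    // The base58 character class omits 0, O, I and l.
    const apiKeyPattern = /mycompany_[1-9A-HJ-NP-Za-km-z]{8}_[1-9A-HJ-NP-Za-km-z]{24}/g

    const fileContents = "const key = 'mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG'"
    console.log(fileContents.match(apiKeyPattern)) // -> the leaked key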


Don't use base58. Even Bitcoin, which originated, or at least popularized, base58, has moved away from it. It's a pain to encode and decode, with a naive implementation requiring a bignum library, and it doesn't save you much in the end over base32.


But you don't need to encode or decode anything here? Tokens are just random strings?


How do you think the token is generated if not through an encoding process?

Another disadvantage I forgot to mention earlier: base58 is variable length, which is a foot gun that will bite you eventually.


Using base58 here strikes me as simply unnecessary. You could just use the base58 alphabet and directly generate random strings in that alphabet, easily and cheaply. The only thing you lose is the ability to decode into a slightly shorter binary string for storage - almost certainly not worth it.
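
A sketch of that approach (crypto.randomInt keeps the per-character draw unbiased):

    import { randomInt } from "node:crypto"

    // Generate a random string directly in the base58 alphabet; nothing
    // ever needs decoding, so no bignum math is involved.
    const BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

    const randomBase58 = (length: number): string =>
      Array.from({ length }, () => BASE58[randomInt(BASE58.length)]).join("")

    console.log(randomBase58(24)) // e.g. a fixed-length long token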


If you are prefixing it, why not prefix it with a full domain name? mycompany_com_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG is not much longer than mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG and allows reporting leaked secrets (for example using the GitHub scheme [1]) without having to look up the URL in some sort of registry, e.g. POST https://mycompany.com/.well-known/report-leaked-secrets

[1]: https://docs.github.com/en/developers/overview/secret-scanni...

edit: See also the discussion from last time: https://news.ycombinator.com/item?id=28296864
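
A hypothetical reporter for that scheme, just to show that no registry lookup is needed (the endpoint and payload here are made up):

    // The reporting host is recoverable from the key itself.
    async function reportLeakedKey(key: string): Promise<void> {
      const [name, tld] = key.split("_") // ["mycompany", "com", ...]
      await fetch(`https://${name}.${tld}/.well-known/report-leaked-secrets`, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ token: key }),
      })
    }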


I think this is an excellent idea! However, we didn't like the aesthetic of the version that included the domain. I'll admit it's not great reasoning.


I don't follow what the security property is of involving sha256 against the b58 encoded value? If it serves as a checksum against tpyoss, wouldn't one wish to include the shortToken also? Otherwise, it's just as much attacker controlled content as the b58 version is, as best I can tell

As an implementation observation, I am also on a life-long campaign to rid the world of `split(...)[-1]` type manipulations, because they lack the context of a more rigorous "parsing" style. In this specific case, it allows attackers to smuggle almost arbitrary characters between the shortToken and longToken:

    checkAPIKey("alpha_BRTRKFsL_and this one time at band camp\r\nContent-Length: 0\r\n_51FwqftsmMDHHbJAMEXXHCgG",
    "d70d981d87b449c107327c2a2afbf00d4b58070d6ba571aac35d7ea3e7c79f37")


Also for your consideration, the code currently has duplication:

    export const extractLongTokenHash = (token: string) =>
      hashLongToken(extractLongToken(token))
    // ...snip...
      longTokenHash: hashLongToken(extractLongToken(token)),
    // ...snip...
    ) => hashLongToken(extractLongToken(token)) === expectedLongTokenHash
and in my experience it's all too easy to update one and forget to update the others


Thanks, yes I agree the ".split(_)[-1]" is ugly and there should be validation on the key prior to operations on it!

Making an issue :)


Hey, cool idea. I like it. I made a Go implementation for fun/practice - https://github.com/joemiller/prefixed-api-key


I created a similar project recently! Main difference is that I use base64 to encode unique Snowflake IDs, which are timestamped: https://github.com/hopinc/pika :)


In your example, the token contains the timestamp.

With prefixed-api-key, the hash of each token is stored in the database, so the timestamp can easily be added there.

Most of the time I don't see the utility in the client having the timestamp, outside of a scenario where you have a third party validate on their own (e.g. JWT w/ RSA keys). The best way to see if a token has expired is to try to use it.


Yeah, pika is more of an ID system, vs. prefixed-api-key, which seems to be oriented around just API keys, which makes sense. However, an advantage of timestamps in IDs is that the timestamp bits contribute to uniqueness; in other ID systems those bits are usually just random, which feels like a waste IMO. Also, knowing when a resource was created is very helpful when debugging.
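
For example, recovering the creation time from a Twitter-style Snowflake (a sketch assuming the common 41-bit millisecond timestamp in the high bits; the epoch is illustrative and pika's actual layout may differ):

    // Layout assumed: [timestamp | worker | sequence], timestamp << 22.
    const SNOWFLAKE_EPOCH = 1420070400000n // example custom epoch (2015-01-01)

    const snowflakeCreatedAt = (id: bigint): Date =>
      new Date(Number((id >> 22n) + SNOWFLAKE_EPOCH))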


I've used MongoDB quite a bit and I don't like that it has the timestamp in the id. That's unnecessary information.

With UUIDs I prefer UUID4 to UUID1 most of the time.

I prefer to only include the relevant information and for API keys the client doesn't usually need the time the key was created.


Of course, if you don't like the ids that MongoDB generates by default, you can always supply your own. The only constraint on the _id field is that it be unique, as we automatically apply a unique index. Effectively, if you have a unique key for your data, you should use _id for it, as it saves you an index. (We always index _id.)

(I work for MongoDB).


I don't agree. Your examples (UUID1, UUID4) are much longer strings and contain no useful information. UUID1 contains the device's MAC address, and UUID4 is just random bits, vs. a functional ID system like pika or Snowflakes, which makes use of those random bits by embedding a timestamp - something that might actually come in useful for some.


> we search our database for short_token and hash(long_token).

Sounds like some salt should be used here. Search the DB for short_token, then use the discovered salt to check the "password".


Nice work! May I inquire as to the design decision behind storing a hash of the long token server-side for verification of the long token, versus digitally signing the short token plus a nonce and including this signature in the API key (instead of a long token), as some other API key schemes do?

Both are valid approaches of course, I'm just interested to hear your thoughts on the relative tradeoffs.


The signature approach is interesting because it couples the short token to the API key, so they're a pair rather than totally independent. Introducing a signature changes the math for creating the shortest possible API key with at least 128 bits of entropy (that aren't stored server-side). But it's worth looking into!
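
For comparison, a sketch of the signature-style construction (hypothetical, not how prefixed-api-key works):

    import { createHmac } from "node:crypto"

    // One server-side signing secret replaces the per-key hash lookup.
    const SIGNING_SECRET = process.env.API_KEY_SIGNING_SECRET ?? "dev-only-secret"

    // key = prefix_shortToken_nonce_signature, verifiable without a DB read.
    function signShortToken(shortToken: string, nonce: string): string {
      return createHmac("sha256", SIGNING_SECRET)
        .update(`${shortToken}.${nonce}`)
        .digest("base64url") // could be truncated to shorten the key
    }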


  // Store the key.longTokenHash and key.shortToken in your database and give
  // api.token to your customer.
I'm pretty sure that should be "key.token" instead.


thank you!


I wasn’t too sure based on the docs:

What parts of the tokens does the server store in the database (everything but the raw long token, I think?)

What is sent to the consumer, and what do they need to send with the request?


Yep you've got it! The consumer gets the full api key containing the prefix, short token and long token, e.g. "mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG". The database stores the short token, and a hash of the long token.
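
In code form (a sketch of that split, using the README's example values):

    import { createHash } from "node:crypto"

    const fullKey = "mycompany_BRTRKFsL_51FwqftsmMDHHbJAMEXXHCgG" // given to the customer

    const [keyPrefix, shortToken, longToken] = fullKey.split("_")
    const databaseRecord = {
      shortToken, // safe to display and index
      longTokenHash: createHash("sha256").update(longToken).digest("hex"), // raw long token is never stored
    }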


Why base 58? I've never even heard of base 58.


from the RFC:

> Base58 is designed with a number of usability characteristics in mind that Base64 does not consider. First, similar looking letters are omitted such as 0 (zero), O (capital o), I (capital i) and l (lower case L). Doing so eliminates the possibility of a human being mistaking similar characters for the wrong character. Second, the non-alphanumeric characters + (plus), = (equals), and / (slash) are omitted to make it possible to use Base58 values in all modern file systems and URL schemes without the need for further system-specific encoding schemes. Third, by using only alphanumeric characters, easy double-click or double tap selection is possible in modern computer interfaces. Fourth, social messaging systems do not line break on alphanumeric strings making it easier to e-mail or message Base58 values when debugging systems.


The linked README links a draft RFC [1], the introduction section of which contains an explanation.

[1] https://datatracker.ietf.org/doc/html/draft-msporny-base58#s...


Wow, how did I never hear about this? I only read the first 2 paragraphs under Introduction, but it addresses all the qualms I have with base64 and its many variants.


The RFC doesn't mention an important drawback of base58, which is that the computational cost of encoding or decoding it is quadratic in the length of the input -- unlike base64, which is linear-time.

I assume the performance is acceptable when you're dealing with very small API tokens, but it would be totally unsuitable as a replacement for base64 in the context of, say, email attachments. Even using it for something like a PKI certificate would probably be asking for trouble.


I'm confused by this. Maybe I misunderstand the algorithm given in the draft linked above, but that doesn't look O(n^2) at all. Intuitively it also doesn't make sense, as quadratic somehow implies that each character depends on all others (so for each of n chars you'd have to do something with all other n chars -> n^2).


It’s basically a naive algorithm of converting a number in base 256 (each byte is a digit) to base 58. The algorithm is super-linear and worst case quadratic due to all the carrying. Maybe average case quadratic too? Not sure.


Oops, I think my previous comment was inaccurate, and it's too late for me to edit it. It's only decoding that's quadratic, not encoding. Sorry for the confusion.

It's not really obvious (IMO) from the wording of the specification, but step 2 of the decoding algorithm is actually a nested loop.


Still not seeing it. As it's an encoding, not a compression, it cannot produce a large amount of output. At most I guess ceil(log2(256)/log2(58)). So encoding will produce a slightly larger output, while decoding will produce a slightly shorter output. Still all linear.


The output size is linear, but since each byte of output requires O(n) work to compute, the total time to decode an encoded string is O(n^2). Or to put it another way, the decoding throughput (in bytes per second) is inversely proportional to the total length of the string.


Odd. Certainly different to how I thought it works (equivalent to base64, just with divmod instead of bit-shifting). Thanks for fixing my intuition. Related: https://cs.stackexchange.com/q/21736
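
For reference, the naive decode looks something like this (a sketch; leading-zero handling omitted) - the inner loop over the accumulator is what makes it quadratic:

    const ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

    function base58Decode(input: string): Uint8Array {
      const bytes = [0] // little-endian bignum accumulator
      for (const ch of input) {                  // n characters...
        let carry = ALPHABET.indexOf(ch)
        if (carry < 0) throw new Error("invalid base58 character")
        for (let i = 0; i < bytes.length; i++) { // ...each touching O(n) bytes
          carry += bytes[i] * 58
          bytes[i] = carry & 0xff
          carry >>= 8
        }
        while (carry > 0) {
          bytes.push(carry & 0xff)
          carry >>= 8
        }
      }
      return Uint8Array.from(bytes.reverse())
    }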


I usually just use base62 (ASCII letters/numbers). Visual inspection/comparison of a long random value seems unlikely, and when you get into 10+ random gibberish characters, you're just as likely to gloss over a transpositional error


Base62 is a great alternative! The libraries/alphabet around base58 are fairly common thanks to cryptocurrencies, but base62 libraries are generally available and easy to produce if needed.


Pretty sure that’s what they use for bitcoin addresses.


Very cool.

I use something similar, but with a fixed-length prefix and uppercase.

Basically a copy of Stripe tokens.



