I wonder if you could make a luhn-like check that would require an additional ap...

jewel · on July 5, 2022

If vendors agreed on a common prefix for all secret key values, it would be easy for everyone to add checks to everything. Something like "_SECRET88_".

Of course, your secret key checker would then need to build that string by concatenation so that it wouldn't trigger on itself.
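A minimal sketch of that idea in Python, assuming the made-up "_SECRET88_" marker from above (no vendor actually uses it) and a hypothetical helper name `find_marked_lines`. The marker is assembled from two pieces so the contiguous string never appears in the checker's own source:

```python
# Hypothetical marker, built by concatenation so that this file
# does not contain the full string and so does not flag itself.
MARKER = "_SECRET" + "88_"


def find_marked_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that contain the marker."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if MARKER in line
    ]
```

Running `find_marked_lines` over the checker's own source file should return an empty list, which is the whole point of the concatenation.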

zricethezav · on July 5, 2022

More and more providers have been adding unique prefixes to their tokens and access keys, which makes detection much easier. For example, GitLab prefixes its personal access tokens (PATs) with `glpat-`.

A project I maintain, Gitleaks, can easily detect "unique" secrets and does a pretty good job at detecting "generic" secrets too. In this case, the generic gitleaks rule would have caught the secrets [1]. You can see the full rule definition here [2] and how the rule is constructed here [3].

[1] https://regex101.com/r/CLg9TK/1

[2] https://github.com/zricethezav/gitleaks/blob/master/config/g...

[3] https://github.com/zricethezav/gitleaks/blob/master/cmd/gene...

remram · on July 6, 2022

RFC 8959 registered the 'secret-token:' prefix / URI scheme.

https://www.rfc-editor.org/rfc/rfc8959.html
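A rough sketch of scanning for that scheme in Python. The character class is my reading of the URI `pchar` rule (unreserved characters, sub-delimiters, ":", "@", and percent-encoded bytes), so treat it as an assumption rather than a validated parser. The function name `find_secret_token_uris` is hypothetical.

```python
import re

# Assumed token grammar: one or more URI "pchar" characters.
# The scheme is matched case-insensitively, as URI schemes usually are.
SECRET_TOKEN_URI = re.compile(
    r"secret-token:(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+",
    re.IGNORECASE,
)


def find_secret_token_uris(text: str) -> list[str]:
    """Return every substring that looks like a secret-token: URI."""
    return [match.group(0) for match in SECRET_TOKEN_URI.finditer(text)]
```

One caveat: characters such as `'` and `)` are allowed inside the token, so a URI wrapped in single quotes or parentheses may be captured together with the trailing delimiter.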

pitched · on July 5, 2022

How about scanning for any string with high entropy? Might be easier to get buy-in if we don’t all have to bike-shed over what the prefix is.
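For illustration, a small Python sketch of what an entropy scan could look like. The candidate pattern (20 or more base64-like characters) and the threshold of 4.0 bits per character are arbitrary assumptions that would need tuning, and `high_entropy_strings` is a hypothetical name:

```python
import math
import re
from collections import Counter

# Assumed candidate shape: long runs of base64/URL-safe characters.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    total = len(s)
    counts = Counter(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_entropy_strings(text: str, threshold: float = 4.0) -> list[str]:
    """Return candidate strings whose entropy meets the assumed threshold."""
    return [
        match.group(0)
        for match in CANDIDATE.finditer(text)
        if shannon_entropy(match.group(0)) >= threshold
    ]
```

A long run of the same character scores 0 bits and is ignored, while a random-looking 24-character token scores well above 4. Hashes, UUIDs and minified code also score high, so expect false positives.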

CoffeeOnWrite · on July 5, 2022

That’s helpful, but token prefixes are helpful too. You might be interested in GitHub’s reasoning at https://github.blog/2021-04-05-behind-githubs-new-authentica...

segudev · on July 6, 2022

Unfortunately, it's not as simple as that. Lots of secrets are "generic" (think of a DB user/password combination), meaning that you need to take into account the surrounding source code context to be able to determine if they are a "real" secret.

Here is a full explanation if you are interested: https://blog.gitguardian.com/why-detecting-generic-credentia...
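To make "surrounding context" concrete, here is a deliberately naive Python sketch that only flags a string literal when it is assigned to a suspicious-looking name. The keyword list, the 8-character minimum and the name `find_generic_secrets` are all assumptions for illustration; this is not how GitGuardian or any other scanner actually works.

```python
import re

# Hypothetical rule: a name containing a credential-like keyword,
# followed by ":" or "=", followed by a quoted value of 8+ characters.
ASSIGNMENT = re.compile(
    r"""
    \b([a-z0-9_]*(?:password|passwd|secret|token|api_?key|access_?id)[a-z0-9_]*)
    \s*[:=]\s*
    ["']([^"']{8,})["']
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_generic_secrets(text: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs for assignments that match the rule."""
    return [(m.group(1), m.group(2)) for m in ASSIGNMENT.finditer(text)]
```

The weakness is easy to see: a misspelled or unusual variable name slips straight through, and a placeholder value such as "changeme123" is flagged as if it were real.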

bilekas · on July 5, 2022

I was thinking about that too, but it's actually tricky. Even in the example given, they use the variable `accessId`. You could filter for that, and for all the standard names, but you couldn't have enough confidence in the check: if someone posted a secret with a typo or a random variable name and saw no warning, they would think "Okay, no warning, so it must be okay".

That would give the user false confidence. Not the best idea.