This highlights why any secret that gets committed must be rotated. Simply removing it from the git history isn't enough, because it can still linger; it's just harder to find.
Full disclosure, I work for GitHub, but push protection from Secret Scanning is awesome for this because your nearly leaked secret doesn’t make it to the remote, and it gives you instructions on how to fix your local repo!
Why does GitHub provide no way for a repository administrator to self-service a git gc? I seem to recall reading a blog post that suggested GitHub had invested a bunch of engineering resource in making cleaning up unreachable objects much more scalable.
And I think it answers the product vision for it well (why it’s automatic):
> We have used this idea at GitHub with great success, and now treat garbage collection as a hands-off process from start to finish.
GitHub also provides these docs for what to do if there is sensitive data in your repo. The process is quite involved, but given how much GitHub knows internally about both its own internals and git internals, I would trust their advice:
If you feel strongly that a feature you need is missing, by adding your voice, you increase visibility of the request. I think GitHub does offer solutions to this problem though, including eventual GC automatically.
I noticed long ago that unreferenced commits survive on GitHub for a long time, but I couldn't find a way to discover them.
I know that GitHub stores together the objects of many repositories, but they should have implemented and offered a way to gc them when they came up with that optimization.
Sure, there would still be the chance that someone already obtained the objects by the time you gc them, but it's a much lesser risk than leaving them there indefinitely (and they could provide a log of the last fetches to better assess the impact of the erroneous push).
> chance that someone already obtained the objects by the time you gc them
I was under the impression that there are various 'mirror github' projects that listen to the GitHub change event API and immediately crawl some/all commits.
We turned that on about a year ago, and that totally helped reduce the silly. The new dashboards are nice too, letting you spot what application team needs a phone call. The 'This is still active' warning is fantastic. Wish all providers would give you the API to show that.
This is a useful feature but can only provide a degree of protection.
To a certain extent, your approach of considering any mistakenly pushed commit as public is laudable, but it still seems unreasonable to me not to provide an analogue to gc.
It is still in your local repository, but it's not pushed to the remote repository. So forensics on your local machine may reveal it (probably until you do git gc, but I'm not an expert on git forensics), but it's safe otherwise.
That's only client side, and you also need to "gc" it to get rid of it, or it will still be in .git/objects and can be retrieved via something like `git cat-file`.
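For anyone who wants to check this for a known secret, here is a rough sketch rather than anything definitive. Git names a blob by the SHA-1 of the header `blob <size>\0` followed by the content, and keeps loose objects under `.git/objects/<first two hex chars>/<remaining chars>`. The function names are made up for illustration, and this assumes a SHA-1 repository and only looks at loose objects, so a miss doesn't prove the object is gone (it may be sitting in a packfile).

```python
import hashlib
from pathlib import Path


def blob_id(content: bytes) -> str:
    """Compute the object ID git would assign to a blob with this content (SHA-1 repos)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(git_dir, object_id: str) -> Path:
    """Where a loose object with this ID would live inside the git directory."""
    return Path(git_dir) / "objects" / object_id[:2] / object_id[2:]


def still_in_loose_store(git_dir, content: bytes) -> bool:
    """True if a loose blob with exactly this content is still on disk.

    Assumption: packed objects are ignored, so False is not a guarantee.
    """
    return loose_object_path(git_dir, blob_id(content)).is_file()
```

Something like `still_in_loose_store(".git", open("config.env", "rb").read())` (hypothetical file name) would then tell you whether that exact file content is still retrievable as a loose object.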
We're getting a little off-topic, but even git-add will put it in the object store without even committing! I once saved my boss's bacon with that. He had git-added a presentation file he'd been working on to commit it, but accidentally nuked his changes with "git-reset --hard" before committing. He mentioned his mistake in chat, and we were able to recover the lost object by sorting files in the object directory by last-modified and cat-filing it back out by that ID. He bought me a beer for that after work that day. Good times.
Read gitcore-tutorial(7), folks. You too might save someone's bacon, some day.
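In case it helps someone, here is roughly what that recovery looks like as a Python sketch. It is not what we actually ran (we just sorted the directory and used `git cat-file`); it lists loose objects newest-first and decompresses them directly, relying on the loose object format being zlib-compressed `<type> <size>\0<content>`. The function names are my own, and it assumes the object hasn't been packed or garbage-collected yet.

```python
import zlib
from pathlib import Path


def loose_objects_newest_first(git_dir):
    """Return (object_id, path) pairs for loose objects, most recently modified first."""
    objects_dir = Path(git_dir) / "objects"
    found = []
    for fan_out in objects_dir.iterdir():
        # Loose objects live in two-hex-character directories; skip "pack" and "info".
        if not fan_out.is_dir() or len(fan_out.name) != 2:
            continue
        for obj in fan_out.iterdir():
            if obj.is_file():
                found.append((obj.stat().st_mtime, fan_out.name + obj.name, obj))
    found.sort(key=lambda item: item[0], reverse=True)
    return [(object_id, path) for _, object_id, path in found]


def read_loose_object(path):
    """Decompress a loose object and split it into (type, content)."""
    raw = zlib.decompress(Path(path).read_bytes())
    header, _, content = raw.partition(b"\0")
    obj_type, _size = header.split(b" ")
    return obj_type.decode(), content
```

From there you would walk the first few entries of `loose_objects_newest_first(".git")`, keep the ones whose type is `blob`, and write the content of the likely candidate back out to a file.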