this post was submitted on 27 Jul 2024
Technology
The article is really not clear. Is it saying that if a project is forked and the original is then made private, the fork can still access data from the now-private original?
Is this saying people misunderstand git and think committing a deletion makes people unable to access the previous version? Or is it saying the sharing between public and private repos can expose keys in private repos?
If you accidentally commit an API key into a public repository... you need to roll that key. Even if it was deleted completely, someone still could have accessed it while it was there.
from their actual report
I'm still not sure that answers it. If I fork a project, and the upstream project commits an API key (after I've forked it), then they delete the commit, does this commit stay available to me (unexpected behaviour)? Or is it only if I sync that commit into my repo while it's in the upstream repo (expected behaviour)?
Or is it talking about this from a comment here:
Someone replied that having garbage collection kick in removes this unconnected commit, but it's not clear to me whether this works for GitHub or just the local git repo.
Perhaps the issue is that these commits get synced into upstream/downstream repos when they should not be?
Like I said, I'm really confused about the specifics of this.
I think GitHub keeps all the commits of forks in a single pool. So if someone commits a secret to one fork, that commit could be looked up in any of them, even if the fork it was committed to is private or deleted, or no references to the commit exist.
The big issue is discovery. If no one has pulled the leaky commit onto a fork, then the only way to access it is to guess the commit hash. GitHub makes this easier for you:
I think all GitHub should do is prune orphaned commits from the auto-suggestion list. If someone grabbed the complete commit ID then they probably grabbed the content already anyway.
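To make the "single pool" idea concrete, here is a toy Python sketch. It is not GitHub's implementation: the `ForkNetwork` class, its method names, and the sample secret are all made up for illustration, and it assumes lookups accept any unambiguous hash prefix, the way local Git does with abbreviated object names.

```python
import hashlib


class ForkNetwork:
    """Toy model (hypothetical): all forks read from one shared object pool."""

    def __init__(self):
        self.pool = {}   # full hash -> content, shared by every fork
        self.forks = {}  # fork name -> hashes that fork's refs point at

    def add_fork(self, name):
        self.forks[name] = set()

    def commit(self, fork, content):
        sha = hashlib.sha1(content.encode()).hexdigest()
        self.pool[sha] = content
        self.forks[fork].add(sha)
        return sha

    def delete_fork(self, name):
        # Only the fork's references go away; pooled objects are untouched.
        del self.forks[name]

    def lookup(self, fork, prefix):
        """Resolve a hash prefix through any existing fork of the network."""
        if fork not in self.forks:
            raise KeyError(fork)
        matches = [sha for sha in self.pool if sha.startswith(prefix)]
        if len(matches) != 1:
            return None  # unknown or ambiguous prefix
        return self.pool[matches[0]]


net = ForkNetwork()
net.add_fork("upstream")
net.add_fork("my-fork")
sha = net.commit("my-fork", "API_KEY=not-a-real-key")
net.delete_fork("my-fork")

# The commit never reached "upstream", yet a short prefix still finds it there.
leaked = net.lookup("upstream", sha[:7])
```

In this model the only protection is that the hash is hard to guess, which is why trimming orphaned commits from any suggestion list matters more than hiding them from someone who already has the full ID.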
Thanks, I think that explains it a bit more. It is unexpected to me as a non-expert in git, and I'm sure to many others.
I guess the funny thing is that each Git commit is internally just a file. Branches and tags are just links to specific commit files and of course commits link to their parents. If a branch gets deleted or jumped back to a previous commit, the orphaned commits are still left in the filesystem. Various Git actions can trigger a garbage collection, but unless you generate huge diffs, they usually stick around for a really long time. Determining if a commit is orphaned is work that Git usually doesn't bother doing. There's also a reflog that can let you recover lost commits if you make a mistake.
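As a rough illustration of why deleting a branch doesn't delete anything, here is a small Python sketch of the model described above. The dictionaries and the short made-up IDs (`a1`, `b2`, ...) are stand-ins, not real Git data structures: commits map to their parents, refs map to a commit, and "orphaned" just means unreachable from every ref.

```python
def reachable(commits, refs):
    """Walk parent links from every ref and collect the commits visited."""
    seen = set()
    stack = list(refs.values())
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(commits[sha])  # follow links to parent commits
    return seen


def orphaned(commits, refs):
    """Commits still stored but no longer reachable from any branch or tag."""
    return set(commits) - reachable(commits, refs)


# commit id -> list of parent ids (hypothetical short ids)
commits = {"a1": [], "b2": ["a1"], "c3": ["b2"], "d4": ["b2"]}
refs = {"main": "c3", "feature": "d4"}

before = orphaned(commits, refs)

# Deleting a branch only removes the link; the commit object stays stored.
del refs["feature"]
after = orphaned(commits, refs)
```

Finding orphans means walking the whole graph from every ref, which fits the point that Git usually doesn't bother until a garbage collection runs.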
In my experience with GitHub, dropped commits remain accessible indefinitely. I use this to my advantage on pull requests with lots of good commit context that I don't want totally lost in a squash, by copying the result of
git log --oneline main...
into the PR body. The SHAs remain accessible even after I force push my branch down to a single commit. I think there is a theoretical limit to how long these commits remain accessible, but I haven't ever hit it in my daily usage.
Ah thanks, this explains it a bit more.