Redaction in git
I started working on a little Python script for FTP recently. To start off with, I had server, login and password details for an FTP site hardwired in the script, but this didn't matter because I was only working on it locally.
I then had the genius idea of putting the project on GitHub. I realised my mistake soon after, and replaced the hardwired details with a solution involving .netrc. I've now removed the project from GitHub, as anyone could look at the history and see the login details in plain text.
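For readers wondering what a .netrc-based approach might look like, here is a minimal sketch using Python's standard netrc and ftplib modules. The function names and the host ftp.example.com are made up for illustration; this is not the code from the original script.

```python
import netrc
from ftplib import FTP


def load_ftp_credentials(host, netrc_path=None):
    """Return (login, password) for host from a .netrc file.

    netrc_path=None means the default ~/.netrc is used.
    """
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        raise LookupError(f"no .netrc entry for {host}")
    login, _account, password = auth
    return login, password


def connect(host, netrc_path=None):
    """Open an FTP connection without any credentials in the source code."""
    login, password = load_ftp_credentials(host, netrc_path)
    ftp = FTP(host)
    ftp.login(login, password)
    return ftp
```

The matching .netrc entry would be a line such as "machine ftp.example.com login alice password s3cret", kept outside the repository.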
The question is, is there any way to go through the git history and remove user name and password throughout, but otherwise leave the history intact? Or do I need to start a new repo with no history?
First of all, you should change the password on the FTP site. The password has already been made public; you can't guarantee that no one has cloned the repo, or it's not in plain-text in a backup somewhere, or something of the sort. If the password is at all valuable, I would consider it compromised by now.
Now, for your question about how to edit history. The git filter-branch command is intended for this purpose; it will walk through each commit in your repository's history, apply a command to modify it, and then create a new commit.
In particular, you want git filter-branch --tree-filter. This allows you to edit the contents of the tree (the actual files and directories) for each commit. It will run a command in a directory containing the entire tree; your command may edit files, add new files, delete files, move them, and so on. Git will then create a new commit object with all of the same metadata (commit message, date, and so on) as the previous one, but with the tree as modified by your command, treating new files as adds, missing files as deletes, etc. (so your command does not need to do git add or git rm, it just needs to modify the tree).
For your purposes, something like the following should work, with the appropriate regular expression and file name depending on your exact situation:
git filter-branch --tree-filter "sed -i -e 's/SekrtPassWrd/REDACTED/' myscript.py" -- --all
Remember to do this to a copy of your repository, so if something goes wrong, you will still have the original and can start over again. filter-branch will also save references to your original branches, as refs/original/refs/heads/master and so on, so you should be able to recover even if you forget to do this; when doing some global modification to my source code history, I like to make sure I have multiple fallbacks in case something goes wrong.
To explain how this works in more detail:
sed -i -e 's/SekrtPassWrd/REDACTED/' myscript.py
This will replace SekrtPassWrd in your myscript.py file with REDACTED; the -i option to sed tells it to edit the file in place, with no backup file (as that backup would be picked up by Git as a new file).
If you need to do something more complicated than a single substitution, you can write a script, and just invoke that for your command; just be sure to call it with an absolute pathname, as git filter-branch calls your command from within a temporary directory.
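As a rough sketch of what such a script could look like, here is a small Python version. The REPLACEMENTS and TARGETS values are just the placeholders from the example command, and the script name redact.py and its location are hypothetical; adapt all of them to your situation.

```python
#!/usr/bin/env python3
"""Redact known secrets from selected files in the current directory."""
from pathlib import Path

# Placeholders taken from the example command; replace with your own.
REPLACEMENTS = {"SekrtPassWrd": "REDACTED"}
TARGETS = ["myscript.py"]


def redact_file(path, replacements):
    """Rewrite path in place; return True only if its contents changed."""
    path = Path(path)
    if not path.is_file():
        # The file may not exist in every commit; that is not an error.
        return False
    original = path.read_bytes()
    updated = original
    for secret, placeholder in replacements.items():
        updated = updated.replace(secret.encode(), placeholder.encode())
    if updated == original:
        return False
    path.write_bytes(updated)
    return True


def main():
    # The tree filter runs in the root of the checked-out tree,
    # so relative target paths resolve against the commit being rewritten.
    for target in TARGETS:
        redact_file(target, REPLACEMENTS)


if __name__ == "__main__":
    main()
```

Assuming the script is saved at /absolute/path/to/redact.py (a made-up location), it could be invoked as git filter-branch --tree-filter "python3 /absolute/path/to/redact.py" -- --all. Because it silently skips files that are missing, it also keeps going on commits that predate the file.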
git filter-branch --tree-filter <command> -- --all
This tells git to run a tree filter, as described above, over every branch in your repository. The -- --all part tells Git to apply this to all branches; without it, it would only edit the history of the current branch, leaving all of the other branches unchanged (which probably isn't what you want).
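Once the rewrite has finished, it may be worth checking that the secret really is gone. One possible check, sketched here in Python around git log -S (which lists commits that change the number of occurrences of a string), is shown below. The helper names are made up, and git is assumed to be on your PATH.

```python
import subprocess


def build_pickaxe_command(secret):
    """Build a git log command listing commits that add or remove secret.

    Only branches and tags are searched, so the backup refs that
    filter-branch keeps under refs/original/ are deliberately left out.
    """
    return ["git", "log", "--branches", "--tags", "--format=%H", "-S" + secret]


def commits_mentioning(secret, repo_dir="."):
    """Return the hashes of commits that still introduce or drop secret."""
    result = subprocess.run(
        build_pickaxe_command(secret),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

An empty list from commits_mentioning("SekrtPassWrd") suggests the rewritten branches and tags no longer contain the password. The old commits remain reachable through the refs/original/ backups until those are removed.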
See the documentation on GitHub on Removing Sensitive Data (as originally pointed out by MBO) for some more information about dealing with the copies of the information that have been pushed to GitHub. Note that they reiterate my advice to change your password, and provide some tips for dealing with cached copies that GitHub may still have.
Maybe just easier to change your password on the FTP site? Unless you're embarrassed by the code...
I believe you should be able to change all of your commits using the filter-branch command. See the section in the ProGit book for details.
However, as @MBO's link notes:
force-pushing does not erase commits on the remote repo, it simply introduces new ones and moves the branch pointer to point to them
So you'll need to remove the repository completely from GitHub to remove those commits (i.e. even if they're not in your commit history, they're still floating around in the repository).
To add to the chosen answer by @Brian Campbell, I had issues with his code in my use case: the file in question simply did not exist in earlier commits. I am sure I am not the only person in this situation, so I made a simple fix/hack:
git filter-branch --tree-filter "sed -i -e 's/SekrtPassWrd/REDACTED/' myscript.py || echo 'fail'" -- --all
All I did was add || echo 'fail' to ensure the code would keep running even when the file was not found in the commit. I hope someone else finds this useful or can reply with a better method of handling missing files. I don't have enough rep, so I had to make a new answer.