Is it possible to keep an unversioned file in a git repository

2022-12-19 20:22 Q&A author:

Here is the problem:

I created a bare Git repository at my hosting provider, which I use as the reference repository from all the locations/computers I maintain my project from.

The thing is that my project uses a SQLite db file, which keeps growing (it is about 150MB for now). As time passes, my .git folder is getting bigger and bigger (lately around 1GB), and my hosting space is limited.

I need the bare repository to contain the HEAD version of this db file but I really do not need to keep its version history.

So, to gain some space, from time to time, I remove the db file from the history, clean the repository and recreate the bare version. This works, but is quite a pain.

Is there a way to tell git to keep only the last version of a file and drop its history?

Short answer: no.

More useful answer: Git doesn't track files individually, so asking it to throw away the history of a single file would mean that it would have to rewrite all of its history completely upon every commit, and that leads to all kinds of ugly problems.
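To make that concrete, here is a minimal sketch in a throwaway repository (the file names main.c and data.db are made up for illustration). It prints a commit object, which names exactly one tree for the whole project plus its parent. Because each commit's hash covers its tree and its parent, changing one file in an old commit changes the hash of every commit after it.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all names here are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "code" > main.c
echo "data v1" > data.db
git add main.c data.db
git commit -q -m "first"

echo "data v2" > data.db
git commit -q -am "second"

# A commit names one tree (the whole project) and its parent(s),
# not individual files.
git cat-file -p HEAD

tree_lines=$(git cat-file -p HEAD | grep -c '^tree ')
parent_lines=$(git cat-file -p HEAD | grep -c '^parent ')
echo "tree lines: $tree_lines, parent lines: $parent_lines"
```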

You can store a file in an annotated tag, but that's not very convenient. It basically goes like this:

ID=`git hash-object -w yourfile.sqlite`
git tag -a -m "Tag database file" mytag $ID

In no way does that conveniently update (or even create) the database file in the working tree for you... you'd have to use hook scripts to emulate that.

Full disclosure: I'm not completely sure whether it's actually possible to push tagged blobs that aren't covered by the normal history. I suspect that it isn't, in which case this recipe would be a lot less than useful.
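For the local half of this recipe, a minimal sketch in a scratch repository might look like the following. The file content is a placeholder, and the restore step is my assumption about how you would get the file back: `git cat-file blob` is allowed to dereference a tag that points at a blob. It does not test pushing.

```shell
#!/bin/sh
set -eu

# Scratch repository; the "database" is just a placeholder text file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'pretend this is a database\n' > yourfile.sqlite

# Store the file as a blob and point an annotated tag at it.
ID=$(git hash-object -w yourfile.sqlite)
git tag -a -m "Tag database file" mytag "$ID"

# Simulate losing the working copy, then restore it from the tag.
rm yourfile.sqlite
git cat-file blob mytag > yourfile.sqlite

restored=$(cat yourfile.sqlite)
# Peel the tag to see what kind of object it ultimately points at.
tag_target_type=$(git cat-file -t "mytag^{}")
echo "restored: $restored (tag points at a $tag_target_type)"
```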

It sounds like you're looking for the solution to the wrong problem.

Large binary files do often need to be stored in repositories, but I don't think a SQLite database is something you would really need to store in its binary form in a repository.

Rather, you should keep the schema in version control, and if you need to keep data too, serialize it (to XML, JSON, YAML...) and version that too. A build script can create the database and unserialize the data into it when necessary.

Because a text-based serialization format can be tracked efficiently by Git, you won't worry about the space overhead of keeping past versions even if you don't think you need access to them.
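As a rough sketch of that round trip, assuming the sqlite3 command-line tool is installed, its `.dump` command writes the schema and data as SQL text, which is one possible serialization (the answer above mentions XML, JSON and YAML; SQL text is my substitution). The table and file names are invented for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

rows=skipped
if command -v sqlite3 >/dev/null 2>&1; then
    # Hypothetical database with one small table.
    sqlite3 app.db "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT);
                    INSERT INTO notes (body) VALUES ('first'), ('second');"

    # Serialize schema and data to SQL text; this is the file you would commit.
    sqlite3 app.db .dump > app.sql

    # A build script can rebuild the binary database from the text dump.
    rm app.db
    sqlite3 app.db < app.sql

    rows=$(sqlite3 app.db "SELECT count(*) FROM notes;")
fi
echo "rows after rebuild: $rows"
```

With this layout, app.db would go in .gitignore and only app.sql would be tracked.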

You can always use .gitignore config file for this - from the beginning.

And ... (from this thread: kudos to Björn Steinbrink!)

Use filter-branch to drop the parents on the first commit you want to keep, and then drop the old cruft.

Let's say $drop is the hash of the latest commit you want to drop. To keep things sane and simple, make sure the first commit you want to keep, i.e. the child of $drop, is not a merge commit. Then you can use:
git filter-branch --parent-filter "sed -e 's/-p $drop//'" \
    --tag-name-filter cat -- \
    --all ^$drop
The above rewrites the parents of all commits that come "after" $drop.

Check the results with gitk.

Then clean out all the old cruft.

First, the backup references from filter-branch:
git for-each-ref --format='%(refname)' refs/original | \
    while read ref
    do
            git update-ref -d "$ref"
    done
Then clean your reflogs:
git reflog expire --expire=0 --all 
And finally, repack and drop all the old unreachable objects:
git repack -ad
git prune # For objects that repack -ad might have left around

At that point, everything leading up to and including $drop should be gone.
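To try the whole sequence safely before touching a real repository, here is a sketch that runs it on a throwaway repository with four commits. The file name and commit messages are made up. I use `--expire=now` for the reflog step because that is the spelling I am sure of, and I set FILTER_BRANCH_SQUELCH_WARNING so that newer Git versions skip the filter-branch warning pause.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with four linear commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
for n in 1 2 3 4; do
    echo "revision $n" > data.txt
    git add data.txt
    git commit -q -m "commit $n"
done

drop=$(git rev-parse HEAD~2)   # commit 2: the latest commit to drop
before=$(git rev-list --count HEAD)

# Strip the parent pointer to $drop from its child, making commit 3 a root.
git filter-branch --parent-filter "sed -e 's/-p $drop//'" \
    --tag-name-filter cat -- \
    --all "^$drop"

# Remove filter-branch's backup refs, expire reflogs, repack and prune.
git for-each-ref --format='%(refname)' refs/original |
    while read -r ref
    do
        git update-ref -d "$ref"
    done
git reflog expire --expire=now --all
git repack -ad
git prune

after=$(git rev-list --count HEAD)
if git cat-file -e "$drop^{commit}" 2>/dev/null; then
    old_gone=no
else
    old_gone=yes
fi
echo "commits before: $before, after: $after, dropped commit gone: $old_gone"
```

Keep in mind that this rewrites every remaining commit hash, so every clone has to be re-cloned or reset afterwards.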

If I understand your question, I think I have a simple solution.

First, back up the file somewhere.
Delete it from your working dir/tree. Not git rm, just rm.
Do a commit.
Make sure the file is added to .gitignore.

On subsequent commits, Git will no longer attempt to add that file. Note that you will still have the file stored in previous commits. It's just that you won't be adding it to every commit you do in the future. In order to delete it from prior commits, you'll need advice from someone with more Git experience than I have.
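A variant of those steps, sketched on a throwaway repository: this uses `git rm --cached` instead of a plain rm, which is a different technique from the one described above. It removes the file from the index only and leaves it on disk, so no backup copy is needed. The file names are placeholders.

```shell
#!/bin/sh
set -eu

# Throwaway repository where the database starts out tracked.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "pretend database" > sqlite.db
echo "code" > app.txt
git add sqlite.db app.txt
git commit -q -m "Initial commit with the database tracked"

# Untrack the file but keep it in the working tree, then ignore it.
git rm -q --cached sqlite.db
echo "sqlite.db" >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking sqlite.db"

# Later changes to the database no longer show up in Git.
echo "more data" >> sqlite.db
tracked=$(git ls-files sqlite.db)
ignored=$(git check-ignore sqlite.db)
status=$(git status --porcelain)
echo "tracked: '$tracked', ignored: '$ignored', status: '$status'"
```

As with the original steps, the old versions of the file stay in the earlier commits.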

Add sqlite.db to your .gitignore.

To check in the current db for (potential) pushing with the current branch:

branch="$(sed 's,.*refs/heads/,,' "$(git rev-parse --git-dir)"/HEAD)"
objectname=$(git hash-object -w "$(git rev-parse --show-toplevel)/sqlite.db")
git tag -f db_heads/$branch $objectname

When pushing a branch:

git push origin $branch +db_heads/$branch

When fetching a branch:

git fetch origin $branch tags/db_heads/$branch:tags/db_heads/$branch

When checking out a branch:

git checkout $branch
git cat-file -p db_heads/$branch >"$(git rev-parse --show-toplevel)/sqlite.db"

And that should do it, I think.
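Here is a sketch of the full round trip through a local bare repository standing in for the hosted one. The paths, the hub.git name and the file content are made up, and I spell the tag refspecs out in full (refs/tags/...) rather than relying on the short forms above. I also read the branch name with `git symbolic-ref --short HEAD` instead of sed.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
git init -q --bare "$tmp/hub.git"       # stands in for the hosted bare repo

# First machine: ignore the db, then publish its current content via a tag.
git init -q "$tmp/work"
cd "$tmp/work"
git config user.name "Example"
git config user.email "example@example.invalid"
git remote add origin "$tmp/hub.git"
echo "sqlite.db" > .gitignore
git add .gitignore
git commit -q -m "Ignore the database file"
printf 'pretend database contents\n' > sqlite.db

branch=$(git symbolic-ref --short HEAD)
objectname=$(git hash-object -w sqlite.db)
git tag -f "db_heads/$branch" "$objectname"
git push -q origin "$branch" "+refs/tags/db_heads/$branch:refs/tags/db_heads/$branch"

# Second machine: fetch the branch and the tag, then write the db file out.
git clone -q -b "$branch" "$tmp/hub.git" "$tmp/other"
cd "$tmp/other"
git fetch -q origin "+refs/tags/db_heads/$branch:refs/tags/db_heads/$branch"
git cat-file -p "db_heads/$branch" > sqlite.db

restored=$(cat sqlite.db)
echo "restored on second machine: $restored"
```

Each time the tag is force-moved, the previous blob becomes unreachable, but the remote only reclaims that space once it is garbage-collected.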
