
How to tag a single file in Git

I am very new to Git. I am trying to get familiar with it by using it to track changes in some Excel files that I maintain to track an ongoing activity I am involved with. All the files are in a single repository. I want to tag each file separately with its version. Is this possible? So far, all I have found is the ability to tag the entire repository. If what I am trying to do is wrong, please advise me on the best practice.

Thanks in advance.

Edit

While doing this, I deliberately deleted a previous tag so that the entire repository would be tagged as v1.0 (as I had not found a way to tag single files). Now that I want to rename the tag after the file, and am more comfortable with how things should work, how can I roll back the deletion and rename the previous (deleted) tag?


Technically you can tag the content of a single file without its file name. But such tags are of limited use. Tags are expected to point to commits, and special tags that point to non-commits behave very differently (you can't git checkout such a special tag). So I strongly suggest never using non-commit tags.

If you want only some files to be tagged, it might be better to use a separate repo for them, or at least different branches, since git always looks at the full tree for its operations.

If you still insist on creating such a special tag, you can do this:

> git ls-tree HEAD
040000 tree 2c186ad49fa24695512df5e41cb5e6f2d33c119b    bar
100644 blob 409940768f2a684935a7d15a29f96e82c487f439    foo.txt

> git tag my-bar-tree 2c186ad49fa24695512df5e41cb5e6f2d33c119b
> git tag my-foo-file 409940768f2a684935a7d15a29f96e82c487f439
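
The object ID does not have to be copied from the git ls-tree output by hand: git rev-parse can resolve a commit:path expression to the same object. The following is a minimal sketch, assuming a POSIX shell; the helper name tag_file_blob is hypothetical (it is not a git command), and the caveats above about non-commit tags still apply.

```shell
# Hypothetical helper (not part of git): tag the object a path has in a commit.
# Usage: tag_file_blob <tag-name> <commit> <file-path>
tag_file_blob() {
    tag_name=$1
    commit=$2
    file_path=$3
    # "<commit>:<path>" names the blob (or tree) stored at that path in that commit.
    blob=$(git rev-parse --verify "$commit:$file_path") || return 1
    # A lightweight tag can point at any object, including a blob.
    git tag "$tag_name" "$blob"
}
```

For example, after tag_file_blob my-foo-file HEAD foo.txt, the command git cat-file -t my-foo-file reports blob and git cat-file -p my-foo-file prints the tagged content.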


Tags are meant to point at a commit, i.e. a certain snapshot in the history of your repository. However, git stores files as blobs, and what you can do is use git notes to add a note to a blob, like this:

$ git ls-tree HEAD
100644 blob 252f7c7df5bd181536a6c9d0f9c371ce1a5dd042    .gitignore
100644 blob 8150ada74aba86c983ac3f8f63ab26aaa76fdcb7    README
100644 blob c4b1ff6dcb2a8e50727df21ced8d2872cd91af79    TODO.txt

$ git notes add -m "Adding a note to TODO" c4b1ff6dcb2a8e507
$ git notes show c4b1ff6dcb2a8e507
Adding a note to TODO

However, note that this note (no pun intended :) ) is only attached to this blob, so whenever the file changes, a new blob is created (with a different SHA-1 hash) and the new blob will not have this note.
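
To make that caveat concrete, here is a small sketch, assuming a POSIX shell, that looks up the blob for a path in HEAD instead of copying the hash by hand. The helper names note_file and show_file_note are hypothetical; only git rev-parse and git notes are real git commands.

```shell
# Hypothetical helpers (not part of git) around the git notes commands.

# Attach a note to the blob that <file-path> currently has in HEAD.
# Usage: note_file <file-path> <message>
note_file() {
    blob=$(git rev-parse --verify "HEAD:$1") || return 1
    git notes add -m "$2" "$blob"
}

# Print the note on the blob that <file-path> currently has in HEAD.
# Fails if that blob has no note, e.g. once a changed file has been committed.
# Usage: show_file_note <file-path>
show_file_note() {
    blob=$(git rev-parse --verify "HEAD:$1") || return 1
    git notes show "$blob"
}
```

After note_file TODO.txt "Adding a note to TODO", show_file_note TODO.txt prints the note; once a modified TODO.txt has been committed, the same call fails because HEAD now refers to a new blob without a note.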


Tags are placed on specific commits as opposed to the repository as a whole as you suggest in your question.

One way of doing what you are suggesting is to make sure each commit changes only a single file; you can then tag that commit with, e.g., file1-v1.0 to represent v1.0 of file1.
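
As a rough sketch of that workflow, assuming a POSIX shell, the helper below commits exactly one file and tags the resulting commit. The name commit_and_tag_file and the commit message format are illustrative assumptions, not anything git prescribes.

```shell
# Hypothetical helper (not part of git): commit a single file and tag that commit.
# Usage: commit_and_tag_file <file-path> <tag-name>
commit_and_tag_file() {
    file_path=$1
    tag_name=$2
    # Stage the file so that new, untracked files are picked up too.
    git add -- "$file_path" &&
    # Naming a path after "--" commits only that path, even if other
    # changes are already staged.
    git commit -q -m "Update $file_path ($tag_name)" -- "$file_path" &&
    # Lightweight tag on the commit that was just created.
    git tag "$tag_name"
}
```

For example, commit_and_tag_file file1.xlsx file1-v1.0 records the current state of file1.xlsx and tags it; the tag still names a whole-repository commit, but that commit changes only the one file.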

What are you trying to represent by tagging these commits though? That will influence any advice on how to better your process.


You should really explore having separate branches that track changes to specific files. You should be able to work with that.


Not directly, no.

A tag is a pointer to a specific revision of the repository. The individual files do not themselves have versions separate from that of the repository. If you want separate versions, your options are to have a separate repository for each file, have a separate branch for each file, or to use a different tool that is file oriented (such as RCS -- though it lacks many of the nice features git has).

If the files are at all related, you generally do want to tag a specific version of the group of them. If they're not, you can still tag the entire group with the version of each file changed in that revision. Restricting changes to one file per revision can make this process easier to manage.


This is not an answer to the original question, but rather an answer to one of the comments on it:

git tags commits. Not files. Can you tell us what you want to do with these "tags"? I think a more appropriate answer would be possible then. – Noufal Ibrahim

I am posting this extended comment as a pseudo-answer

  • because Stack Overflow's comments have limited formatting,
  • and because I don't feel it is appropriate for me to add my explanation of why I want single-file or subset tagging to the OP's question, in case it is not the same.

BRIEF ANSWER:

When I do a diff between tag-for-code.txt--update-by-jim-April1st and tag-for-code.txt--original-version-by-joe, I only want to see diffs for my-lib/import/new-module/code.txt. Or perhaps my-lib/import/new-module. I don't want to see diffs for my-lib/import/module1, which is supposedly completely independent[*] of my-lib/import/new-module/code.txt. No, I don't want to have to know what parts I should filter out. I may not know that without digging deeper.

I understand that git tags are for commits, and commits are essentially snapshots of the entire repository. So I'm just asking for a workaround that would give me the convenience of having ... diff tag1 tag2 refer only to the subset of files that are explicitly identified as belonging to tag1 and tag2, and not to a whole slew of other independent files that happened to have changed in between.

E.g. perhaps I should have a hypothetical subset tag create a file containing a list of filenames and the repo commit ID. So then tools that use such subset tags would filter only information relevant to the list of filenames in their respective tag-files. Or perhaps just blob-IDs. Whatever.

Has anyone got a BKM (best known method) for doing this?
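
One partial workaround is a path limiter on git diff: the tags still name whole-repository commits, but the output is restricted to the paths given. It does not record which files a tag "belongs" to, so the caller still has to supply the paths. A minimal sketch, assuming a POSIX shell; the helper name subset_diff is hypothetical.

```shell
# Hypothetical helper (not part of git): diff two tags, limited to given paths.
# Usage: subset_diff <old-tag> <new-tag> <path>...
subset_diff() {
    old_tag=$1
    new_tag=$2
    shift 2
    # Everything after "--" is a pathspec; changes outside those paths
    # are left out of the diff.
    git diff "$old_tag" "$new_tag" -- "$@"
}
```

For example, subset_diff code.txt--original-version-by-joe code.txt--update-by-jim-April1st my-lib/import/new-module shows only changes under my-lib/import/new-module, even if my-lib/import/module1 also changed between the two tagged commits.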

===

I am posting this as a pseudo-answer because I have long suffered intellectual pain because of this lack of single-file or subset tagging in git. Surely there is a git equivalent? Although I suspect that there is not, given the amazing number of off-target responses, often of the form "why do you want to do that?"

Some other version control systems have both entire repo and single file or subset tagging. AFAICT git does not. Without loss of generality I will provide an example from CVS, although these concepts apply to DVCS just as well.

Brief summary of why you want single file and subset group of file tagging as well as whole repo tagging

We all agree that symbolic tag names are good. Right?

In the bad old days there were only single-file tags. Although you could usually apply the same tag to groups of files, it was not guaranteed that the same tag had been applied to the entire repository.

So if you did checkout -rTestsRunTag expecting to be able to build and run the tests successfully it might fail, because some file that you did not know you depended on was not tagged with TestsRunTag.

Hence the preference for whole repository tags. Tags that apply to a snapshot of the whole repository. Hopefully, if you check out such a whole repository tag, you're guaranteed to be able to build successfully. Right? ...

Actually not right. Did you put your build tools, compiler, etc. in your repository?

Nevertheless, whole repository tagging was a very good step towards reproducible builds and tests.

Nevertheless^2, VCSes that apply only whole-repository tagging throw the baby out with the bathwater.

There is still a need for single-file tagging. More often, for tagging a group of files where the group is not the entire repo. Typically a directory subtree, or a few related such things.

Basically, you need such subset tagging when the tag is relevant only to a subset. When the tag is irrelevant and even confusing to the entire repo.

Or when using a whole repo tag is inconvenient.

As I try to explain in the example below, when I do a diff between code.txt--update-by-jim-April1st and code.txt--original-version-by-joe, I only want to see diffs for my-lib/import/new-module/code.txt. Or perhaps my-lib/import/new-module. But I don't want to see diffs for my-lib/import/module1, which is supposedly completely independent of my-lib/import/new-module/code.txt.

Brief(?) example of wanting non-full repo tagging

Here's a brief example, that has just prompted me to search for and post this.

I have a library of mostly independent things. Call this my-lib.

I have collected many modules from around the web that I put in this library, in places like my-lib/import/module1 and my-lib/import/module2, keeping them separate from each other and from my own stuff like my-lib/my-stuff-my-module1, and so on.

I am adding a new module to this library, that I have imported off some website. Let's call that my-lib/import/new-module.

Unfortunately that module does not have its own version control system. It was posted on a discussion thread, with slightly different versions by different users. I'm not quite sure which version I want to use, so I'm going to put a few of them in my library.

WLOG let me just talk about a single file in the module, my-lib/import/new-module/code.txt.

So I download the first version I find on the discussion thread. Place it into my-lib/import/new-module/code.txt. Check it in. I would like to give it a symbolic name, since that is nicer than using either git hashes or numeric version numbers like CVS's 1.1.1.1. How about the symbolic "tag" name code.txt--original-version-by-joe? Although I'm just as likely to put some dates, like the original creation date, in the tag name as well as in the file commit message, and ideally more comments in a description associated with the tag. And I don't need to have the filename in the tag; it's just an example.

Perhaps I use this for a while. But eventually I see a different version on the discussion thread. So I download that version. Place it into my-lib/import/new-module/code.txt. Check it in. I would like to give it a different symbolic name. How about code.txt--update-by-jim-April1st?

I hope that this is sufficient to show why I want tags that apply to single files, or subsets of files, and not to the entire repository.

The tags code.txt--original-version-by-joe and code.txt--update-by-jim-April1st are relevant only to the module my-lib/import/new-module and its file my-lib/import/new-module/code.txt. These new-module tags are irrelevant to other modules of which it is completely independent, such as my-lib/import/module1, my-lib/import/module2, my-lib/my-stuff-my-module1.

When I do a diff between code.txt--update-by-jim-April1st and code.txt--original-version-by-joe, I only want to see diffs for my-lib/import/new-module/code.txt. Or perhaps my-lib/import/new-module. But I don't want to see diffs for my-lib/import/module1, which is supposedly completely independent[*] of my-lib/import/new-module/code.txt.

Note: There is no such thing as completely independent, but...

Note that I said "supposedly completely independent". That's the gotcha. Even completely independent library modules may break the library infrastructure for crosscutting stuff like makefiles, even if they are never linked into the same program.

But nevertheless, it is very convenient to have the default diff between code.txt--update-by-jim-April1st and code.txt--original-version-by-joe show diffs only for my-lib/import/new-module/code.txt, and not for the supposedly completely independent my-lib/import/module1.

It is also convenient to have ways of saying "diff against the snapshot of the entire repository at the time that the single-file tag code.txt--update-by-jim-April1st was created". That's unambiguous if there's only a single file tag.

Minor issues arise if that same tag is applied to versions of multiple files that do not reside in a single entire-repository snapshot (e.g. a git commit). But you can deal with that.

Why not use modules?

I can hear some idiots, ahem, people of less imagination say "why not just use modules?" Talking not about the generic programming division of the program into modules and submodules, but modules and submodules specifically in the version control system, like git modules and submodules.

OK, I just used modules in the above example to simplify the discussion.

Modules as defined by the version control system have some overhead. You have to set things up a priori. That is frequently not good enough.

Many systems start off as a single file in some big parent repository, when they are not yet considered to be a separate module, and then evolve into being a separate module. Heck, they often start as a single function in a big_file_of_many_functions.c. You then realize that this function and/or some near relatives should be in a file by themselves, like foo.c, and if you're in C/C++ you nearly always need a header file foo.h. Eventually you realize that it would really be better to have a separate directory foo/ containing foo/foo.c and foo/foo.h. And eventually you may add foo/Makefile, foo/tests/test1.py.

And somewhere along this evolution, from a function within a bigger file of other functions to a subtree, you decide to give it a version control system module or submodule.

That's great. But it sure would be nice to have ways of referring (a) to this set of related things within the bigger repository and (b) to versions of this set of related things.

I say again:

Modules as defined by the version control system have some overhead. Considerable overhead.

It is not just me saying this. This is very much related to the debate about monorepo versus multi-repo, e.g. https://johnclarke73.medium.com/mono-or-multi-repo-6c3674142dfc. Some very important software developers and companies use monorepos, IMHO in part because git and other VCS module systems are a hassle.

Consider a tree where every subdirectory tree is a separate module

I have long maintained a personal library. It is a deeply nested directory tree. Nearly every subdirectory tree can be treated as an independent module. Often individual files can be exported individually, e.g. header-only libraries in C or C++. Usually I prefer to have a directory for each minimal logical subsystem, so that it can have separate makefiles and test scripts.

I often share this code with other projects, companies, employers...

These other projects seldom want to import my entire personal library tree. Hence I want to be able to check out just a subset, ranging from a single file, to most often a subdirectory tree, to sometimes a set of subdirectory trees that are required to work together.

These other projects often do not want to use my version control system. Too bad, that's life. But sometimes they do.

Of course, modern DVCSes have very poor support for checking out arbitrary subsets, short of module support (see elsewhere). It is not good when a sparse checkout, sometimes called a sparse branch (although the terminology quickly gets into the weeds of particular version control systems), carries history related to items that are not part of what was checked out. Nor is it good if a sparse branch that does not carry excess history and excess objects cannot be merged back into a repository that has more history, and more objects in parts of the file system that were not checked out.

When they are willing to share version control systems, it is really confusing to have the items that they have checked out be tagged with tags that are absolutely irrelevant to them.

Sometimes not just confusing. Sometimes a security hole.

Anyway, in this great big personal monorepo I like to tag the subsets checked out by other projects, and imported back from those other projects. But, again, those tags are irrelevant to the stuff that those other projects don't want to look at.

e.g. CVS tags = per-file but easy to do multiple file

CVS tags actually are defined on a per-file basis. But it is common, and CVS makes it easy, to apply the same tag to multiple files, to a subtree of the repo, or to the entire repo. E.g. if you just say cvs tag tagname in a particular directory, CVS applies the tag to all files in that directory and its subdirectories. If you say cvs tag tagname at the top of your CVS repository, the tag is applied to your entire repository. Many CVS users have an alias or command that lets them tag the entire repository even from a subdirectory. But you can also specify a single file, cvs tag tagname single-file, or multiple independent files and subdirectories, cvs tag tagname foo/file1-only bar/file2-only bazz-tree.

Of course the CVS tags are not guaranteed to be consistent across the repository the way git tags are.

But it's a usage model question. Sometimes you want guaranteed cross-repository consistent tags. Sometimes you don't. Often one uses naming conventions to differentiate the two, and tools to check that those naming conventions are correctly applied, e.g. to detect that a supposedly whole-repository tag has not been applied to everything.
