Looking up SVN content by hash
Content in the svn repository is uniquely identified using two pieces of information:
- repository path
- revision number
I am looking for a way to recover that information from a fixed-length message (say, 8 or 16 bytes). It is not enough to identify content in the repository from our fixed-length message by just storing the revision number. The path is variable length, and cannot fit in the message.
However, I was wondering if svn path+revision pairs can be accessed by hash, like how Git does it. Is there a mechanism for this already built into svn?
It would suffice if the path alone were accessible by hash, then I could store the revision number independently in the fixed-length message.
Would I have to keep an external database of used paths and their hashes, or does SVN provide a fast way to list all paths extant across all revisions that I can query on-demand?
Edit: This is practically the same question, but is inconclusive: SVN: translation between path and node ids?
SVN doesn't store files, it stores file systems. As such, the revision is used to access the correct revision of the file system, and then a portion of the path is used to access the file in question.
Internally, SVN revisions are made up of inodes, each with its own node id. However, such "direct to the inode" access is typically not supported, as an inode lacks certain information that is generally necessary (like the file's name, owner, group, permissions, etc.).
Git on the other hand stores files, so it makes sense to find a better file id than the file name (which might stay the same for multiple revisions of the file), so Git uses a hash of the file's contents. Being file oriented, it's not uncommon to pull the file using its id (the hash).
Unfortunately, there is no equivalent for pulling a file system by hash, because the hash's inputs would have to be based on the contents of each individual version of each inode. That would mean hashing a tree's contents, which would be possible. Such a system would provide fast access to a particular historical version of an inode.
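To make the tree-hashing idea concrete, here is a minimal sketch of how a directory tree could be reduced to a single hash, Merkle-style. It is only an illustration: SVN does not do this, and it is not Git's actual object format either. The names `hash_file` and `hash_tree`, and the nested-dict representation of a tree, are assumptions made up for the example.

```python
import hashlib


def hash_file(content: bytes) -> str:
    # Hypothetical scheme: prefix with a type tag so that a file and a
    # directory can never produce the same input to the hash function.
    return hashlib.sha1(b"file\0" + content).hexdigest()


def hash_tree(tree: dict) -> str:
    """Hash a tree given as {name: bytes (file) or dict (subdirectory)}."""
    h = hashlib.sha1(b"tree\0")
    # Sort the entries so the result does not depend on insertion order.
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            child_hash = hash_tree(child)
        else:
            child_hash = hash_file(child)
        h.update(name.encode("utf-8") + b"\0" + child_hash.encode("ascii") + b"\n")
    return h.hexdigest()


snapshot = {"trunk": {"README": b"hello", "src": {"main.c": b"int main(){}"}}}
print(hash_tree(snapshot))
```

Because every directory hash depends on the hashes of its children, changing any file changes the hash of every directory above it, so one hash would identify one exact historical state of a subtree.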
Probably the main reason it wasn't done this way is that fast client access of the inode isn't much of a concern in SVN. The SVN server already has the pointers and data structure to access the inodes on the server side, and it has knowledge of the remote repository's filesystem as transmitted by the client. This allows SVN to transmit the differences in the file systems to the client (not a full copy of the file system). Without a need to consistently pull full file systems, fast path access to a full file system pull isn't a priority.
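Since SVN offers no lookup by hash, the external database mentioned in the question is probably the practical route. Below is a rough sketch of that approach, assuming a 16-byte message made of a 12-byte truncated SHA-256 of the path plus a 4-byte revision number. The `PathIndex` class, the table layout, and the message layout are all hypothetical and not part of SVN; the index would also have to be filled by you, for example from a post-commit hook or by parsing the changed paths reported by `svn log -v`.

```python
import hashlib
import sqlite3
import struct

# Assumed layout: 12-byte path digest + 4-byte unsigned revision = 16 bytes.
MESSAGE = struct.Struct(">12sI")


def path_digest(path: str) -> bytes:
    # Truncated SHA-256 of the repository path (a choice made for this sketch).
    return hashlib.sha256(path.encode("utf-8")).digest()[:12]


class PathIndex:
    """Hypothetical external index mapping path digests back to paths."""

    def __init__(self, db_file: str = ":memory:"):
        self.db = sqlite3.connect(db_file)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS paths "
            "(digest BLOB PRIMARY KEY, path TEXT NOT NULL)"
        )

    def encode(self, path: str, revision: int) -> bytes:
        digest = path_digest(path)
        row = self.db.execute(
            "SELECT path FROM paths WHERE digest = ?", (digest,)
        ).fetchone()
        if row is None:
            self.db.execute("INSERT INTO paths VALUES (?, ?)", (digest, path))
            self.db.commit()
        elif row[0] != path:
            # Two different paths share a truncated digest.
            raise ValueError("digest collision for " + path)
        return MESSAGE.pack(digest, revision)

    def decode(self, message: bytes):
        digest, revision = MESSAGE.unpack(message)
        row = self.db.execute(
            "SELECT path FROM paths WHERE digest = ?", (digest,)
        ).fetchone()
        if row is None:
            raise KeyError("unknown path digest")
        return row[0], revision


index = PathIndex()
message = index.encode("/trunk/src/main.c", 1234)
print(len(message), index.decode(message))
```

With an 8-byte message the digest would have to shrink to 4 bytes, which makes collisions far more likely, so the collision check in `encode` matters more there.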