Linux filesystem million symlinks vs million files
I'm working on a Linux filesystem-based caching system for a web application, to be used as a last resort when APC and Memcache are unavailable. The system will cache between 500,000 and 1,000,000 unique string identifiers, each with a large value. I'm taking the MD5 hash of the string ID and, based on its first few characters, creating subfolders so that not too many files end up in any one directory.
I know this concept works because I'm using it in a similar application.
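For reference, the path scheme looks roughly like this in PHP (the cache root and shard depth here are just examples):

<?php
// Rough sketch: shard cache entries into subfolders by the first
// characters of the MD5 so no single directory gets too large.
function cache_path($stringId, $root = '/var/cache/myapp')
{
    $hash = md5($stringId);
    // Two levels of two-character subfolders => at most 256 entries per level.
    return $root . '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . $hash;
}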
Although there are up to 1,000,000 string IDs, they all point to one of only 18,000 unique values; for instance, there might be 100,000 string IDs that all point to the same value. Right now this means there are 100,000 files with different filenames but identical content, which is bad for the underlying filesystem cache.
Is there any disadvantage to caching the 18,000 unique values, then for every unique string ID, creating a symlink to the unique value file? This way the filesystem buffer can cache the 18,000 files and the descriptors for the symlinks.
I'm just concerned about having 1,000,000 symlinks and any potential problems this may introduce.
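To make the idea concrete, this is roughly what I'm considering (a rough PHP sketch; the value files are keyed by an MD5 of the content, and all paths are illustrative):

<?php
// Sketch: store each of the ~18,000 unique values once, then symlink
// every string ID's cache path to the shared value file.
function cache_set($stringId, $value, $root = '/var/cache/myapp')
{
    $valueFile = $root . '/values/' . md5($value);
    if (!file_exists($valueFile)) {
        @mkdir(dirname($valueFile), 0755, true);
        file_put_contents($valueFile, $value);
    }

    $idPath = cache_path($stringId, $root);   // sharded path from the sketch above
    @mkdir(dirname($idPath), 0755, true);     // create shard directories as needed
    if (!is_link($idPath)) {
        symlink($valueFile, $idPath);         // up to 1,000,000 links, ~18,000 targets
    }
}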
Thanks in advance!
Compared to storing plain files, no, there is no disadvantage to storing symlinks. Performance will be slightly slower because of the extra indirection, but dentries and inodes are cached too.
However, I strongly suggest you use hard links instead, because that way the content stays around until the last of the links is deleted.
I agree with sehe, and please also note that hard links will use only 18,000 inodes instead of 10^6; a hard link only uses an additional directory entry that points to the one and only inode. You will save 10^6 * inode size bytes on disk and in your memory cache.
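In your sketch, only the last call would change; something like this (same hypothetical paths as in the question):

<?php
// Hard-link each string ID's path to the shared value file instead of
// symlinking it. Same directory layout, but only ~18,000 inodes are ever
// allocated, and the content survives until the last link is removed.
$idPath = cache_path($stringId);
@mkdir(dirname($idPath), 0755, true);
if (!file_exists($idPath)) {
    link($valueFile, $idPath);   // hard link: link and target must be on the same filesystem
}

Keep in mind that hard links have to live on the same filesystem as the value files they point to.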