LRU file cache and the cost of finding a file in a Windows directory
I have an application that will download and cache, at a minimum, 250,000 8KB* files totaling about 2GB. I need to remove the least recently used file when updating this cache. *These tiny files span two 4KB sectors.
What is the relative cost of obtaining a file handle by name for this type of file in a directory on an NTFS-formatted 5400 RPM drive? If I store all 250K files in one directory, will merely getting a file handle take more than a few milliseconds? I can easily bucket the files into different directories.
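For illustration, the bucketing mentioned above could look something like the following minimal sketch. The function name `bucket_path`, the choice of SHA-1, and the two-level layout are assumptions made for this example, not something the application is known to use. With two levels of two hex characters each there are 65,536 buckets, so 250,000 files average about four per directory.

```python
import hashlib
from pathlib import Path


def bucket_path(cache_root: Path, name: str, levels: int = 2) -> Path:
    """Map a cache entry name to a nested bucket directory (hypothetical layout)."""
    # Hash the name so entries spread evenly across buckets.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Each level uses two hex characters, giving 256 directories per level.
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return cache_root.joinpath(*parts, name)


if __name__ == "__main__":
    # The same name always maps to the same bucket.
    print(bucket_path(Path("cache"), "tile_000123.bin"))
```

Whether this actually speeds up opening a file by name would need to be measured on the target drive.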
Windows 7 disables last-access-time updates for files by default, and I don't want to require an administrator to enable this feature. Should I maintain a separate list of file access times in memory (serialized to disk when the app exits)?
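A separate in-memory list of this kind could be sketched roughly as below. The class name `LruIndex`, the JSON file format, and the entry-count limit are all assumptions chosen for the example. It tracks access order rather than timestamps, which is enough to pick the least recently used file.

```python
import json
from collections import OrderedDict
from pathlib import Path


class LruIndex:
    """Tracks access order of cache entries in memory (hypothetical helper)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._order = OrderedDict()  # least recently used entry comes first

    def touch(self, name):
        """Record an access and return the names that should now be evicted."""
        self._order[name] = None
        self._order.move_to_end(name)
        evicted = []
        while len(self._order) > self.max_entries:
            oldest, _ = self._order.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def save(self, path):
        """Serialize the access order, oldest first, e.g. when the app exits."""
        Path(path).write_text(json.dumps(list(self._order)), encoding="utf-8")

    @classmethod
    def load(cls, path, max_entries):
        """Rebuild the index from a saved file; start empty if none exists."""
        index = cls(max_entries)
        p = Path(path)
        if p.exists():
            for name in json.loads(p.read_text(encoding="utf-8")):
                index._order[name] = None
        return index
```

The caller would delete the files named in the list returned by `touch`. One caveat of saving only on exit is that a crash loses the order, in which case the index would have to be rebuilt from the directory contents.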
Should I consider storing these files in one large flat file? Memory mapping might be difficult if I use anything older than .NET 4.0.
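To make the flat-file idea concrete, here is a minimal sketch using plain seeks instead of memory mapping. It assumes every entry fits in a fixed 8KB slot and that a slot number is assigned elsewhere; `SLOT_SIZE`, `write_slot`, and `read_slot` are names invented for this example. Shorter payloads are zero-padded, so a real implementation would also need to record each payload's true length.

```python
SLOT_SIZE = 8 * 1024  # assumed fixed size of one cached entry, in bytes


def write_slot(f, slot, data):
    """Write one entry into its fixed-size slot of an open binary file."""
    if len(data) > SLOT_SIZE:
        raise ValueError("entry does not fit in one slot")
    f.seek(slot * SLOT_SIZE)
    # Pad with zero bytes so every slot occupies exactly SLOT_SIZE bytes.
    f.write(data.ljust(SLOT_SIZE, b"\0"))


def read_slot(f, slot):
    """Read the full slot back, including any zero padding."""
    f.seek(slot * SLOT_SIZE)
    return f.read(SLOT_SIZE)
```

With this layout, replacing the least recently used entry means overwriting its slot in place, so only one file handle is ever opened.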
Opening 250,000 files, if that's what you mean, will take more than a few milliseconds, yes. The size of the directory is less interesting than the fact that you're going through the entire file system stack 250,000 times (NTFS, the kernel, and your grandmother's favorite anti-virus filter all have to get a chance to play).
And last access time isn't rock-solid in any case.
One seek is approximately 15ms on an average 5400rpm drive. The rest is minuscule in comparison.
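As a rough worked estimate using the 15ms figure, the snippet below assumes the worst case where nothing is cached and every open costs exactly one seek. That one-seek-per-open assumption and the function name `cold_open_seconds` are illustrative, not measured.

```python
SEEK_MS = 15  # approximate seek time quoted above for a 5400 RPM drive


def cold_open_seconds(file_count, seeks_per_open=1, seek_ms=SEEK_MS):
    """Estimate total seek time in seconds for opening file_count files cold."""
    return file_count * seeks_per_open * seek_ms / 1000


if __name__ == "__main__":
    total = cold_open_seconds(250_000)
    print(total, "seconds, or", total / 60, "minutes")
```

Under those assumptions, touching all 250,000 files costs about 3,750 seconds of seeking, a little over an hour, while opening a single file stays in the tens of milliseconds.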