Unix stat()/lstat() for Java
Suppose I want to get several of a file's properties (owner, size, permissions, times) as returned by the lstat() system call. One way to do this in Java is to create a java.io.File object and call methods like length(), lastModified(), etc. on it. So far I have two problems with that:
Each one of these calls triggers a stat() call, and for my purposes stat()s are considered expensive: I'm trying to scan billions of files in parallel on hundreds of hosts, and (to a first approximation) the only way to access these files is via NFS, often against filer clusters where stat() under load may take half a second.
The call isn't lstat(); it's typically stat() (which follows symlinks) or fstat64() (which opens the file and may trigger a write operation to record the access time).
Is there a "right" way to do this, such that I end up just doing a single lstat() call and accessing the members of the struct stat? What I have found so far from Googling:
JDK 7 will have the PosixFileAttributes interface in java.nio.file.attribute with everything I want (but I'd rather not be running nightly builds of my JDK if I can avoid it).
I can roll my own interface with JNI or JNA (but I'd rather not if there's an existing one).
A previous similar question got a couple of suggested JNI/JNA implementations. One is gone and the other is questionably maintained (e.g., no downloads, just an hg repository).
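For reference, this is roughly what the JDK 7 route looks like: a minimal sketch, assuming JDK 7 or later on a POSIX platform (readAttributes throws UnsupportedOperationException where the "posix" view is unavailable). A single Files.readAttributes call fetches owner, group, permissions, size and times together instead of one File method call per property, and LinkOption.NOFOLLOW_LINKS requests the symlink's own attributes (lstat semantics) rather than its target's. The class name LstatExample and the output format are illustrative only; how many system calls the JDK issues underneath, and whether resolving owner()/group() to names costs an extra user-database lookup, are implementation details the spec does not promise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

public class LstatExample {
    /**
     * Reads permissions, owner, group, size and mtime in one bulk call.
     * NOFOLLOW_LINKS reports on a symlink itself (like lstat) instead of
     * following it to its target (like stat).
     */
    static String describe(Path p) throws IOException {
        PosixFileAttributes a = Files.readAttributes(
                p, PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        String type = a.isSymbolicLink() ? "symlink" : (a.isDirectory() ? "dir" : "file");
        return String.format("%s %s %s %d %s %s",
                PosixFilePermissions.toString(a.permissions()), // e.g. "rw-r-----"
                a.owner().getName(),
                a.group().getName(),
                a.size(),
                a.lastModifiedTime(),
                type);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LstatExample /some/path /another/path
        for (String arg : args) {
            System.out.println(describe(Paths.get(arg)) + " " + arg);
        }
    }
}
```

Note that the owner and group come for free here, which java.io.File does not expose at all.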
Are there any better options out there?
Looks like you've pretty much covered all the bases. When I started reading your question, my first thought was JDK 7 or JNI. Without knowing anything about the change pattern of these files, you might also look into some sort of persistent cache of this information, like an embedded DB. You could also look at an access method other than NFS, such as a custom web service running on a remote host that returns file information in bulk.
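The "bulk file information" service could be as small as a line-oriented TCP server run on a host with cheap local access to the files: clients send newline-separated paths and get back one tab-separated record per path. This is a minimal sketch under several assumptions: the class name BulkStatService, the record layout and the port argument are all hypothetical; it requires JDK 8+ on a POSIX platform; it has no authentication; paths must not contain newlines; and output is buffered, so results arrive in chunks and are fully flushed only once the client finishes sending.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

public class BulkStatService {
    /** One record: path, permissions, owner, size, mtime in ms (tab-separated). */
    static String statLine(String path) {
        try {
            PosixFileAttributes a = Files.readAttributes(
                    Paths.get(path), PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
            return String.join("\t", path,
                    PosixFilePermissions.toString(a.permissions()),
                    a.owner().getName(),
                    Long.toString(a.size()),
                    Long.toString(a.lastModifiedTime().toMillis()));
        } catch (IOException e) {
            // Report failures inline so one missing path doesn't end the stream.
            return path + "\tERROR\t" + e.getClass().getSimpleName();
        }
    }

    /** Reads paths one per line from in; writes one record per path to out. */
    static void serve(InputStream in, OutputStream out) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        String line;
        while ((line = r.readLine()) != null) {
            if (!line.isEmpty()) {
                w.write(statLine(line));
                w.write('\n');
            }
        }
        w.flush();
    }

    /** Hypothetical usage on the file-serving host: java BulkStatService 9999 */
    public static void main(String[] args) throws IOException {
        int port = Integer.parseInt(args[0]);
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket s = server.accept();
                // One thread per connection keeps the sketch short.
                new Thread(() -> {
                    try (Socket c = s) {
                        serve(c.getInputStream(), c.getOutputStream());
                    } catch (IOException e) {
                        System.err.println("connection failed: " + e);
                    }
                }).start();
            }
        }
    }
}
```

Keeping the protocol logic in serve(InputStream, OutputStream) lets the same code be exercised without a socket, or reused behind an HTTP endpoint instead.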
Yes, stat() underlies all of these calls and libraries; the problem is latency. However, you can have many stat() calls in flight at once using threads (unless someone has an asynchronous stat() up their sleeve!), since the NFS server runs many daemons to service your connections. If you could get onto the server host itself, e.g. with ssh, stat() would be much cheaper. You could even write a TCP service that streams in paths and streams out stat() results. Unfortunately, getting onto the NFS server is often hard or impossible: it may only have admin accounts, or it may be a Hitachi SAN or something similar.
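A minimal sketch of the threaded approach, assuming JDK 8+ on a POSIX platform: a fixed thread pool issues lstat-style attribute reads concurrently, so total time is governed by how many requests are in flight rather than by the per-call latency. The names ParallelStat and statAll are hypothetical, and the pool size of 64 in main is an arbitrary placeholder to tune against what the filer can actually absorb.

```java
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStat {
    /**
     * Reads attributes (without following symlinks) for all paths using
     * `threads` concurrent workers. Threads spend most of their time blocked
     * on the NFS round-trip, so more threads means more requests in flight.
     */
    static Map<Path, PosixFileAttributes> statAll(List<Path> paths, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<Path, Future<PosixFileAttributes>> pending = new LinkedHashMap<>();
            for (Path p : paths) {
                Callable<PosixFileAttributes> task = () -> Files.readAttributes(
                        p, PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                pending.put(p, pool.submit(task));
            }
            // Collect results in submission order; failures are reported and skipped.
            Map<Path, PosixFileAttributes> result = new LinkedHashMap<>();
            for (Map.Entry<Path, Future<PosixFileAttributes>> e : pending.entrySet()) {
                try {
                    result.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    System.err.println(e.getKey() + ": " + ex.getCause());
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Path> paths = new ArrayList<>();
        for (String a : args) {
            paths.add(Paths.get(a));
        }
        statAll(paths, 64).forEach((p, a) -> System.out.println(a.size() + "\t" + p));
    }
}
```

For billions of files you would stream paths through a bounded queue rather than hold every Future in memory, but the concurrency idea is the same.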