开发者

Best way to detect duplicate uploaded files in a Java Environment?

As part of a Java based web app, I'm going to be accepting uploaded .xls &开发者_如何学Goamp; .csv (and possibly other types of) files. Each file will be uniquely renamed with a combination of parameters and a timestamp.

I'd like to be able to identify any duplicate files. By duplicate I mean, the exact same file regardless of the name. Ideally, I'd like to be able to detect the duplicates as quickly as possible after the upload, so that the server could include this info in the response. (If the processing time by file size doesn't cause too much of a lag.)

I've read about running MD5 on the files and storing the result as unique keys, etc... but I've got a suspicion that there's a much better way. (Is there a better way?)

Any advice on how best to approach this is appreciated.

Thanks.

UPDATE: I have nothing at all against using MD5. I've used it a few times in the past with Perl (Digest::MD5). I thought that in the Java world, another (better) solution might have emerged. But, it looks like I was mistaken.

Thank you all for the answers and comments. I'm feeling pretty good about using MD5 now.


While processing uploaded files, decorate the OutputStream with a DigestOutputStream so that you can calculate the digest of the file while writing. Store the final digest somewhere along with the unique identifier of the file (in hex as part of filename maybe?).


You only need to add a method like this to your code and you're done. There's probably no better way. All the work is already done by the Digest API.

public static String calc(InputStream is ) {
        String output;
        int read;
        byte[] buffer = new byte[8192];

        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256"); //"MD5");
            while ((read = is.read(buffer)) > 0) {
                digest.update(buffer, 0, read);
            }
            byte[] hash = digest.digest();
            BigInteger bigInt = new BigInteger(1, hash);
            output = bigInt.toString(16);

        } 
        catch (Exception e) {
            e.printStackTrace( System.err );
            return null;
        }
        return output;
    }
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜