Secure file server
Introduction
I want to create a Java web application for storing and backing up user files, similar to Dropbox. One interesting Dropbox feature is that it can detect whether a given file already exists on the server. For example, if one user uploads a file to the server, another user who tries to upload the same file does not need to send the file content again. The server only needs to record that he has the same file. This saves bandwidth and storage space, and it makes uploads faster.
The most basic solution to this problem is to use a file hash string, e.g. sha1, md5, etc., to identify the file. The client software checks whether a certain hash exists on the server. If it does, the client can skip the upload and the server marks that the user has the same file.
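As a rough illustration of the hash check described above, here is a minimal sketch of how a client might compute the fingerprint it sends to the server. The class and method names (`FileFingerprint`, `sha256Hex`) are made up for this example, and SHA-256 is swapped in for the sha1/md5 mentioned above; any digest supported by `MessageDigest` could be used instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes a hex fingerprint for a stream of file content.
class FileFingerprint {

    // Reads the stream in chunks so large files do not have to fit in memory.
    static String sha256Hex(InputStream content) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = content.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return toHex(digest.digest());
    }

    // Converts raw digest bytes to a lowercase hex string.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

A client would typically pass `Files.newInputStream(path)` to `sha256Hex` and send the resulting string to the server's lookup endpoint.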
Problem
The web application follows a REST architecture so that users can easily write their own client software to upload their files. For security reasons, SSL is enabled for all transactions. My main security concern is that, if I use sha1 or any other standard hash algorithm, a user could falsely claim to have a file without actually owning it. SSL or encryption cannot prevent this. If a user manages to obtain the hash string (the md5 and sha1 hashes of many files can be found by googling), he can use the REST service of the web application to mark the file as his own.
So one possible solution is for the server to request a set of random bytes from the file in addition to the hash of the whole file. Here are the example steps:
- The client checks whether a certain hash exists on the server. If the file already exists, the server returns the positions of the random bytes it requires.
- The client sends the bytes at the requested positions. Client software cannot respond correctly without having the actual file.
This way, the scheme saves bandwidth and also ensures that the user owns the file they want to upload.
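The two steps above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the file is held in memory as a `byte[]`, and the names `ByteChallenge`, `pickOffsets`, `answer` and `verify` are invented for the example rather than taken from any existing API.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;

// Hypothetical sketch of the "random bytes" challenge.
class ByteChallenge {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Server side: choose `count` unpredictable positions inside the stored file.
    static int[] pickOffsets(int fileLength, int count) {
        if (fileLength <= 0 || count <= 0) {
            throw new IllegalArgumentException("fileLength and count must be positive");
        }
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = RANDOM.nextInt(fileLength);
        }
        return offsets;
    }

    // Client side: read the bytes at the requested positions from the local copy.
    static byte[] answer(byte[] file, int[] offsets) {
        byte[] response = new byte[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            response[i] = file[offsets[i]];
        }
        return response;
    }

    // Server side: compare the client's response with the bytes of the stored file.
    static boolean verify(byte[] storedFile, int[] offsets, byte[] response) {
        return MessageDigest.isEqual(answer(storedFile, offsets), response);
    }
}
```

The server should generate fresh offsets for every request, so that a response observed once cannot be replayed later.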
Question
I am no expert in web security, so I have no idea whether this is a good idea. I have read articles saying that implementing your own fancy process can weaken security, because the scheme cannot be properly tested and the extra information it exposes may give attackers a way in.
Does anyone have any comment on the process?
Will it reduce the security?
Does anyone have an idea to solve this problem differently?
I understand that there might not be an exact answer to this question, but I would like to hear whether anyone has encountered the same problem and has a good solution to it.
Rather than asking the client to upload some random bytes of the file's contents, it may be better to ask the client to upload the hash of a random region of the file. That way you can use a wider range of sizes that you ask the client to verify.
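A minimal sketch of that region-hash variant might look like the following. It assumes the file fits in a `byte[]` and uses SHA-256 as the region hash; the `RegionChallenge` and `Region` names are hypothetical and exist only for this example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical sketch: the server asks for the hash of a random region of the file.
class RegionChallenge {
    private static final SecureRandom RANDOM = new SecureRandom();

    // A contiguous byte range [offset, offset + length) inside the file.
    static final class Region {
        final int offset;
        final int length;

        Region(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Server side: pick a random region of at most maxLength bytes.
    static Region pickRegion(int fileLength, int maxLength) {
        if (fileLength <= 0 || maxLength <= 0) {
            throw new IllegalArgumentException("fileLength and maxLength must be positive");
        }
        int length = 1 + RANDOM.nextInt(Math.min(maxLength, fileLength));
        int offset = RANDOM.nextInt(fileLength - length + 1);
        return new Region(offset, length);
    }

    // Both sides: hash only the requested region; the server compares the results.
    static byte[] hashRegion(byte[] file, Region region) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(file, region.offset, region.length);
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The server would compute `hashRegion` on its stored copy and compare it with the client's value, ideally with a constant-time comparison such as `MessageDigest.isEqual`.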
Better yet, though, may be to send the client a random number and require the client to compute an HMAC of the entire file's contents using that number as the key. This is more computationally expensive, since the server must compute the HMAC too, but it verifies that the client has the entire file, not just a small portion of it.
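Here is a rough sketch of the HMAC challenge using the standard `javax.crypto.Mac` API. The choice of HmacSHA256, the 32-byte nonce size, and the `HmacChallenge` class with its method names are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: prove possession of a whole file with a keyed hash.
class HmacChallenge {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Server side: a fresh random key for each upload attempt.
    static byte[] newNonce() {
        byte[] nonce = new byte[32];
        RANDOM.nextBytes(nonce);
        return nonce;
    }

    // Both sides: HMAC over the entire file content, keyed with the nonce.
    static byte[] hmacOf(InputStream content, byte[] nonce)
            throws IOException, GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(nonce, "HmacSHA256"));
        byte[] buffer = new byte[8192];
        int read;
        while ((read = content.read(buffer)) != -1) {
            mac.update(buffer, 0, read);
        }
        return mac.doFinal();
    }

    // Server side: constant-time comparison of its own HMAC with the client's.
    static boolean verify(byte[] expected, byte[] clientResponse) {
        return MessageDigest.isEqual(expected, clientResponse);
    }
}
```

Because the key changes on every request, a client cannot precompute or look up the answer; it has to read the whole file each time.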
One unavoidable side effect of this hash feature, even with a verification scheme, is that it reveals that a copy of the file already exists somewhere on the server. That by itself may be sensitive information.
For the most stringent privacy protection, you should forego this feature and make each user upload their own copy of the file. You can use hash comparison on the server to avoid storing multiple copies of the file, transparently to the clients.
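To make the server-side variant concrete, here is a small in-memory sketch of a content-addressed store. Every client uploads its full file, and the server keeps only one copy per distinct hash. `DedupStore` and its methods are hypothetical; a real server would keep blobs on disk or in object storage and the per-user index in a database.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: deduplicate on the server, invisibly to clients.
class DedupStore {
    // One stored blob per distinct content hash.
    private final Map<String, byte[]> blobsByHash = new HashMap<>();
    // user -> (file name -> content hash)
    private final Map<String, Map<String, String>> filesByUser = new HashMap<>();

    // The client always sends the full content; the server decides whether to keep it.
    void upload(String user, String fileName, byte[] content) {
        String hash = sha256Hex(content);
        blobsByHash.putIfAbsent(hash, content.clone());
        filesByUser.computeIfAbsent(user, u -> new HashMap<>()).put(fileName, hash);
    }

    // Returns a copy of the user's file, or null if the user has no such file.
    byte[] download(String user, String fileName) {
        Map<String, String> files = filesByUser.get(user);
        String hash = (files == null) ? null : files.get(fileName);
        return (hash == null) ? null : blobsByHash.get(hash).clone();
    }

    int storedBlobCount() {
        return blobsByHash.size();
    }

    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This saves storage but not bandwidth, which is the trade-off for not revealing to any client whether a file already exists on the server.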