Encrypted, Compressible, Cross-Platform File System in a File
We wish to make a desktop application that searches a locally packaged text database that will be a few GB in size. We are thinking of using Lucene.
So basically, the user will search for a few words and the local Lucene database will return the results. However, we want to prevent the user from taking a full-text dump of the Lucene index, because the text database is valuable and proprietary. A web application is not the solution here, as the customer wants this desktop application to work in areas where no internet connection is available.
How do we encrypt Lucene's database so that only the client application can access Lucene's index and a prying user can't take a full-text dump of the index?
One way of doing this, we thought, would be to store the Lucene index on an encrypted file system within a file (something like TrueCrypt). The desktop application would then "mount" the file containing the Lucene indexes.
And this needs to be cross-platform (Linux, Windows). We would be using Qt or Java to write the desktop application.
Is there an easier/better way to do this?
[This is for a client. Yes, yes, conceptually this is a bad thing :-) but this is how they want it. Basically, the point is that only the desktop application should be able to access the Lucene index, and no one else. Someone pointed out that this is essentially DRM. Yeah, it resembles DRM.]
How do we encrypt Lucene's database so that only the client application can access Lucene's index and a prying user can't take a full-text dump of the index?
You don't. The user would have both the key and the encrypted data, so they could access everything. You can bury the key in an obfuscated file, but that only adds a slight delay; it certainly will not keep out prying users. You need to rethink.
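To make the point concrete: whatever key decrypts the data has to ship inside the application, so copying those bytes out is all an attacker needs. The sketch below uses standard `javax.crypto` AES-GCM; the class name `EmbeddedKeyDemo` and the hard-coded key are hypothetical illustrations, not anything from the question.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical sketch: the decryption key must be shipped with the application,
// so anyone who extracts these bytes can run the very same decrypt() call.
final class EmbeddedKeyDemo {
    // Hypothetical hard-coded 128-bit AES key; obfuscating it only delays extraction.
    static final byte[] EMBEDDED_KEY = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
    private static final int IV_LEN = 12;     // 96-bit GCM nonce
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    // Returns IV || ciphertext || tag.
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plaintext);
            byte[] out = new byte[IV_LEN + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(ct, 0, out, IV_LEN, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the front of the blob and decrypts the rest.
    static byte[] decrypt(byte[] key, byte[] blob) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new GCMParameterSpec(TAG_BITS, blob, 0, IV_LEN));
            return c.doFinal(blob, IV_LEN, blob.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cipher is sound; the weakness is structural. `decrypt` and `EMBEDDED_KEY` both live in the distributed class files, which is exactly the situation described above.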
The problem here is that you're trying to provide the user with data and deny it to them at the same time. This is basically the DRM problem under a different name: the attacker (the user) is in full control of the application's environment (hardware and OS). No real security is possible in such a situation, only obfuscation and the illusion of security.
While you can make it harder for the user to get to the unencrypted data, you can never prevent it, because that would mean breaking your app: the app itself has to be able to read the data. Probably the closest thing is to provide a sealed hardware box, but IMHO that would make it unusable.
Note that a half-assed illusion of security might be sufficient from a legal standpoint (e.g. the DMCA's anti-circumvention clauses), but that's outside SO's scope.
Technically, there is little you can do. Lucene is written in Java, and Java code can always be decompiled or run in a debugger to recover the key, which you need to store somewhere (probably in the license key you sell to the user).
Your only option is the law (or the contract with the user). The text data is copyrighted, so you can sue the user if they use it in any way that is outside the scope of the license agreement.
Or you can write your own text indexing system.
Or buy a commercial one which meets your needs.
[EDIT] If you want to use an encrypted index, just implement your own FSDirectory. Check the source of SimpleFSDirectory for an example.
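One detail that matters for such a Directory: Lucene reads index files with random access (`IndexInput.seek`), so the cipher has to let you decrypt from an arbitrary offset. AES in CTR mode allows that. The sketch below is not Lucene code; `CtrRandomAccess` and `transformAt` are hypothetical names for the primitive a custom encrypted `IndexInput` could call. It assumes the cipher provider increments the whole 16-byte IV as a big-endian counter (which is how the default JDK provider behaves), and CTR by itself gives confidentiality only, not tamper detection.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical sketch (not Lucene API): decrypt any byte range of a file
// encrypted with AES/CTR without processing the bytes that precede it.
final class CtrRandomAccess {
    private static final int BLOCK = 16;                                // AES block size in bytes
    private static final BigInteger MOD = BigInteger.ONE.shiftLeft(128);

    // Counter block for a given block index: iv + blockIndex as a 128-bit
    // big-endian integer, wrapping modulo 2^128.
    static byte[] counterAt(byte[] iv, long blockIndex) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex)).mod(MOD).toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        // Right-align the value and drop BigInteger's possible leading sign byte.
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }

    // Encrypts or decrypts (CTR is its own inverse) `data`, which sits at
    // absolute byte offset `pos` in the encrypted file.
    static byte[] transformAt(byte[] key, byte[] iv, long pos, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new IvParameterSpec(counterAt(iv, pos / BLOCK)));
            int skip = (int) (pos % BLOCK);
            // Pad the front so the keystream lines up with `pos` inside its block.
            byte[] buf = new byte[skip + data.length];
            System.arraycopy(data, 0, buf, skip, data.length);
            byte[] out = c.doFinal(buf);
            return Arrays.copyOfRange(out, skip, out.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A seek then costs one cipher initialisation instead of decrypting everything before the target offset. Of course, the key question from the other answers still applies: the key has to be somewhere on the user's machine.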
Why not build an index that contains only the data the user is allowed to access, and ship that index with the desktop app?
TrueCrypt sounds like a solid plan to me. You can mount volumes, encrypt them in all sorts of crazy overkill ways, and access them just like any other file.
No, it isn't entirely secure, but it should work well enough.
One-way hash function.
You don't store the plaintext; you store hashes. When you want to search for a term, you push the term through the function and then search for the hash. If there's a match in the database, return a thumbs up.
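A minimal sketch of that idea, assuming SHA-256 as the one-way function and lower-casing terms before hashing (both are my choices, not the answer's). `HashedTermIndex` is a hypothetical name. It answers only "is this term present?"; it cannot give back the original text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: store only SHA-256 digests of terms, never the terms themselves.
final class HashedTermIndex {
    private final Set<String> digests = new HashSet<>();

    // Assumption: terms are lower-cased first so lookups are case-insensitive.
    static String hash(String term) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(term.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(d);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // every Java platform must provide SHA-256
        }
    }

    void add(String term) { digests.add(hash(term)); }

    // "Thumbs up" if the hashed query term was indexed.
    boolean contains(String term) { return digests.contains(hash(term)); }
}
```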
Are you willing to entertain false positives in order to save space? Bloom filter.
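A Bloom filter keeps only a bit array: adding a term sets k bit positions derived from its hashes, and a lookup checks whether all k are set. It can return false positives but never false negatives. The sketch below is one possible implementation; the hash scheme (64-bit FNV-1a plus a SplitMix64 mix, combined by double hashing) and the sizes in the usage note are my assumptions, and `BloomFilter` here is a hypothetical class, not a library API.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch of a Bloom filter over search terms.
final class BloomFilter {
    private final BitSet bits;
    private final int m;  // number of bits
    private final int k;  // number of bit positions per term

    BloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // 64-bit FNV-1a over the term's UTF-8 bytes.
    private static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;      // FNV offset basis
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;           // FNV prime
        }
        return h;
    }

    // SplitMix64 finalizer, used to derive a second hash from the first.
    private static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Double hashing: position i is (h1 + i * h2) mod m.
    private int position(long h1, long h2, int i) {
        return (int) Math.floorMod(h1 + i * h2, (long) m);
    }

    void add(String term) {
        long h1 = fnv1a64(term.getBytes(StandardCharsets.UTF_8));
        long h2 = mix64(h1) | 1L;          // odd step so positions don't collapse
        for (int i = 0; i < k; i++) bits.set(position(h1, h2, i));
    }

    // false means definitely absent; true means probably present.
    boolean mightContain(String term) {
        long h1 = fnv1a64(term.getBytes(StandardCharsets.UTF_8));
        long h2 = mix64(h1) | 1L;
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(h1, h2, i))) return false;
        }
        return true;
    }
}
```

For example, `new BloomFilter(1 << 16, 5)` uses 8 KB of bits. Larger `m` relative to the number of terms lowers the false-positive rate, which is the space-versus-accuracy trade-off the answer mentions.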