Indexing uploaded documents - searchable only by the users that uploaded them
If someone could point me in the right direction, that would be most helpful.
I have written a custom CMS in which I want to allow each user to upload documents (.doc, .docx, .pdf, .rtf, .txt, etc.) and then search the contents of those files for keywords.
The CMS is written entirely in PHP with a MySQL database, running on Linux.
Once uploaded, the documents would be stored "as is" in the user's private folder on the server. There will be hundreds, if not thousands, of documents per user.
It is very important that each user's files are searchable only by that user.
Could anyone point me in the right direction? I have had a look at Solr, but solutions of that kind seem very complicated. I have spent an entire week looking at different options, and this is my last attempt at finding a solution.
Thank you in advance.
I see two choices:
1. A search index per user. Each user's documents are indexed separately from everyone else's, and a search only hits that user's own index. There is no danger of seeing other users' results, or of getting scores influenced by the contents of other users' documents. The downside is having to store and update each index separately. I would look into Lucene for something like this, since the indices will be small.
2. A single search index. All users share one index, and search results have to be filtered so that only the searching user's documents are returned. The upside is that there is only one index to implement and maintain (Solr would be great for this). The downside is the risk of cross-talk between users' searches: scoring would be affected by other users' documents, resulting in poorer search results.
I hate to say it, but from a quality standpoint, I'd lean towards number 1. Number 2 seems more efficient and easier, but user results are more important to me.
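To make the trade-off concrete, here is a toy in-memory sketch (not Lucene or Solr; every name in it is hypothetical) that builds both options from the same four documents. It assumes a simple smoothed IDF weighting purely for illustration. Both options return only the searching user's document, but in the shared index the score also depends on how many of *other* users' documents contain the term, which is the scoring cross-talk described in option 2.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy in-memory index mapping term -> {doc_id: term count}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.owner = {}  # doc_id -> user_id (None in a per-user index)

    def add(self, doc_id, text, owner=None):
        self.owner[doc_id] = owner
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def idf(self, term):
        # Smoothed inverse document frequency, computed over EVERY document
        # in this index, including other users' documents in a shared index
        n_docs = len(self.owner)
        df = len(self.postings.get(term, {}))
        return math.log((n_docs + 1) / (df + 1)) + 1

    def search(self, query, owner=None):
        scores = defaultdict(float)
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                # Shared-index option: skip documents the searcher does not own
                if owner is not None and self.owner[doc_id] != owner:
                    continue
                scores[doc_id] += tf * self.idf(term)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = [
    ("alice", "a1", "quarterly budget report"),
    ("alice", "a2", "holiday photos"),
    ("bob", "b1", "budget budget budget"),
    ("bob", "b2", "budget meeting notes"),
]

# Option 1: one index per user
per_user = defaultdict(InvertedIndex)
for user, doc_id, text in docs:
    per_user[user].add(doc_id, text)

# Option 2: one shared index, filtered by owner at query time
shared = InvertedIndex()
for user, doc_id, text in docs:
    shared.add(doc_id, text, owner=user)

# Both return only alice's a1, but the shared index scores it lower
# because bob's documents also contain "budget"
print(per_user["alice"].search("budget"))
print(shared.search("budget", owner="alice"))
```

Real engines weight terms differently, but the structural point carries over: in option 2 the owner filter keeps other users' documents out of the results, yet their term statistics still feed into the scores.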
Keep the files outside the public directory tree, and store each file's path and its creator's user ID in a database table; users can then search for their files with database queries. You will, of course, have to let users create accounts and log in. You can then let them download the files through PHP.
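A minimal sketch of that table, assuming the text of each document has already been extracted at upload time and stored alongside the path (the extraction step is not shown). It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs on its own; the table, column, and path names are hypothetical. In MySQL you would more likely put a FULLTEXT index on the text column and query it with `MATCH() ... AGAINST()` instead of `LIKE`.

```python
import sqlite3

# sqlite3 stands in for MySQL so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE documents (
           id        INTEGER PRIMARY KEY,
           user_id   INTEGER NOT NULL,
           filepath  TEXT NOT NULL,  -- stored outside the public web root
           body_text TEXT NOT NULL   -- text extracted at upload time (assumed)
       )"""
)


def add_document(user_id, filepath, body_text):
    conn.execute(
        "INSERT INTO documents (user_id, filepath, body_text) VALUES (?, ?, ?)",
        (user_id, filepath, body_text),
    )


def escape_like(keyword):
    # Treat % and _ typed by the user as literal characters, using ! as the escape
    return keyword.replace("!", "!!").replace("%", "!%").replace("_", "!_")


def search_documents(user_id, keyword):
    # The user_id condition is part of every query, so a search can never
    # return rows that belong to another user.
    rows = conn.execute(
        "SELECT filepath FROM documents "
        "WHERE user_id = ? AND body_text LIKE ? ESCAPE '!' "
        "ORDER BY filepath",
        (user_id, "%" + escape_like(keyword) + "%"),
    )
    return [row[0] for row in rows]


add_document(1, "/srv/cms-data/users/1/budget.txt", "Quarterly budget report")
add_document(2, "/srv/cms-data/users/2/budget.txt", "Budget meeting notes")
print(search_documents(1, "budget"))  # only user 1's file is listed
```

The `user_id` passed to `search_documents` should come from the logged-in session, never from a request parameter, otherwise the per-user restriction can be bypassed.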
As long as each user's files are located in an isolated directory, or there is some other way to identify one user's documents (such as adding the user ID to the filename), you could use grep.
The disadvantages:
- Each search would have to go through all the documents, so if you have a lot of documents or very large documents it would be slow.
- Binary document formats, like Word or PDF, might not produce accurate results.
- This is not an enterprise solution.
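The grep approach amounts to a case-insensitive scan of every file in one user's directory, roughly what `grep -ril keyword <user_dir>` does. Here is a Python sketch of that scan, assuming a hypothetical layout with one directory per user; it also shows where the disadvantages listed above come from.

```python
import os
import tempfile


def grep_user_files(user_dir, keyword):
    """Return files under user_dir whose contents contain keyword, ignoring case.

    Like grep, this reads every file on every search, which is why it slows
    down as the number and size of documents grows.
    """
    needle = keyword.lower()
    matches = []
    for root, _dirs, files in os.walk(user_dir):
        for name in files:
            path = os.path.join(root, name)
            # errors="ignore" keeps binary files (.doc, .pdf) from raising,
            # but their compressed or encoded text will often not match.
            with open(path, encoding="utf-8", errors="ignore") as f:
                if needle in f.read().lower():
                    matches.append(path)
    return sorted(matches)


# Demo with a hypothetical layout: one directory per user under a base folder
base = tempfile.mkdtemp()
for user, text in [("alice", "Budget for Q3"), ("bob", "budget meeting")]:
    os.makedirs(os.path.join(base, user))
    with open(os.path.join(base, user, "notes.txt"), "w", encoding="utf-8") as f:
        f.write(text)

# Searching alice's directory can only ever return alice's files
alice_hits = grep_user_files(os.path.join(base, "alice"), "budget")
print(alice_hits)
```

Isolation here depends entirely on choosing the directory from the logged-in user rather than from anything the user can type.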
Revised answer: Try mnoGoSearch