开发者

To use a Blob or not to (mysql + coldfusion)

I would like to know if storing pdfs in a database table is a good long term idea. Here is a description of the problem:

I have a customer that has hundreds of clients that upload numerous pdf files as proofs. These pdf files range from fairly small ( < 100K ) to 10MB. These files can potentially get uploaded multiple times as they are proofs for a single project (i.e. proof1.pdf, proof2.pdf etc..) PDFs for each customer must remain separate, and PDFs for each project must remain separate for each customer.

Currently it is set up where the files are uploaded directly to a folder created for each client for each project. This is OK but 开发者_运维技巧does take up space and finding files can be a bit of a nightmare. Like I said multiple proofs will be uploaded for each project and each customer.

The best solution I can think of is to provide an interface that will upload PDF files directly into a db table which keeps track of the customer id, project id, and proof. This provides much better security, and provides the ability to get all PDF files from each customer for project X.

A database cleanup tool will be developed to delete records that are older than a specified period of time, so the table will not continue to grow forever, but I am worried about the performance hit (if there is one) and other negatives that I might be overlooking.

So, overall is this a good idea or should I figure out a better way to handle this in the filesystem?


I would recommend storing lightweight keys that point to data in a filesystem, in lieu of storing the actual files' data in a BLOB field. One possible arrangement would be to hash your files (with, say, SHA-1) and use that hash as the filename on disk - possibly even arranging the storage into a directory tree that maps over the first n hash characters (i.e., 80cdef... might be stored in storage/8/0/c/d/80cdef...).

Your table then might consist of a primary key, a human-friendly display name for the file, and a field containing the (hash) name of the physical file on disk.

This also lends you the flexibility to physically separate the file storage from the database storage into, say, a distributed filesystem; this would be a rather reasonable separation to make in a long-term system that will inevitably grow very large in size. In this way, you retain the benefits of a relatively small database (potentially better performance and less backup pain) while offloading the more difficult problem of massive storage to a system that exists outside of the database itself, and for which there are already a plethora of proven approaches.


I tend to shy away from storing files in databases. I've worked with Blackboard installations on a campus and you can upload files in that application. As a result, the database grew to a size that was unmanageable, over 1TB. Blackboard's backup system packaged up each course as a zip file and to do a complete backup of a course, all of the files had to be pulled and compressed... this became a lengthy process. We had to split (and re-split) backups regularly.

Here is another post that comments on this: Stackoverflow post

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜