
How to scale a document storage system?

I maintain a web application (ASP.NET/IIS7/SQL2K8/Win2K8) that needs to access documents: hundreds of thousands of them, and growing. Currently they all live on a Windows 2K8 Server file share, accessed by UNC path (SMB). The files are in a single flat directory, and I'm trying to plan how best to improve this setup. I don't want to use the SQL Server FILESTREAM attribute, as migrating everything into it would take significant effort and would really lock me in to SQL Server. I also need a way to replicate the data for disaster recovery, so ideally the solution would help with that too.

Options could be:

  • Segment files into multiple directories?
    • Application would add metadata for which directory it's on (or segment by other means)
  • Segment files into separate servers? (virtualize)
    • Backup becomes more complicated.
    • Application would add metadata for which server it's on
  • NAS Storage
  • SAN Storage
  • Put a service (WCF) in front of the files and have the app talk to the service
    • bonus of being reusable across many applications
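
As a rough illustration of the first option (segmenting into multiple directories), one common approach is to derive the subdirectory from a hash of the file name, so the application needs no extra metadata to locate a file. This is only a sketch under assumptions: the share `\\fileserver\docs`, the function name `sharded_path`, and the two-level, two-character layout are all hypothetical choices, not something the current system uses.

```python
import hashlib
from pathlib import PureWindowsPath


def sharded_path(root, file_name, levels=2, width=2):
    """Map a file name to a nested directory derived from its hash.

    The name is lower-cased before hashing because Windows file names
    are case-insensitive, so "A.pdf" and "a.pdf" land in the same place.
    """
    digest = hashlib.sha1(file_name.lower().encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters each.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PureWindowsPath(root, *shards, file_name)


# Hypothetical share name, for illustration only.
print(sharded_path(r"\\fileserver\docs", "invoice-0001.pdf"))
```

With two levels of two hex characters there are 65,536 leaf directories, so a few hundred thousand files would average only a handful per directory. A single level (256 directories) may already be enough at the current volume.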

Assuming I'm going to store on the filesystem and not in the database (I've read those discussions here), which would be the more scalable solution?


You've got a couple of issues:

  • managing a large volume of (static?) files
  • preparing for backups and disaster recovery of said files

I'll throw this out there, even though I'm not a fan of the answer: you might poke around with SharePoint Foundation 2010, which is a free download for Server 2008. If you're having trouble finding the documents you need (by search, by taxonomy via tagging, or by other metadata), or you need document expiration, and you don't want to buy a full-blown document management system, this might be a solution. Of course, it introduces new problems...

If your only desire is to have these files available to spit out on the web, then a file store like the one you're using now really is the simplest solution. For DR/redundancy purposes, I'd look at (a) running them on RAID or a SAN of some sort and (b) auto-syncing them with the cloud (either Azure or Amazon). For (b), you can get apps that make the cloud storage appear as a mapped drive, and then use rsync-type software to keep the cloud copy up to date.
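
To make the "rsync-type" idea concrete, here is a minimal sketch of the change-detection half: build a checksum manifest of the local store and compare it with the manifest of what was last pushed. It is a simplification that compares whole-file SHA-256 digests, not rsync's actual delta algorithm, and the names `build_manifest` and `changed_files` are made up for this example. The upload itself is left out because it depends on the provider.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = file_digest(full)
    return manifest


def changed_files(local, remote):
    """List files that are new or whose content differs from the remote copy."""
    return sorted(rel for rel, digest in local.items()
                  if remote.get(rel) != digest)
```

Hashing every file on every run gets slow with hundreds of thousands of documents, so a real job would probably skip files whose size and modification time are unchanged before falling back to a digest.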

If you want to build something new and cool, you might think about moving the entire file archive into the cloud and just writing a table in a DB to track the file name, old location, and new cloud location, plus some redirector code that hands out access tokens to requesters.
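
A rough sketch of what that lookup table and redirector could look like, assuming an in-memory SQLite table and a home-made HMAC token with an expiry time. Everything here is hypothetical (the `documents` schema, the `SECRET` key, the URL shape). In practice you would more likely use the provider's own mechanism, such as Azure shared access signatures or Amazon S3 pre-signed URLs, instead of rolling your own token.

```python
import hashlib
import hmac
import sqlite3
import time

SECRET = b"replace-me"  # hypothetical signing key, for illustration only


def open_catalog():
    """Create the lookup table: file name, old location, new cloud location."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " file_name TEXT PRIMARY KEY,"
        " old_location TEXT NOT NULL,"
        " cloud_location TEXT NOT NULL)"
    )
    return conn


def sign(cloud_location, expires):
    """HMAC over the location and expiry, so neither can be altered."""
    msg = f"{cloud_location}|{expires}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def redirect_url(conn, file_name, ttl=300, now=None):
    """Look up the cloud location and return a time-limited URL, or None."""
    row = conn.execute(
        "SELECT cloud_location FROM documents WHERE file_name = ?",
        (file_name,),
    ).fetchone()
    if row is None:
        return None
    expires = int(now if now is not None else time.time()) + ttl
    return f"{row[0]}?expires={expires}&token={sign(row[0], expires)}"


def verify(cloud_location, expires, token, now=None):
    """Accept the token only if it matches and has not expired."""
    current = int(now if now is not None else time.time())
    return current <= expires and hmac.compare_digest(
        token, sign(cloud_location, expires))
```

The web app would call `redirect_url` and issue an HTTP redirect to the result. Keeping `old_location` in the table means you can migrate files gradually and fall back to the file share for anything not yet moved.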

3 different approaches... your choice.
