.NET document management system design - performance questions
I need to develop a basic .NET document management system with the following specifications:
The data should be portable and self contained, so I will serialize the documents (typical formats include Word, PDF, Excel and Powerpoint) into binary data. I will then store said binary data in a SQL Server 2005 database. When a user needs to download a document, the system will deserialize the binary data and will present it in is original format.
The average row size cannot be bigger than 200k.
We expect a maximum of 500 documents will be uploaded monthly for a period of three years.
We don't expect the size of the database to ever go over 6 GB
We have maximum target of 20,000 people that potentially would access the system at the same time.
My question is: How robust does the technology need to be in order to offer开发者_如何学Go solid performance, prevent site downtime, etc?
I am a novice developer and am not familiar with this kind of architecture and design.
This is more than just a "basic" system. So here would be my concerns right off the bat:
- 500 documents a month for 3 years seems like it could well exceed 6 GB for database size. You may want to determine the maximum document size and see if that calculation holds up.
- 20,000 users is a lot. How many can you expect at once? If it is more than 100 concurrent users I would start to investigate server clustering/web farms to be able to handle the load
- just a nit pick, but you won't be "serializing" in the .NET "Serializable" sense. You'll just be storing the raw document bytes in the DB
- if you need high availability you will need to look at DB replication to another DB instance just in case your DB Server goes down
Lastly. I have to believe that there are off-the-shelf systems that do what you want, and also include more advanced functionality like permission-based access and document revision.
Mike
What's the reason for needing to store the files in the database, instead of just storing the path of the document on a file server or CDN? Would be a lot less load on your DB server, and give you more flexible options for document storage.
If you're having issues with moved/deleted files in a system like the one I suggested, then perhaps also consider other options, such as:
- Locking permissions on the underlying file system to everyone except the role the application is running under (easiest option)
- Running a background service which listens for changes to folders etc and updates the database accordingly
In the end, a database-only solution may be simpler, but I wouldn't underestimate the load you may hit upon by storing large files for tens of thousands of users.
An important part of programming is knowing when you're in over your head. If the CTQ's you've posted are real, specifically the concurrent access requirement, then you're in for a world of hurt. Even those of us with quite a lot of time in the trenches are going to be in a world of hurt with that sort of requirement. I'd tackle the problem with the following mindset:
I'm going to get this wrong in more ways that I can currently imagine.
Knowing this much, the simpler you keep this architecture, the more likely it is to scale. However, the company I work for is absolutely massive and I doubt even we have any systems that truely have 20,000 concurrent users. So don't bite off more than you can chew.
Design your architecture to be simple and robust (a tall order) and you'll find it will scale naturally until you eventually need to call in the big guns.
I can suggest that you should at least spend money on access to SQL Server 2008. With that version your problem should be fairly elementary for starters. Use the FILESTREAM
storage for the files. No serialization necessary. This will store the files on an NTFS file system and will maximize your ease of programming, maintenance, and scalability.
If you for some reason only have SQL Server 2005, you'll have to deal with BLOB
s which isn't exactly difficult, but is somewhat messy. I suggest you read To BLOB or Not to BLOB from Microsoft Research to make the decision if storing the data in SQL Server 2005 is the best bet for you. If so, there are plenty of articles detailing how to put files into SQL Server BLOB
s. Just be aware this is rarely the most efficient or scalable solution.
精彩评论