MOSS 2007 as a large repository of PDF documents
I am examining the feasibility of building a repository of PDF documents on MOSS 2007. No workflows, just a huge number of documents and access to the document libraries (which must also be searchable).
The question is whether such a solution is feasible, assuming that: - there could be up to one million (!) PDF documents, loaded into document libraries and exposed externally over the web;
The proposed farm is: - 1x Front Web Server - 2x Index Server - 1x Query Server - 1x MS SQL Server - 2x 12TB Storage
Is it possible to get reasonable performance with such a huge number of files? Has anyone had to build a similar kind of digital library solution?
You will run into performance issues if you put more than 2000 items in a single list. One strategy to get around this problem is to use folders as buckets with a limit of 2000 items in each one.
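As a rough illustration of the bucket idea, here is a minimal Python sketch that maps a running document number to a folder so that no folder exceeds 2000 items. The `bucket-NNNN` naming scheme and both function names are hypothetical and only serve the example; nothing here calls a SharePoint API.

```python
ITEMS_PER_FOLDER = 2000  # per-folder limit quoted in the answer above


def bucket_folder(doc_index: int, items_per_folder: int = ITEMS_PER_FOLDER) -> str:
    """Return a (hypothetical) folder name for a zero-based document index."""
    if doc_index < 0:
        raise ValueError("doc_index must be non-negative")
    # Integer division groups consecutive documents into fixed-size buckets.
    return f"bucket-{doc_index // items_per_folder:04d}"


def folders_needed(total_docs: int, items_per_folder: int = ITEMS_PER_FOLDER) -> int:
    """Return how many bucket folders are required for total_docs documents."""
    # Ceiling division without floating point.
    return (total_docs + items_per_folder - 1) // items_per_folder


if __name__ == "__main__":
    print(folders_needed(1_000_000))  # folders required for one million PDFs
    print(bucket_folder(2000))        # first document of the second bucket
```

With one million documents this works out to 500 folders, which could themselves be spread over several libraries or site collections.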
It would also be wise to consider separating into several Site Collections so that all of these documents are not in a single SQL database.
Updating and consolidating:
As Benjamin J Athawes points out, content sizing is also an important factor to consider. See his answer for details.
nRouteNPingMe suggests considering SharePoint 2010, since this has been addressed in the newer version. If you're not tied to 2007, I would consider taking this route.
Chris's answer is not exactly correct. You can have a lot more than 2000 items in a list, as long as they are not all displayed in a single view.
In a document library (where you would store your PDF documents) you can have up to 5 million items, as long as you find a folder structure / views that work with the < 2000 items / view constraint.
So the question is, can you separate your documents in a way that makes sense to you? If so, I wouldn't worry about scalability.
The numbers I mention here all come from this technet article.
The TL;DR version : http://www.sharepointkings.com/2009/01/limitation-and-upper-boundaries-of_28.html
Something that I haven't seen mentioned so far is file size.
Assuming that each PDF is on average 1MB in size, you will run into content database sizing limitations well before the aforementioned limitations around # items / scope.
Capacity planning is all about compromise - if you want to store 1 million documents you will need to think about splitting the files across multiple content databases - and therefore multiple site collections.
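To make the arithmetic concrete, here is a small Python sketch of the sizing estimate. The 1MB average comes from the assumption above; the 100GB per-database cap is an assumed planning figure, not one stated in this thread, so substitute whatever limit applies to your farm. The function name is hypothetical.

```python
import math


def content_databases_needed(doc_count: int, avg_doc_mb: float, max_db_gb: float):
    """Estimate total content size in GB and the number of content databases.

    max_db_gb is an assumed per-database planning cap supplied by the caller.
    """
    total_gb = doc_count * avg_doc_mb / 1024  # MB -> GB
    databases = math.ceil(total_gb / max_db_gb)
    return total_gb, databases


if __name__ == "__main__":
    # 1 million PDFs at 1MB each, with an ASSUMED 100GB cap per content database.
    total_gb, databases = content_databases_needed(1_000_000, 1.0, 100)
    print(f"{total_gb:.1f} GB total, {databases} content databases")
```

Under those assumptions the raw files alone come to just under 1TB, spread over about ten content databases. Versions, index overhead and growth would push the figure higher.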
Whilst in some fringe cases Microsoft support up to 1TB of content per database in SharePoint 2010 (for static repositories), I am not aware of a similar support scenario for SharePoint 2007.
As regards FileStream (I assume you are referring to RBS here), I would not recommend it in a production scenario without very careful consideration. I would view it primarily as a cost saver and bear in mind that it can add significant complexity to your backup and DR strategy.
Hope that helps.
There are a couple of things going on here, and no one can answer all your questions with the facts that you have given us.
First up, the amount of documents you propose can be handled by a single document library (or several document libraries) so long as you follow the advice above about storing items in folders. That is critical.
What we can't tell you is whether you have enough hardware. Sure, it is pretty easy to know if you have enough storage, but getting the right amount of SharePoint hardware is dependent on your use cases and other factors:
- How many users?
- How concurrent?
- How often does the data change?
- Do the items have unique security requirements?
- What kinds of searches will you be performing against the data?
- and so on...
Lastly, you mention that you want 2 index servers for MOSS2007. While there are scenarios in MOSS2007 that rely on multiple index boxes, they aren't redundant in the way you might think. More likely you'd have a single index box and multiple query boxes (or web servers that are also query servers).