
Alternative to a large database [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.


Closed 7 years ago.


I have a database whose tables hold billions of rows per month in a single table, and I have data for the past five years. I have tried to optimize the data in every way I could, but the latency is not decreasing. I know there are solutions such as horizontal sharding and vertical sharding, but I am not sure about any open-source implementations or the development time required to make the switch. Does anyone have experience with such systems?

Thank you.
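
To make the question concrete, here is a minimal sketch of what horizontal sharding boils down to: deterministically routing each row to one of several smaller databases. The integer row key and the shard count are hypothetical, not from the question.

```python
# Minimal sketch of hash-based horizontal sharding.
# Assumptions: rows are keyed by an integer row_id; there are N physical shards.

import hashlib

SHARD_COUNT = 8  # assumption: 8 physical shards


def shard_for(row_id: int) -> int:
    """Map a row key to a shard index with a stable hash."""
    digest = hashlib.md5(str(row_id).encode()).hexdigest()
    return int(digest, 16) % SHARD_COUNT


# Route a handful of example keys.
for row_id in (1, 42, 10_000_000_001):
    print(f"row {row_id} -> shard {shard_for(row_id)}")
```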


Nobody can suggest anything without a use case. When you have data that's "Sagan-esque" in magnitude, the use case is all-important, since, as you've likely discovered, there simply isn't any "general" technique that works. The numbers are simply too large.

So, you need to be clear about what you want to do with this data. If the answer is "everything", then you get slow performance, because you can't optimize "everything".

Edit:

Well, which is it? 2 or 3? How big are the result sets? Do you need access to all 5 years or just the last month? Do you really need all that detail, or can it be summarized? Do you need to sort it? Are the keys enough? How often is the data updated? How fast does the data need to be online once it is updated? What kind of service level does the data need to have? 24x7x365? 9-5x5? Is day-old data OK? Who's using the data? Interactive users? Batch reports? Exports to outside entities?


Read up on Data Warehousing...

  1. Capture data in flat files. Do not load a database.

  2. Design a proper Star Schema architecture.

  3. Write programs to do dimensional conformance; those programs will load dimension changes only to a database.

  4. Write programs to load selected flat-file records into a datamart with a copy of the dimensions.

Do not load a database with raw data. Ever.
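
A minimal sketch of steps 3 and 4, assuming hypothetical CSV flat files with a customer dimension and a numeric fact; the file name, columns, and in-memory tables are illustrative stand-ins, not from the answer.

```python
# Sketch of dimensional conformance (step 3) and a datamart load (step 4).
# Assumptions: flat files are CSV; the dimension and fact tables are in-memory
# structures standing in for real database tables; column names are hypothetical.

import csv


def conform_dimensions(flat_file, dim_table):
    """Load only dimension *changes* (type-1 slowly changing dimension)."""
    with open(flat_file, newline="") as f:
        for row in csv.DictReader(f):
            key = row["customer_id"]
            attrs = {"name": row["customer_name"], "region": row["region"]}
            if dim_table.get(key) != attrs:  # changed or new dimension row
                dim_table[key] = attrs       # write the change, nothing else


def load_datamart(flat_file, dim_table, fact_table):
    """Load selected fact records, keyed against the conformed dimensions."""
    with open(flat_file, newline="") as f:
        for row in csv.DictReader(f):
            key = row["customer_id"]
            if key in dim_table:             # only rows with a known dimension
                fact_table.append((key, float(row["amount"])))


# Write a tiny hypothetical monthly extract so the sketch runs end to end.
with open("2010-01.csv", "w", newline="") as f:
    f.write("customer_id,customer_name,region,amount\n")
    f.write("c1,Alice,EU,10.5\nc2,Bob,US,3.0\n")

dims, facts = {}, []
conform_dimensions("2010-01.csv", dims)
load_datamart("2010-01.csv", dims, facts)
print(dims, facts)
```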


PostgreSQL supports partitioning tables. If nothing else, read their documentation. Answering Will Hartung's questions will help a lot in arriving at a solution.
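
For reference, a small sketch that emits the declarative range-partitioning DDL that PostgreSQL 10 and later accept (one partition per month); the table and column names are made up for illustration.

```python
# Emit monthly range-partition DDL for PostgreSQL declarative partitioning.
# Assumptions: PostgreSQL 10+; table/column names ("events", "event_time")
# are hypothetical.

from datetime import date


def month_partitions(year: int):
    """Yield CREATE TABLE statements: a parent plus one partition per month."""
    yield ("CREATE TABLE events (id bigint, event_time timestamptz, payload text) "
           "PARTITION BY RANGE (event_time);")
    for m in range(1, 13):
        lo = date(year, m, 1)
        hi = date(year + 1, 1, 1) if m == 12 else date(year, m + 1, 1)
        yield (f"CREATE TABLE events_{year}_{m:02d} PARTITION OF events "
               f"FOR VALUES FROM ('{lo}') TO ('{hi}');")


for stmt in month_partitions(2010):
    print(stmt)
```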


How many GB of data is this? This reminds me of the story of LinkedIn and how, to compute the social graph fast enough, they had to run everything in memory. Stack Overflow itself runs on a server with lots of memory and keeps most of the database in memory at any one time, according to the SO podcast.

It also reminds me of Google's problem, which required custom software and tons of cheap machines working in tandem.
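
As a back-of-envelope check on "how many GB", here is a quick sketch; the 2 billion rows/month and 100 bytes/row figures are placeholder assumptions, not numbers from the question.

```python
# Rough dataset sizing with placeholder numbers; every figure is an assumption.
rows_per_month = 2_000_000_000  # "billions of rows in a single table" per month
bytes_per_row = 100             # assumed average row width
months = 5 * 12                 # "data for the past 5 years"

total_bytes = rows_per_month * bytes_per_row * months
print(f"~{total_bytes / 1e12:.0f} TB total, "
      f"~{rows_per_month * bytes_per_row / 1e9:.0f} GB per month")
```

With these assumptions the answer is on the order of 12 TB total, which is why "just add RAM" stops being an option well before sharding does.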

