
Alternative to a large database [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.


Closed 7 years ago.


I have a database whose tables hold billions of rows per month in a single table, and I have data for the past five years. I have tried to optimize the data in every way I could, but the latency is not decreasing. I know there are solutions such as horizontal sharding and vertical sharding, but I am not sure about any open-source implementations or the development time required to make the switch. Does anyone have experience with such systems?

Thank you.
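
To make the question concrete, here is a minimal sketch of what horizontal sharding boils down to: deterministically routing each row to one of several smaller databases. The integer row key and the shard count are hypothetical, not from the question.

```python
# Minimal sketch of hash-based horizontal sharding.
# Assumptions: rows are keyed by an integer row_id; there are N physical shards.

import hashlib

SHARD_COUNT = 8  # assumption: 8 physical shards


def shard_for(row_id: int) -> int:
    """Map a row key to a shard index with a stable hash."""
    digest = hashlib.md5(str(row_id).encode()).hexdigest()
    return int(digest, 16) % SHARD_COUNT


# Route a handful of example keys.
for row_id in (1, 42, 10_000_000_001):
    print(f"row {row_id} -> shard {shard_for(row_id)}")
```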


Nobody can suggest anything without a use case. When you have data that's "Sagan-esque" in magnitude, the use case is all-important, since, as you've likely discovered, there simply isn't any "general" technique that works. The numbers are simply too large.

So, you need to be clear about what you want to do with this data. If the answer is "everything", then you get slow performance, because you can't optimize "everything".

Edit:

Well, which is it? 2 or 3? How big are the result sets? Do you need access to all 5 years or just the last month? Do you really need all that detail, or can it be summarized? Do you need to sort it? Are the keys enough? How often is the data updated? How fast does the data need to be online once it is updated? What kind of service level does the data need to have? 24x7x365? 9-5x5? Is day-old data OK? Who's using the data? Interactive users? Batch reports? Exports to outside entities?


Read up on Data Warehousing...

  1. Capture data in flat files. Do not load a database.

  2. Design a proper Star Schema architecture.

  3. Write programs to do dimensional conformance; those programs will load dimension changes only to a database.

  4. Write programs to load selected flat-file records into a datamart with a copy of the dimensions.

Do not load a database with raw data. Ever.
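
A minimal sketch of steps 3 and 4, assuming hypothetical CSV flat files with a customer dimension and a numeric fact; the file name, columns, and in-memory tables are illustrative stand-ins, not from the answer.

```python
# Sketch of dimensional conformance (step 3) and a datamart load (step 4).
# Assumptions: flat files are CSV; the dimension and fact tables are in-memory
# structures standing in for real database tables; column names are hypothetical.

import csv


def conform_dimensions(flat_file, dim_table):
    """Load only dimension *changes* (type-1 slowly changing dimension)."""
    with open(flat_file, newline="") as f:
        for row in csv.DictReader(f):
            key = row["customer_id"]
            attrs = {"name": row["customer_name"], "region": row["region"]}
            if dim_table.get(key) != attrs:  # changed or new dimension row
                dim_table[key] = attrs       # write the change, nothing else


def load_datamart(flat_file, dim_table, fact_table):
    """Load selected fact records, keyed against the conformed dimensions."""
    with open(flat_file, newline="") as f:
        for row in csv.DictReader(f):
            key = row["customer_id"]
            if key in dim_table:             # only rows with a known dimension
                fact_table.append((key, float(row["amount"])))


# Write a tiny hypothetical monthly extract so the sketch runs end to end.
with open("2010-01.csv", "w", newline="") as f:
    f.write("customer_id,customer_name,region,amount\n")
    f.write("c1,Alice,EU,10.5\nc2,Bob,US,3.0\n")

dims, facts = {}, []
conform_dimensions("2010-01.csv", dims)
load_datamart("2010-01.csv", dims, facts)
print(dims, facts)
```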


PostgreSQL supports partitioning tables. If nothing else, read their documentation. Answering Will Hartung's questions will help a lot in arriving at a solution.
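
For reference, a small sketch that emits the declarative range-partitioning DDL that PostgreSQL 10 and later accept (one partition per month); the table and column names are made up for illustration.

```python
# Emit monthly range-partition DDL for PostgreSQL declarative partitioning.
# Assumptions: PostgreSQL 10+; table/column names ("events", "event_time")
# are hypothetical.

from datetime import date


def month_partitions(year: int):
    """Yield CREATE TABLE statements: a parent plus one partition per month."""
    yield ("CREATE TABLE events (id bigint, event_time timestamptz, payload text) "
           "PARTITION BY RANGE (event_time);")
    for m in range(1, 13):
        lo = date(year, m, 1)
        hi = date(year + 1, 1, 1) if m == 12 else date(year, m + 1, 1)
        yield (f"CREATE TABLE events_{year}_{m:02d} PARTITION OF events "
               f"FOR VALUES FROM ('{lo}') TO ('{hi}');")


for stmt in month_partitions(2010):
    print(stmt)
```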


How many GB of data is this? This reminds me of the story of LinkedIn and how, to compute the social graph fast enough, they had to run everything in memory. Stack Overflow itself runs on a server with lots of memory and keeps most of the database in memory at any one time, according to the SO podcast.

It also reminds me of Google's problem, which required custom software and tons of cheap machines working in tandem.
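
As a back-of-envelope check on "how many GB", here is a quick sketch; the 2 billion rows/month and 100 bytes/row figures are placeholder assumptions, not numbers from the question.

```python
# Rough dataset sizing with placeholder numbers; every figure is an assumption.
rows_per_month = 2_000_000_000  # "billions of rows in a single table" per month
bytes_per_row = 100             # assumed average row width
months = 5 * 12                 # "data for the past 5 years"

total_bytes = rows_per_month * bytes_per_row * months
print(f"~{total_bytes / 1e12:.0f} TB total, "
      f"~{rows_per_month * bytes_per_row / 1e9:.0f} GB per month")
```

With these assumptions the answer is on the order of 12 TB total, which is why "just add RAM" stops being an option well before sharding does.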

