Fast way to replicate a huge database table

Posted: 2023-01-07 13:17

We are currently trying to solve a performance problem: searching for data and presenting it in a paginated way takes about 2-3 minutes.

Upon further investigation (and after several rounds of SQL tuning), it seems that searching is slow simply because of the sheer amount of data.

A possible solution that I'm currently investigating is to replicate the data in a searchable cache. This cache could live in the database (e.g. a materialized view) or outside it (a NoSQL approach). Since I would like the cache to be horizontally scalable, I am leaning towards keeping it outside the database.

I've created a proof of concept, and searching my cache is indeed faster than searching the database. However, the initial full replication takes a long time. Although the full replication happens only once, and subsequent replications will be incremental, copying only the rows that changed since the last run, it would still be great to speed up that initial full replication.
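Incremental replication of this kind is often implemented with a high-water mark: remember the largest last-modified value copied so far, and on the next run fetch only rows modified after it. A minimal sketch, assuming the table has a last-modified column (the `updatedAt` field and the `Row` type are hypothetical) and simulating the table with an in-memory list instead of a real `WHERE updated_at > ?` query:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical source row; the question does not describe the real schema. */
final class Row {
    final long id;
    final String payload;
    final long updatedAt; // assumed last-modified column, e.g. epoch milliseconds

    Row(long id, String payload, long updatedAt) {
        this.id = id;
        this.payload = payload;
        this.updatedAt = updatedAt;
    }
}

final class IncrementalReplicator {
    private final Map<Long, Row> cache = new HashMap<>();
    private long watermark = Long.MIN_VALUE; // highest updatedAt copied so far

    /** Copies rows changed since the previous call and returns how many were copied. */
    int sync(List<Row> source) {
        // In-memory stand-in for: SELECT ... FROM t WHERE updated_at > :watermark
        long newWatermark = watermark;
        int copied = 0;
        for (Row r : source) {
            if (r.updatedAt > watermark) {
                cache.put(r.id, r); // upsert by primary key
                newWatermark = Math.max(newWatermark, r.updatedAt);
                copied++;
            }
        }
        watermark = newWatermark;
        return copied;
    }

    Row get(long id) {
        return cache.get(id);
    }

    int size() {
        return cache.size();
    }
}
```

The first `sync` call performs the full copy and later calls copy only changed rows. Deleted rows are not detected this way, and a strict `>` comparison can miss rows that commit late with an older timestamp, so real implementations often re-read a small overlap window.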

During full replication, besides the slow query execution, I also have to battle network latency. I can live with the query execution time, but the network latency is really slowing the replication down.
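A rough way to see why latency dominates: when rows are streamed through a cursor, each round trip returns at most `fetchSize` rows, so the latency cost alone is about ceil(rows / fetchSize) × round-trip time. In JDBC the batch size can be raised with `Statement.setFetchSize(int)`, which is a hint to the driver; some drivers default to small values (Oracle's JDBC driver fetches 10 rows per round trip by default). The helper below only does that arithmetic; the figures in the usage note (12 million rows, 1 ms round trip) are illustrative assumptions, not measurements from the question.

```java
final class RoundTripEstimate {
    /** Number of network round trips needed to stream {@code rows} rows. */
    static long roundTrips(long rows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    /** Latency-only cost in seconds; ignores transfer time and query execution. */
    static double latencySeconds(long rows, int fetchSize, double rttMillis) {
        return roundTrips(rows, fetchSize) * rttMillis / 1000.0;
    }
}
```

With 12,000,000 rows, `roundTrips(12_000_000, 10)` is 1,200,000, i.e. about 1,200 s of pure waiting at 1 ms per round trip, while a fetch size of 10,000 needs only 1,200 round trips, about 1.2 s.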

Which leads me to my question: how can I speed up the replication? Should I spawn several threads, each running its own query? Should I use a scrollable result set?
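Spawning several threads can help if each one reads a disjoint key range, so that several round trips are in flight at once. A minimal sketch, assuming a numeric primary key with known minimum and maximum; `fetchRange` is a hypothetical stand-in for a per-range query such as `SELECT ... WHERE id BETWEEN ? AND ?`. Whether it actually speeds things up depends on how much parallel load the database can absorb and on how evenly the keys are distributed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

final class ParallelCopier {
    /** Splits the inclusive key range [minId, maxId] into at most {@code parts} contiguous chunks. */
    static List<long[]> partition(long minId, long maxId, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long chunk = (total + parts - 1) / parts; // ceiling, so no keys are left over
        for (long lo = minId; lo <= maxId; lo += chunk) {
            ranges.add(new long[] {lo, Math.min(lo + chunk - 1, maxId)});
        }
        return ranges;
    }

    /**
     * Runs fetchRange(lo, hi) for every range on a fixed thread pool and merges
     * the results into one thread-safe map keyed by id.
     */
    static Map<Long, String> copy(long minId, long maxId, int threads,
                                  BiFunction<Long, Long, Map<Long, String>> fetchRange)
            throws Exception {
        Map<Long, String> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (long[] r : partition(minId, maxId, threads)) {
                futures.add(pool.submit(() -> cache.putAll(fetchRange.apply(r[0], r[1]))));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows (wrapped) any exception thrown by a worker
            }
        } finally {
            pool.shutdown();
        }
        return cache;
    }
}
```

Range partitioning keeps each worker's query independent, so no coordination is needed beyond the concurrent map that collects the results.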

Answer:

Replicating the data in a cache looks like re-implementing functionality the database already provides.

From reading other comments, I gather that you are doing this not to avoid network round trips but because of costly joins. Many DBMSs let you create temporary tables, like this:

CREATE TEMPORARY TABLE abTable AS SELECT * FROM a, b;

If a and b are large (relatively permanent) tables, you pay a one-time cost of 2-3 minutes to create the temporary table; in most systems it then lasts for the rest of the session that created it. If you use abTable for many queries in that session, each query will cost much less than repeating the join every time, as in

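SELECT name, city, ... FROM a, b;

Note that FROM a, b with no WHERE clause is a cross join (every row of a paired with every row of b); in practice you would add the join condition.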
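To make the trade-off concrete outside SQL, here is an in-memory analogue of building abTable once: the join is computed a single time at construction, and every later query scans the stored result instead of redoing the join. It is a sketch, not database code; the two input maps stand in for tables a and b, and the join key (an `a.id` that also appears as a key of b) and the `name`/`city` columns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** One denormalized row of the precomputed join (hypothetical columns). */
final class JoinedRow {
    final long id;
    final String name;
    final String city;

    JoinedRow(long id, String name, String city) {
        this.id = id;
        this.name = name;
        this.city = city;
    }
}

final class PrecomputedJoin {
    private final List<JoinedRow> rows = new ArrayList<>();

    /**
     * Joins once: aNames maps a.id -> name, bCities maps the matching key -> city
     * (at most one b row per a row is assumed). Lookups into bCities are hash lookups.
     */
    PrecomputedJoin(Map<Long, String> aNames, Map<Long, String> bCities) {
        for (Map.Entry<Long, String> e : aNames.entrySet()) {
            String city = bCities.get(e.getKey());
            if (city != null) { // inner join: skip a rows without a matching b row
                rows.add(new JoinedRow(e.getKey(), e.getValue(), city));
            }
        }
    }

    /** Each query reads the stored result; the join is never recomputed. */
    List<JoinedRow> byCity(String city) {
        List<JoinedRow> out = new ArrayList<>();
        for (JoinedRow r : rows) {
            if (r.city.equals(city)) {
                out.add(r);
            }
        }
        return out;
    }
}
```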

Most database systems also support views, which let you do something like this:

CREATE VIEW abView AS SELECT * FROM a, b;

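Changes in the underlying tables a and b are reflected in abView, because a plain view is re-evaluated every time it is queried. For the same reason, a plain view does not save the cost of the join; a materialized view stores the result but has to be refreshed when a or b change.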

If you really are concerned about network round trips, you may be able to replicate parts of the database to the local machine.

A good database management system should be able to handle your data needs. So why reinvent the wheel?

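Another answer:

1. SELECT * FROM YOUR_TABLE
2. Map the results into an object or data structure.
3. Assign a unique key to each object or data structure.
4. Load the key and object into a WeakHashMap to act as your cache. (Caveat: a WeakHashMap drops an entry once its key is no longer strongly referenced elsewhere, so if the cache must keep every row, a HashMap or ConcurrentHashMap is the safer choice.)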
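The steps above can be sketched as follows. Rows are simulated as string arrays rather than read from a real `ResultSet`, and the `Customer` type with `id`, `name` and `city` fields is hypothetical. The map is a `ConcurrentHashMap`, a swapped-in choice that holds entries strongly until they are removed and supports concurrent reads.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical mapped row; the real columns are not given in the question. */
final class Customer {
    final long id;
    final String name;
    final String city;

    Customer(long id, String name, String city) {
        this.id = id;
        this.name = name;
        this.city = city;
    }
}

final class CustomerCache {
    private final Map<Long, Customer> byId = new ConcurrentHashMap<>();

    /** rawRows stands in for the result of SELECT * FROM YOUR_TABLE (step 1). */
    void load(List<String[]> rawRows) {
        for (String[] cols : rawRows) {
            // Step 2: map the row to an object; step 3: its id is the unique key.
            Customer c = new Customer(Long.parseLong(cols[0]), cols[1], cols[2]);
            byId.put(c.id, c); // step 4: load into the cache map
        }
    }

    /** Lookup by unique key: expected O(1), no sorting involved. */
    Customer get(long id) {
        return byId.get(id);
    }

    int size() {
        return byId.size();
    }
}
```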

I don't see why you need sorting, because your cache should access values by unique key in O(1) time. What is sorting buying you?

Be sure to think about thread safety.

I'm assuming that this is a read-only cache and that you're doing this to avoid the constant network latency. I'm also assuming that you'll load it once, at startup.
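For a read-only cache loaded at startup, one common way to get thread safety without locks is to publish an immutable map through a single atomic reference; if the cache is later refreshed (for example by incremental replication), the loader builds a new map and swaps the reference in one step, so readers always see a complete snapshot. A minimal sketch; `SnapshotCache` and `applyChanges` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

final class SnapshotCache<K, V> {
    // Readers always see a complete, immutable snapshot; the read path takes no locks.
    private final AtomicReference<Map<K, V>> current = new AtomicReference<>(Map.of());

    V get(K key) {
        return current.get().get(key);
    }

    /** Builds a new snapshot from the old one plus the changes, then publishes it atomically. */
    void applyChanges(Map<K, V> changed) {
        current.updateAndGet(old -> {
            Map<K, V> next = new HashMap<>(old);
            next.putAll(changed);
            return Map.copyOf(next); // immutable copy; rejects null keys and values
        });
    }

    int size() {
        return current.get().size();
    }
}
```

Copying the whole map on every refresh is only reasonable when refreshes are infrequent or batched; with millions of entries, each copy is expensive.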

How much data per record? 12M records at 1KB per record means you'll need 12GB of RAM just to hold your cache.
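The arithmetic behind that estimate, as a small helper. The `overheadBytesPerEntry` parameter is an assumption added here: a Java map also stores boxed keys, entry objects and object headers, so the real heap footprint is higher than the raw payload; measure it on your own JVM rather than trusting a fixed number.

```java
final class CacheSizing {
    /**
     * Rough heap estimate in bytes: records * (payload + per-entry overhead).
     * Throws ArithmeticException instead of silently overflowing.
     */
    static long estimateBytes(long records, long payloadBytes, long overheadBytesPerEntry) {
        return Math.multiplyExact(records, Math.addExact(payloadBytes, overheadBytesPerEntry));
    }

    /** Decimal gigabytes (10^9 bytes), matching the "12GB" figure. */
    static double toGigabytes(long bytes) {
        return bytes / 1e9;
    }
}
```

`estimateBytes(12_000_000, 1_000, 0)` gives 12,000,000,000 bytes, i.e. 12 GB; reading 1 KB as 1,024 bytes gives 12,288,000,000 bytes, about 11.4 GiB, before any per-entry overhead.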
