
How should I configure my MongoDB cluster?

I am running a sharded MongoDB environment: 3 mongod shards, 1 mongod config server, and 1 mongos (no replication).

I want to use mongoimport to import CSV data into the database. I have 105 million records split across 210 CSV files of 500,000 records each. I understand that mongoimport is single-threaded, and I read that I should run multiple mongoimport processes to get better performance. However, I tried that and didn't get a speedup:

when running 3 mongoimports in parallel, I was getting ~6k inserts/sec per process (~18k inserts/sec total), whereas running 1 mongoimport gave me ~20k inserts/sec.

Since these processes were all routed through the single mongos and the single config server, I am wondering if this is due to my cluster configuration. My question is: if I set up my cluster differently, will I achieve better mongoimport speeds? Do I want more mongos processes? How many mongoimport processes should I fire off at a time?


So, the first thing you need to do is "pre-split" your chunks.

Let's assume that you have already sharded the collection to which you're importing. When you start "from scratch", all of the data will start going to a single node. As that node fills up, MongoDB will start "splitting" that node into chunks. Once it gets to around 8 chunks (that's about 8x64MB of index space), it will start migrating chunks.

So basically, you're effectively writing to a single node and then that node is being slowed down because it has to read and write its data to the other nodes.

This is why you're not seeing any speedup with 3 mongoimports. All of the data is still going to a single node, and you're maxing out that node's throughput.

The trick here is to "pre-split" the data. In your case, you would probably set it up so that you get about 70 files' worth of data on each machine. Then you can import those files in parallel and get better throughput.
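
For illustration, here is a minimal sketch of what pre-splitting might look like from the mongo shell, assuming a database mydb, a collection records, and a roughly uniform numeric shard key recordId (the names, the key, and its range are assumptions for the example, not details from the question):

    # Hypothetical pre-split: shard the collection, then create one split
    # point per 500,000 records (matching the CSV file size) so empty
    # chunks exist and can be balanced across shards before the import.
    mongo <mongos-host>:27017/admin --eval '
        sh.enableSharding("mydb");
        sh.shardCollection("mydb.records", { recordId: 1 });
        for (var i = 500000; i < 105000000; i += 500000) {
            db.runCommand({ split: "mydb.records", middle: { recordId: i } });
        }'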

Jeremy Zawodny of Craigslist has a reasonable write-up on this, and the MongoDB site has some docs on splitting chunks as well.


I've found a few things that help with bulk loads.

Defer building indexes (except for the one you must have on the shard key) until after you've loaded everything.
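
For example, a secondary index can be built in one pass once the load completes (the date field here is purely hypothetical):

    # Build secondary indexes after the bulk load has finished;
    # "date" is a made-up field used only for illustration.
    mongo <mongos-host>:27017/mydb --eval 'db.records.ensureIndex({ date: 1 })'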

Run one mongos and mongoimport per shard and load in parallel.
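
A sketch of what that might look like, assuming three mongos routers on ports 27017-27019 and the 210 CSV files divided ahead of time into three directories (chunk_0/, chunk_1/, chunk_2/); every host, port, and name here is an assumption:

    #!/bin/sh
    # Hypothetical parallel load: one mongoimport stream per mongos,
    # each working through a third of the CSV files in the background.
    for i in 0 1 2; do
        port=$((27017 + i))
        (
            for f in chunk_$i/*.csv; do
                mongoimport --host localhost --port $port \
                    --db mydb --collection records \
                    --type csv --headerline --file "$f"
            done
        ) &
    done
    wait   # block until all three import streams finish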

And the biggest improvement: pre-split your chunks. This is a bit tricky, since you need to figure out how many chunks you will need and roughly how the data is distributed. After you split them, you have to wait for the balancer to move them all around.
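
One way to watch the balancer's progress is to poll the chunk counts in the config database; when the per-shard counts stop changing, migration is done (the mydb.records namespace is the same assumption as in the earlier sketches):

    # Print how many chunks of the collection each shard currently owns.
    mongo <mongos-host>:27017/config --eval '
        db.chunks.distinct("shard").forEach(function (s) {
            print(s + ": " + db.chunks.count({ ns: "mydb.records", shard: s }));
        })'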
