
MongoDB - anonymizing 600k records

I am trying to anonymize a large data set of about 600k records (removing sensitive information like email, etc.) so that it can be used for some performance tests.

I am using Scala (Casbah) with Mongo. The script itself is pretty simple and straightforward. When I run it, the whole process starts off fast - parsing 1000 records every 2-3 seconds - but it slows down tremendously and ends up crawling.
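For context, the script essentially walks the cursor and rewrites the sensitive fields one document at a time. A minimal sketch along those lines (the connection details, database, collection, and field names here are made up for illustration):

```scala
import com.mongodb.casbah.Imports._

object Anonymize extends App {
  // hypothetical connection and collection names
  val client = MongoClient("localhost", 27017)
  val users  = client("testdb")("users")

  // walk the whole collection and blank out the sensitive fields in place
  users.find().foreach { doc =>
    users.update(
      MongoDBObject("_id" -> doc.get("_id")),
      $set("email" -> "redacted@example.com", "name" -> "redacted")
    )
  }
}
```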

I know this is pretty vague without many details, but any idea why this is happening, and any hints on how I could speed it up?


It turned out to be an issue with the driver and not with Mongo. When I tried the same inserts using the mongo shell, it went through without breaking a sweat.

UPDATE

So, I tried both approaches: inserting into the existing collection and dumping the results into a new collection. The first approach was faster for me. Of course, one should never assume this is always true and should benchmark before choosing the first approach over the second. In both cases Mongo was very fast (meaning it was not taking hours to get this done). The problem was with the Java interface I was using to connect to Mongo, which was more of a stupid mistake on my part.
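For reference, the second approach (writing the scrubbed documents into a new collection) can also be batched so that each round trip carries many documents instead of one. A rough sketch under the same assumed names as above:

```scala
import com.mongodb.casbah.Imports._

object AnonymizeToNewCollection extends App {
  val client = MongoClient("localhost", 27017)
  val db     = client("testdb")
  val source = db("users")
  val target = db("users_anonymized")

  // read, scrub, and insert in batches of 1000 rather than one document at a time
  source.find().grouped(1000).foreach { batch =>
    val scrubbed = batch.map { doc =>
      doc.put("email", "redacted@example.com") // overwrite sensitive fields
      doc.put("name", "redacted")
      doc
    }
    target.insert(scrubbed: _*) // one round trip per batch
  }
}
```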
