开发者

Importing Date-datatype using mongoimport

I have many GB of data stored in PostgreSQL database and i need those to be imported into the MongoDB. I did this using CSV export and mongoimport.

There are columns like this '2011-06-25' in that CSV and it has been imported as string, not as MongoDate, so i cannot effectively search by date.

I've found this : http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-Exam开发者_JAVA技巧ple%3AImportingInterestingTypes but the example says, i need to use JSON structure for the file. Do i really need to export JSON file from PostgreSQL?

If i do - how?

If i don't, how to export "MongoDate" through CSV?


Actually the first option is pretty fast even with huge data. Here is an example query using the mongo console:

/usr/bin/mongo yourdbname --eval "db.yourcollectionname.find().forEach(function(doc){doc.yourdatefield = new ISODate(doc.yourdatefield);db.yourcollectionname.save(doc)});"


Your options:

  • import the stuff as CSV and convert the data after the import to Date() (either using the mongo console or using a script written in your favorite language)

  • import your data as JSON and using the $date descriptor for converting date strings into Date instances


Write a script using your favorite language to import the data. It will allow you to control the data-type and apply structural changes to the data as well.

Generally you will find that the mapping of the data is not the same when converting from tabular data to a document database.

If you write a script you may also want to support running multiple copies (or threads) to improve the import speed.


When possible, re-importing from the source CSV should be faster. For a relatively large dataset (130M objects, source CSV ~6GB), a mongoimport run took 40mins, but the forEach suggested by @webDEVILopers took 24h+ (at which point I canceled it).

To import from CSV and convert your date, I ran

mongoimport --db your-db-name --type csv --file your-file.csv --collection your-collection --fields timestamp.date\(2006-01-02\ 15:04:0
5.00000+00\),count.int32\(\), --columnsHaveType

The .date() part is tricky, you have to specify that particular date in your source format. So my timestamps were like "2018-02-03 12:09:23.00000+00", which meant formatting the reference date (Jan 2nd, 2006 15:04:05) in that format (as in command above). See this for reference.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜