Importing Date-datatype using mongoimport
I have many GB of data stored in PostgreSQL database and i need those to be imported into the MongoDB. I did this using CSV export and mongoimport.
There are columns like this '2011-06-25' in that CSV and it has been imported as string, not as MongoDate, so i cannot effectively search by date.
I've found this : http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-Exam开发者_JAVA技巧ple%3AImportingInterestingTypes but the example says, i need to use JSON structure for the file. Do i really need to export JSON file from PostgreSQL?
If i do - how?
If i don't, how to export "MongoDate" through CSV?
Actually the first option is pretty fast even with huge data. Here is an example query using the mongo console:
/usr/bin/mongo yourdbname --eval "db.yourcollectionname.find().forEach(function(doc){doc.yourdatefield = new ISODate(doc.yourdatefield);db.yourcollectionname.save(doc)});"
Your options:
import the stuff as CSV and convert the data after the import to Date() (either using the mongo console or using a script written in your favorite language)
import your data as JSON and using the $date descriptor for converting date strings into Date instances
Write a script using your favorite language to import the data. It will allow you to control the data-type and apply structural changes to the data as well.
Generally you will find that the mapping of the data is not the same when converting from tabular data to a document database.
If you write a script you may also want to support running multiple copies (or threads) to improve the import speed.
When possible, re-importing from the source CSV should be faster. For a relatively large dataset (130M objects, source CSV ~6GB), a mongoimport
run took 40mins, but the forEach
suggested by @webDEVILopers took 24h+ (at which point I canceled it).
To import from CSV and convert your date, I ran
mongoimport --db your-db-name --type csv --file your-file.csv --collection your-collection --fields timestamp.date\(2006-01-02\ 15:04:0
5.00000+00\),count.int32\(\), --columnsHaveType
The .date()
part is tricky, you have to specify that particular date in your source format. So my timestamps were like "2018-02-03 12:09:23.00000+00", which meant formatting the reference date (Jan 2nd, 2006 15:04:05) in that format (as in command above). See this for reference.
精彩评论