For MongoDB, is it better to reference an object or use a natural String key?
I am building a corpus of indexed sentences in different languages. I have a collection of Languages which have both an ObjectId and the ISO code as a key. Is it better to use a reference to the Language collection or store a key like "en" or "fr"?
I suppose it's a compromise between:
- ease of referencing the Language object in that collection
- speed in doing queries where the sentence has a certain language
- the size of the data on disk
Any best practices that I should know of?
In the end, it really comes down to personal choice and what will work best for your application.
The only requirement that MongoDB imposes upon _id is that it be unique within the collection. It can be an ObjectId (which is provided by default), a string, or even an embedded document (as I recall, it cannot be an array though).
In this case, you can likely guarantee that the ISO code is unique, which makes it a good candidate for _id. You have a 'known' primary key which is also useful in itself by being identifiable, so using that instead of a generated ID is probably the more sensible bet. It also means that anywhere you 'reference' this information in another collection you can save the ISO code instead of an ObjectId; those browsing your raw data can immediately identify what that reference points at.
As an aside:
The two big benefits of ObjectIds are that they can be generated uniquely across multiple machines, processes and threads without needing any kind of central sequence tracking by the MongoDB server, and that they are stored as a special type in MongoDB that only uses 12 bytes (as opposed to the 24-byte representation of the string version of an ObjectId).
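To make the 12 versus 24 figure concrete, here is a small Node.js sketch. The hex value is a made-up ObjectId used only for illustration, and the snippet uses plain Node `Buffer` calls rather than any MongoDB driver API.

```javascript
// Hypothetical ObjectId value, written in its 24-character hex string form.
const hexForm = "507f1f77bcf86cd799439011";

// Decoding the hex digits gives the raw bytes that the binary form holds.
const rawBytes = Buffer.from(hexForm, "hex");

console.log(hexForm.length);  // number of characters in the string form
console.log(rawBytes.length); // number of bytes in the binary form
```

Each byte takes two hex digits to write out, which is where the factor of two comes from.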
Unless disk space is an issue, I'd probably go with the language key like "en" or "fr". This saves an additional query on the Languages collection to find the ObjectId for a given language; you can just query the sentences directly:
db.sentences.find( { lang: "en" } )
So long as the lang field is indexed, e.g. with db.sentences.createIndex( { lang: 1 } ) (ensureIndex in older shells), I don't think there'll be much difference in query performance.
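The "additional query" point can be sketched with plain in-memory arrays standing in for the two collections. Everything here is hypothetical sample data (the ids, field names like `langId`, and the helper functions are assumptions for illustration), and it does not call MongoDB at all; it only shows the shape of the two lookups.

```javascript
// In-memory stand-ins for the Languages collection (hypothetical ids).
const languages = [
  { _id: "lang-oid-1", iso: "en", name: "English" },
  { _id: "lang-oid-2", iso: "fr", name: "French" },
];

// Design A: each sentence stores the ISO code directly.
const sentencesByCode = [
  { text: "Hello world", lang: "en" },
  { text: "Bonjour le monde", lang: "fr" },
  { text: "Good morning", lang: "en" },
];

// Design B: each sentence stores a reference to the language's generated id.
const sentencesByRef = [
  { text: "Hello world", langId: "lang-oid-1" },
  { text: "Bonjour le monde", langId: "lang-oid-2" },
  { text: "Good morning", langId: "lang-oid-1" },
];

// Design A: one step, filter on the code itself.
function findByCode(iso) {
  return sentencesByCode.filter((s) => s.lang === iso);
}

// Design B: two steps, resolve the code to an id, then filter on that id.
function findByRef(iso) {
  const language = languages.find((l) => l.iso === iso);
  if (!language) return [];
  return sentencesByRef.filter((s) => s.langId === language._id);
}

console.log(findByCode("en").map((s) => s.text));
console.log(findByRef("en").map((s) => s.text));
```

Both designs return the same sentences; the difference is that the reference design has to consult the languages data first, which in a real deployment would be an extra round trip (or a cached mapping).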
If you've got a humongous data set, and disk space is a concern, then you could consider an ObjectId (12 bytes), or a number (8 bytes), which might be smaller than a UTF-8 string key depending on its length.
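For a rough feel of "depending on its length", here is a back-of-the-envelope Node.js sketch. It assumes the BSON layout for a string value (a 4-byte length prefix, the UTF-8 bytes, and a terminating null byte) and compares only the value itself; the field name, the type byte, and index overhead are ignored since they apply to every option. The helper name and the sample keys are my own.

```javascript
// Estimated size of a BSON string value:
// 4-byte length prefix + UTF-8 bytes + 1 terminating null byte.
function bsonStringValueSize(s) {
  return 4 + Buffer.byteLength(s, "utf8") + 1;
}

// Fixed-size alternatives mentioned above.
const OBJECT_ID_SIZE = 12; // bytes
const DOUBLE_SIZE = 8;     // bytes, the shell's default number type

// Print the estimated size of a few candidate string keys.
for (const key of ["en", "fr", "pt-BR"]) {
  console.log(key, bsonStringValueSize(key), "bytes");
}
console.log("ObjectId", OBJECT_ID_SIZE, "bytes");
console.log("double", DOUBLE_SIZE, "bytes");
```

Under these assumptions a two-letter ISO code comes out slightly smaller than an ObjectId, so the size argument only starts to favour ObjectIds or numbers for noticeably longer string keys.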