
Storing large XML in MongoDB

I have a pretty huge XML file (>10 MB in size, with 40+ elements). Currently we store such XML in an Oracle DB and use XQuery to query and retrieve parts of it. This process is slow and takes many DB calls. We are exploring MongoDB to store this XML and query it. I just converted the XML to JSON and loaded it into a Mongo collection, and it stored the huge JSON data in a flash. It stores the XML nodes as nested documents. But when I query (using find) for an innermost element, it always returns the whole document, including nodes whose element values do not match. I expect only the few nodes that match the given node value. Is there a best way to store such large XML files in MongoDB? And how can I retrieve only the inner nodes that have the exact values specified in the query? Thanks in advance.


Have you thought about trying an up-to-date XML Database, such as BaseX (http://basex.org)? It might give you much better results, in particular if you have used XQuery before anyway.


I had the same problem. In my case the top-level node in each XML file always contained a huge list of smaller nodes, so I ended up storing those items instead. To do it, I wrote my own xml-to-json command line tool. I've used it to convert 10GB of XML data into JSON, in a format that mongoimport can eat.
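The answerer's tool is not shown, but the same idea can be sketched in Python. This is a minimal sketch under assumptions: the repeated items are direct children of the root element, attributes become `@name` keys, repeated child tags become lists, and a text-only leaf becomes a plain string. The names `element_to_dict` and `split_xml_items` are hypothetical. The file is streamed with `iterparse`, and each item is written as one JSON object per line, which `mongoimport` accepts by default (a single JSON array would instead need `--jsonArray`).

```python
import io
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to JSON-friendly data: attributes become '@name'
    keys, child elements become nested values (repeated tags become lists),
    and a text-only leaf becomes a plain string."""
    result = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated child tag: collect all values into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not result:
            return text  # leaf element that only holds text
        result["#text"] = text
    return result


def split_xml_items(source, out):
    """Stream `source` and write each direct child of the root element to
    `out` as one JSON document per line. Returns the number of documents."""
    depth = 0
    count = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first start event is the root element
            depth += 1
            continue
        depth -= 1
        if depth == 1:
            # A direct child of the root has just been fully parsed.
            out.write(json.dumps(element_to_dict(elem)) + "\n")
            count += 1
            root.clear()  # drop finished items so memory use stays bounded
    return count


sample = b"""<catalog>
  <book id="1"><title>A</title><price>10</price></book>
  <book id="2"><title>B</title><price>12</price></book>
</catalog>"""

buf = io.StringIO()
n = split_xml_items(io.BytesIO(sample), buf)
print(n)               # 2
print(buf.getvalue())  # one JSON object per book, e.g. {"@id": "1", "title": "A", "price": "10"}
```

The output file could then be loaded with something like `mongoimport --db mydb --collection items --file items.json`, where `mydb`, `items` and `items.json` are placeholders. All values come out as strings; converting numbers or dates would need extra, schema-specific code.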


There are several facts you should keep in mind:

Number 1 - MongoDB returns either the whole document, if it matches, or nothing; there is no feature to return only part of it (as of 10 October 2011), so if you need filtering you have to implement it in your own code.

Number 2 - pay attention to the $elemMatch keyword. It makes the query look for all of its conditions within the same subdocument rather than throughout the whole document, so you might be confused here.

Number 3 - unlike with RDBMSs, there is no single right strategy for dividing your aggregate into collections in Mongo, so a different data representation might solve your case.

Number 4 - despite the "no right way" remark in Number 3, there is a general recommendation to keep your documents under 10 MB in size (the hard limit on a single document is 16 MB).
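To make Number 2 concrete, here is a toy, in-memory model in Python of the two matching rules for an array of subdocuments. It is not MongoDB's implementation, and the helper names are hypothetical. With dotted paths such as `{"items.name": "pen", "items.qty": 5}`, each condition may be satisfied by a different array element; with `{"items": {"$elemMatch": {"name": "pen", "qty": 5}}}`, a single element must satisfy all of them. In both cases, a match returns the whole document, as Number 1 says.

```python
def matches_dotted(doc, array_field, conditions):
    """Model of {'arr.f1': v1, 'arr.f2': v2}: each condition is checked
    independently, so different elements may satisfy different conditions."""
    elements = doc.get(array_field, [])
    return all(
        any(el.get(field) == value for el in elements)
        for field, value in conditions.items()
    )


def matches_elem_match(doc, array_field, conditions):
    """Model of {'arr': {'$elemMatch': {...}}}: one element must satisfy
    every condition at the same time."""
    elements = doc.get(array_field, [])
    return any(
        all(el.get(field) == value for field, value in conditions.items())
        for el in elements
    )


order = {
    "_id": 1,
    "items": [
        {"name": "pen", "qty": 1},
        {"name": "ink", "qty": 5},
    ],
}
cond = {"name": "pen", "qty": 5}
print(matches_dotted(order, "items", cond))      # True: "pen" and qty 5 come from different items
print(matches_elem_match(order, "items", cond))  # False: no single item has both
```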


You should look at the Sausalito XML database: http://www.28msec.com. It uses MongoDB as its datastore.


This is the expected behavior when filtering multi-level embedded documents: a matching filter returns the whole document, not just the matching subsets.

Check out my answers to mongodb-querying-array-elements-within-a-document and how-to-find-the-matched-record-in-mongodb for more info.

Maybe you could add a sample of the XML schema you currently have, so someone can help you structure the app.
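Because the query hands back the whole matching document, the matching subsets can be pulled out in application code. A minimal Python sketch, assuming the converted XML keeps repeated nodes as lists of dicts; `find_matching_nodes` and the sample document are hypothetical. It walks every nested dict and list and yields a dotted path plus the subdocument for each dict whose field equals the requested value.

```python
def find_matching_nodes(node, field, value, path=""):
    """Recursively yield (path, subdocument) for every dict inside `node`
    whose `field` equals `value`. List positions appear as numeric path parts."""
    if isinstance(node, dict):
        if node.get(field) == value:
            yield path, node
        for key, child in node.items():
            child_path = f"{path}.{key}" if path else key
            yield from find_matching_nodes(child, field, value, child_path)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from find_matching_nodes(child, field, value, f"{path}.{i}")


# Stand-in for a whole document returned by a query.
doc = {
    "_id": 1,
    "library": {
        "shelf": [
            {"book": [{"title": "A", "lang": "en"}, {"title": "B", "lang": "fr"}]},
            {"book": [{"title": "C", "lang": "en"}]},
        ]
    },
}

for path, node in find_matching_nodes(doc, "lang", "en"):
    print(path, node)
# library.shelf.0.book.0 {'title': 'A', 'lang': 'en'}
# library.shelf.1.book.0 {'title': 'C', 'lang': 'en'}
```

The walk visits every node, so its cost grows linearly with the document size. Later MongoDB versions (2.2 onward) added the positional `$` and `$elemMatch` projection operators, but they return only the first matching element of a single array, so deeply nested matches like these may still call for code along these lines.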
