Storing large XML in MongoDB

2023-04-12 11:44 Asked by:

I have a fairly large XML document (over 10 MB, with 40+ elements). We currently store such XML in an Oracle database and use XQuery to query and retrieve parts of it. This process is slow and takes many database calls, so we are exploring MongoDB as a place to store and query this XML. I just converted the XML to JSON and loaded it into a MongoDB collection; the load was very fast, and the XML nodes are stored as nested documents. But when I query (using find) for an innermost element, it always returns the whole document, including nodes whose element values do not match. I expect only the few nodes that match the given value. Is there a recommended way to store such large XML files in MongoDB? And how can I retrieve only the inner nodes that have the exact values specified in the query? Thanks in advance.

Have you thought about trying an up-to-date XML Database, such as BaseX (http://basex.org)? It might give you much better results, in particular if you have used XQuery before anyway.

I had the same problem. In my case the top-level node in each XML file always contained a huge list of smaller nodes, so I ended up storing those items instead. To do it, I wrote my own xml-to-json command line tool. I've used it to convert 10GB of XML data into JSON, in a format that mongoimport can eat.
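The splitting approach described above can be sketched in Python. This is not the tool mentioned in the answer, only a minimal illustration under assumptions: the function names, the `item` tag, and the `text` key used for element text are all hypothetical. It streams the XML with `xml.etree.ElementTree.iterparse` and emits one JSON document per repeated element, one per line, which is the newline-delimited form that mongoimport reads by default.

```python
import io
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element into a plain dict of attributes, text and children."""
    node = dict(elem.attrib)
    for child in elem:
        # Children are grouped by tag; repeated tags end up in the same list.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (elem.text or "").strip()
    if text:
        # "text" is an arbitrary key chosen for this sketch.
        node["text"] = text
    return node


def xml_items_to_json_lines(source, item_tag):
    """Yield one JSON string for each <item_tag> element in the XML source."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == item_tag:
            yield json.dumps(element_to_dict(elem), sort_keys=True)
            elem.clear()  # drop the processed subtree to keep memory low


if __name__ == "__main__":
    sample = b"<catalog><item id='1'><name>a</name></item></catalog>"
    for line in xml_items_to_json_lines(io.BytesIO(sample), "item"):
        print(line)
```

Each emitted line becomes its own small document, so later queries return individual items rather than the whole file.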

There are several facts you should keep in mind:

Number 1 - MongoDB returns a document whole or not at all, depending on whether it matches the query. There is no feature to return only the matching part of it (as of 10 October 2011), so if you need finer filtering you have to implement it in your own code.

Number 2 - Pay attention to the $elemMatch operator. It requires all conditions to be met within the same subdocument rather than anywhere throughout the whole document, so results can be confusing if you overlook it.

Number 3 - Unlike with an RDBMS, there is no single right strategy for dividing your aggregate into collections in MongoDB. A different data representation might solve your case.

Number 4 - Despite remark 3 about there being "no right way", there is a general recommendation to keep your documents under 10 MB in size. (MongoDB itself rejects any single BSON document larger than 16 MB.)
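The distinction in point 2 can be shown without a server. The sketch below emulates in plain Python how a filter on dotted fields differs from one using $elemMatch on an array of subdocuments. The helper names and the sample document are hypothetical; a real query would be sent through a driver rather than evaluated like this.

```python
def matches_dotted(doc, array_field, conditions):
    """Emulate {"items.color": "red", "items.size": "L"}.

    Each condition may be satisfied by a different array element.
    """
    elements = doc.get(array_field, [])
    return all(
        any(el.get(key) == value for el in elements)
        for key, value in conditions.items()
    )


def matches_elem_match(doc, array_field, conditions):
    """Emulate {"items": {"$elemMatch": {"color": "red", "size": "L"}}}.

    One single array element must satisfy every condition.
    """
    elements = doc.get(array_field, [])
    return any(
        all(el.get(key) == value for key, value in conditions.items())
        for el in elements
    )


order = {"items": [{"color": "red", "size": "S"},
                   {"color": "blue", "size": "L"}]}
wanted = {"color": "red", "size": "L"}

print(matches_dotted(order, "items", wanted))      # True
print(matches_elem_match(order, "items", wanted))  # False
```

The sample document matches the dotted form because "red" and "L" each appear somewhere in the array, but it fails the $elemMatch form because no single element has both.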

You should look at Sausalito XML database: http://www.28msec.com. It's using MongoDB as datastore.

This is the expected behavior when filtering on multi-level embedded documents: a matching filter normally returns the whole document, not just the matching subset.

Check out my answers to "mongodb-querying-array-elements-within-a-document" and "how-to-find-the-matched-record-in-mongodb" for more info.
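Since the query hands back the whole document, one option is to pick out the matching inner nodes on the client. The following is a minimal sketch, assuming the document has already been fetched as nested Python dicts and lists (as a driver would return it); the function name and the sample data are hypothetical.

```python
def find_matching_nodes(node, key, value):
    """Recursively collect every nested dict whose `key` equals `value`."""
    matches = []
    if isinstance(node, dict):
        if node.get(key) == value:
            matches.append(node)
        for child in node.values():
            matches.extend(find_matching_nodes(child, key, value))
    elif isinstance(node, list):
        for child in node:
            matches.extend(find_matching_nodes(child, key, value))
    return matches


document = {
    "catalog": {
        "section": [
            {"name": "a", "item": [{"sku": "x1", "status": "active"},
                                   {"sku": "x2", "status": "retired"}]},
            {"name": "b", "item": [{"sku": "x3", "status": "active"}]},
        ]
    }
}

active = find_matching_nodes(document, "status", "active")
print([node["sku"] for node in active])  # ['x1', 'x3']
```

This still transfers the full document from the server, so it helps with selecting nodes but not with the cost of moving a very large document over the wire.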

Maybe you could add a sample of the XML schema you currently have; someone may then be able to help you structure the app.
