
Guidance on designing a solution - XML files vs database

I am thinking of storing a bunch of data in XML files. Each file will have information about a distinct element, let's say contacts. Now I am trying to retrieve a contact based on some information, e.g. find all the contacts who live in CA. How do I search for this information? Can I use something like LINQ? I am looking at XElement, but does it work across multiple XML files?
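Something like this is what I have in mind: load each file and flatten the results into one LINQ query (the file layout and element names here are just made-up placeholders):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical layout: one <contacts> file per source, with <contact> children.
Directory.CreateDirectory("contacts");
File.WriteAllText("contacts/a.xml",
    "<contacts><contact><name>Alice</name><state>CA</state></contact>" +
    "<contact><name>Bob</name><state>TX</state></contact></contacts>");

// XElement/XDocument work fine across multiple files: load each one
// and flatten the documents into a single query with SelectMany.
var inCA = Directory.EnumerateFiles("contacts", "*.xml")
                    .Select(XDocument.Load)
                    .SelectMany(d => d.Descendants("contact"))
                    .Where(c => (string)c.Element("state") == "CA")
                    .Select(c => (string)c.Element("name"))
                    .ToList();

foreach (var name in inCA)
    Console.WriteLine(name); // prints: Alice
```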

Does converting to datasets help? I am thinking I should have a constructor in my application that loads all the XML files into a DataSet and performs queries on the DataSet. If this is a good approach, can someone point me to examples/resources?
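For the DataSet idea, I am picturing something like this (the file layout and the inferred table name are just placeholders):

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical file; DataSet.ReadXml infers a "contact" table from it.
File.WriteAllText("contacts.xml",
    "<contacts><contact><name>Alice</name><state>CA</state></contact>" +
    "<contact><name>Bob</name><state>TX</state></contact></contacts>");

var ds = new DataSet();
ds.ReadXml("contacts.xml");

// DataTable.Select takes a filter expression, much like a SQL WHERE clause.
DataRow[] rows = ds.Tables["contact"].Select("state = 'CA'");
foreach (DataRow row in rows)
    Console.WriteLine(row["name"]); // prints: Alice
```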

And most importantly, is this a good solution, or should I use a database? The reason I am using XML files is that I need to extend this solution to use XQuery in the backend tiers (business logic, database) in the future, and I thought having the data in XML files would be helpful.

Update: I already have the schema here - http://ideone.com/ZRPco


If you put the data in a database then it's easy to output it as XML. Don't start off in XML just because you're going to need to end up there. If you need to run queries on the data, then a database is almost certainly the best option.
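For example, ADO.NET will happily serialize a DataSet straight to XML; this sketch uses made-up table and column names:

```csharp
using System;
using System.Data;
using System.IO;

// Build a table by hand here; in practice it would be filled from the
// database, e.g. with a SqlDataAdapter.
var table = new DataTable("contact");
table.Columns.Add("name", typeof(string));
table.Columns.Add("state", typeof(string));
table.Rows.Add("Alice", "CA");

var ds = new DataSet("contacts");
ds.Tables.Add(table);

// WriteXml turns the rows into <contacts><contact>...</contact></contacts>.
var writer = new StringWriter();
ds.WriteXml(writer);
Console.WriteLine(writer.ToString());
```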


You can use XML in your case. Just to expand on your example:

You may have 1000 employees in your company. Each employee can have zero or more contacts (primary, secondary, etc.), so every employee can have a contacts.xml (stored in an XML database such as eXist, MarkLogic, or Berkeley DB XML).

e.g. contacts.xml

Once the data is inside an XML database, the database can fetch all sorts of details based on whatever facet you want:

fetch contacts by zip code, by city, by name, etc.

All you need to do is write a specific XQuery to mine the data for your request (in the case of MarkLogic Server). The terminology used in this world is faceted browsing.

XML databases are designed to handle such information. View contacts as a mass of data rather than rows/columns.
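For example, such a query might look like this in XQuery (a sketch; the collection name and element names are hypothetical):

```xquery
(: fetch the names of all contacts in a given city :)
for $c in collection("contacts")//contact
where $c/city = "San Jose"
return $c/name
```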


Here are two reasons not to use XML ...

  1. If the dataset is large, I would not use XML. You either have to use a DOM parser (slow on big data) or a SAX parser (faster, but you lose validation ability until the whole file is read).

  2. If the data is going to change. You have to rewrite the whole XML file in order to change a portion of it.

Here is the reason I would use XML: if the dataset is small, is naturally hierarchical, and needs to be viewable/editable in a text editor.

If you need to output XML, it is not a problem to produce XML from a database.


Lots of comments here, but nobody has much understanding of MarkLogic Server XML databases, or of how powerful XML can be as a storage format when multiple types of indexes are applied (element, value, attribute, XML structure, XML node order, word, and phrase indexes).

MarkLogic can store/index billions of XML documents and allow sub-second searching across all of them, complex SUM COUNT MIN MAX operations, etc.

I've used relational XML files with C#.NET LINQ-to-XML to achieve what the original poster wants. (No MarkLogic at this point, just plain XML files and C# LINQ code that joins them together to achieve whatever type of search I'm looking for.) You may have an XML file for contacts:

<contacts>
  <contact id="1" companyid="1">
    <name></name>
    <address></address>
    <city></city>
    <state></state>
  </contact>
</contacts>

You may also want to join this to another XML file for companies:

<companies>
  <company id="1">
    <name></name>
    <address></address>
    <city></city>
    <state></state>
  </company>
</companies>

Here is some sample C#.NET LINQ-to-XML syntax to perform a LEFT OUTER JOIN between these two files:

using System.Xml.Linq;

XDocument xDocContacts = XDocument.Load("contacts.xml");
XDocument xDocCompanies = XDocument.Load("companies.xml");

// Root is already the <contacts>/<companies> element, so query its children directly.
var results = from ct in xDocContacts.Root.Elements("contact")
              join cp in xDocCompanies.Root.Elements("company")
              on (string)ct.Attribute("companyid") equals (string)cp.Attribute("id")
              into joined
              from cp in joined.DefaultIfEmpty()
              select new { Contact = ct, Company = cp };

foreach (var item in results)
{
    // item.Company is null for contacts with no matching company
}

I've used this with XML files of 90MB joined with smaller XML files of 4-5MB, and can perform complex searches with multiple WHERE conditions in the 2-3 second range.


It definitely sounds like a database would be the correct solution. The two requirements I see here are that you need to run certain types of queries against the dataset, and that you need it to be in XML at a certain point. A SQL database will handle complex queries a lot better than XML files, while at the same time you can always convert the data to XML when you need it.


In my experience, using XML as a master data source is not a good idea; it will become a pain at some point. Try SQLite instead: it is a powerful and portable relational database.
