
How to convert XML Data into a binary deliverable?

We have an application that requires loading A LOT of configuration data at startup. The data is stored in an XML file which is currently 40 MB but will grow to 100 MB and more. This data will change during development but not between releases.

We are looking for a way to speed up the loading process for a "fixed" set of data, and one idea leads to this question:

What would be the easiest/most efficient way to convert the XML file into something that can be delivered as a binary?

For example, we could generate a static class with a lot of 'new objectFromXML (param1, param2, ..., paramn)' lines in its initialization method, or we could use one object with a gigantic array containing the data. All this can be done without too much trouble, but I suspect that there are more elegant solutions to our problem. Any comments would be highly appreciated.


protobuf-net can be compatible with both binary (Google's efficient "protocol buffers" format) and xml at the same time on the same class definitions*.

It can even work without any changes if your xml is element based and includes attributes like [XmlElement(Order = 1)] (to work, it needs to be able to find a unique number per property, you see). Note that if you use inheritance ([XmlInclude]) you'll need to add additional attributes (again, to nominate a number - via the similar [ProtoInclude])

Otherwise, you can add additional attributes, and job done; just call Serializer.Serialize.
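For example, a minimal sketch (assuming the protobuf-net library is referenced; ConfigEntry is a hypothetical class standing in for your real configuration types):

    using System.IO;
    using System.Xml.Serialization;
    using ProtoBuf;

    [XmlType]
    public class ConfigEntry
    {
        // The Order values give protobuf-net the unique number it needs per property,
        // while XmlSerializer keeps working against the same attributes.
        [XmlElement(Order = 1)] public string Name { get; set; }
        [XmlElement(Order = 2)] public int Value { get; set; }
    }

    public static class ConfigIo
    {
        public static void SaveBinary(ConfigEntry entry, string path)
        {
            using (var stream = File.Create(path))
                Serializer.Serialize(stream, entry);   // compact "protocol buffers" output
        }

        public static ConfigEntry LoadBinary(string path)
        {
            using (var stream = File.OpenRead(path))
                return Serializer.Deserialize<ConfigEntry>(stream);
        }
    }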

Result: smaller, faster serialization.

*=and as proof, this is actually how the codegen works: compile the ".proto" DSL to binary ("protoc"), load the binary into the object model ("protobuf-net"), write as XML (XmlSerializer), run through XSLT to get C#.


The alternative might be to run the xml through an xslt into C# and compile it, but... ugly. I've done this myself when absolutely needed; it was horrible enough to break reflector! (no, really).


My first response is: WHY??? An XML file of 40 MB is already huge. Why even store more data inside it? A good way to handle this much data would be by using a database. SQL Server Express is free to install and will work much faster. If you don't want a full server, the Compact edition of SQL Server might be an option, since it basically allows XCopy deployment.

The only advantage of XML is that it's readable for both machines and humans. With a binary format you will need some additional tool to make it human-readable.

Since you're using C#, I'd just go for the SQL Server Compact edition, with an SQL script that adds plenty of logical relations and constraints on the database. An additional Entity Framework class will make the data even more accessible, and the only thing you'd need in some XML configuration file would be the connection string...
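As a rough sketch (assuming the System.Data.SqlServerCe provider and a hypothetical Settings table), lookups then become plain queries against the .sdf file:

    using System.Data.SqlServerCe;

    public static class ConfigDb
    {
        public static string ReadSetting(string name)
        {
            using (var conn = new SqlCeConnection("Data Source=Config.sdf"))
            using (var cmd = new SqlCeCommand(
                "SELECT Value FROM Settings WHERE Name = @name", conn))
            {
                cmd.Parameters.AddWithValue("@name", name);
                conn.Open();
                return (string)cmd.ExecuteScalar();   // null if the setting doesn't exist
            }
        }
    }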


But if you're stuck with this XML file, using ZLIB to compress the whole file has already been suggested.
And since you're dealing with lots of small configuration structures inside a bigger one, you could, as suggested, use ZLIB to create a ZIP file that contains all those small XML structures as separate files. The filename in the ZIP file would identify the class it's for, and by reading only the specific XML file you need from the ZIP, you will improve performance, since the XML parser only has to parse a little bit at a time. Even if you needed to read 90% of all those XML files, performance would still be good, because each is a small XML document whose index is smaller and faster to search.
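A minimal sketch of reading just one small XML document out of such an archive (assuming .NET 4.5's System.IO.Compression; the entry name is hypothetical):

    using System.IO.Compression;
    using System.Xml.Linq;

    public static class ZippedConfig
    {
        public static XDocument LoadPart(string zipPath, string entryName)
        {
            using (var zip = ZipFile.OpenRead(zipPath))
            {
                // e.g. entryName = "NetworkSettings.xml" -- only this small document gets parsed
                var entry = zip.GetEntry(entryName);
                using (var stream = entry.Open())
                    return XDocument.Load(stream);
            }
        }
    }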


The idea is to author the data in XML but transform that XML into a byte stream as a build step. You can do this by loading the XML into an in-memory object and then binary-serializing that object to a file, for example. In production, just do a binary deserialization and skip the XML altogether.
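A minimal sketch of that build step (assuming a hypothetical, [Serializable] ConfigData type that XmlSerializer can already read):

    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;
    using System.Xml.Serialization;

    public static class ConfigBaker
    {
        // Build step: XML -> in-memory object -> binary file shipped with the release.
        public static void Bake(string xmlPath, string binPath)
        {
            ConfigData data;
            using (var xml = File.OpenRead(xmlPath))
                data = (ConfigData)new XmlSerializer(typeof(ConfigData)).Deserialize(xml);

            using (var bin = File.Create(binPath))
                new BinaryFormatter().Serialize(bin, data);
        }

        // Production: binary -> object, no XML parsing at all.
        public static ConfigData Load(string binPath)
        {
            using (var bin = File.OpenRead(binPath))
                return (ConfigData)new BinaryFormatter().Deserialize(bin);
        }
    }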


If you want to speed up the loading process, compressing the XML is not going to help you. In fact, it will hurt you: instead of simply parsing the XML, your program will have to uncompress it and then parse it.

You really haven't provided very much information about what you're currently doing. Are you currently loading the XML into an XmlDocument or XDocument and then processing it? If so, the simplest way to speed up the load without changing anything else is to implement a load method that uses an XmlReader, which lets you parse and deserialize the data at the same time.
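For instance, a minimal sketch of the XmlReader approach (the Entry element layout with "name" and "value" attributes is hypothetical): the file is streamed once and records are built as elements go past, instead of materializing a whole XmlDocument first.

    using System.Collections.Generic;
    using System.Xml;

    public static class StreamingLoader
    {
        public static List<KeyValuePair<string, string>> LoadEntries(string path)
        {
            var entries = new List<KeyValuePair<string, string>>();
            using (var reader = XmlReader.Create(path))
            {
                while (reader.Read())
                {
                    // Deserialize each record as soon as its element is reached.
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "Entry")
                    {
                        entries.Add(new KeyValuePair<string, string>(
                            reader.GetAttribute("name"),
                            reader.GetAttribute("value")));
                    }
                }
            }
            return entries;
        }
    }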

Are you using XML serialization to produce the XML? If so, you can use protocol buffers, as Marc Gravell suggested, or you can implement binary serialization. This assumes that you don't need the XML for any other purpose.

Do you actually need to deserialize all of the configuration data before your program can function? Or is it possible to use some kind of lazy loading method? If you can do lazy loading, choosing some serialization format that lets you break the loading process into chunks that get performed when the program needs them can speed up the apparent performance of your program (if not the actual performance).
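A minimal sketch of that idea (ConfigSection and the loadSection delegate are placeholders for however you actually chunk and read the data): each section is only deserialized the first time something asks for it.

    using System;
    using System.Collections.Concurrent;

    public class LazyConfig
    {
        private readonly ConcurrentDictionary<string, Lazy<ConfigSection>> _sections =
            new ConcurrentDictionary<string, Lazy<ConfigSection>>();

        private readonly Func<string, ConfigSection> _loadSection;

        public LazyConfig(Func<string, ConfigSection> loadSection)
        {
            _loadSection = loadSection;   // reads one chunk of serialized data from disk
        }

        public ConfigSection Get(string name)
        {
            // The expensive load runs once per section, on first access only.
            return _sections.GetOrAdd(name,
                n => new Lazy<ConfigSection>(() => _loadSection(n))).Value;
        }
    }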

I guess the bottom line is: there are dozens of possible approaches to a problem that's defined as "I need to load a lot of data out of an XML document at startup." Define the problem more precisely, and you'll get more useful suggestions.


Ever thought of using a Resource file for this instead of your own home-rolled XML file? This is pretty much what they're made to do.
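One way to do that is to ship the data as an embedded resource; a minimal sketch (the resource name "MyApp.Config.xml" is hypothetical):

    using System.Reflection;
    using System.Xml.Linq;

    public static class EmbeddedConfig
    {
        public static XDocument Load()
        {
            // The data ships inside the assembly itself and is read back as a stream.
            var assembly = Assembly.GetExecutingAssembly();
            using (var stream = assembly.GetManifestResourceStream("MyApp.Config.xml"))
                return XDocument.Load(stream);
        }
    }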


I ended up using zlib to create a compressed copy of an XML and XSD file in binary format.
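A rough .NET equivalent of that step (GZipStream here rather than raw zlib; paths are illustrative):

    using System.IO;
    using System.IO.Compression;

    public static class ConfigCompressor
    {
        public static void Compress(string inputPath, string outputPath)
        {
            // e.g. "Config.xml" -> "Config.xml.gz", done once as part of the build.
            using (var input = File.OpenRead(inputPath))
            using (var output = File.Create(outputPath))
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                input.CopyTo(gzip);
        }
    }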


If you are looking to turn the XML into some sort of object structure, you can approach it from one of two sides. First, you could create an XSD for the XML and then use the XSD.exe tool to generate the code to serialize/deserialize it. The second option would be to set up simple POCO objects that match the format of the XML and just use the XmlSerializer to turn the XML into those objects.
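A minimal sketch of the second option (the Configuration/Setting classes are hypothetical and would need to mirror your actual XML layout):

    using System.IO;
    using System.Xml.Serialization;

    [XmlRoot("Configuration")]
    public class Configuration
    {
        [XmlElement("Setting")]
        public Setting[] Settings { get; set; }
    }

    public class Setting
    {
        [XmlAttribute("name")]  public string Name { get; set; }
        [XmlAttribute("value")] public string Value { get; set; }
    }

    public static class PocoLoader
    {
        public static Configuration Load(string path)
        {
            // XmlSerializer maps the document straight onto the plain objects above.
            var serializer = new XmlSerializer(typeof(Configuration));
            using (var stream = File.OpenRead(path))
                return (Configuration)serializer.Deserialize(stream);
        }
    }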


VTD-XML has a built-in indexing feature called VTD+XML. The basic idea is that you parse the XML into VTD records, then persist the VTD along with the XML into an index file. The next time you load the indexed XML document, you don't have to parse it again, which speeds up loading significantly; see the article below:

http://www.codeproject.com/KB/XML/VTD-XML-indexing.aspx
