How to derive DTD (or other XML spec format) from XML file samples
Do you know of a tool that will derive a DTD (or other XML structure specification format) from a sample set of XML files?
Currently, the only automatic validation we have for an XML-encoded DSL is a legacy parser written in Perl, but for consistency reasons all Perl code must be ported to C#.
You can use xsd.exe (part of Visual Studio) to generate an XML schema for a given XML file.
http://www.stylusstudio.com/dtd_generator.html is actual software implementing a DTD generator.
http://www.pmg.csail.mit.edu/~chmoh/pubs/wecwis.pdf seems like a nice paper on the kind of thing you'd need, but I can't find (links to) actual code anywhere in the paper so far.
Here's another paper on this, again, no code to be found: http://www.softnet.tuc.gr/~minos/Papers/debull03.pdf.
Finally, I'd also suggest you look into using RELAX NG or Schematron to validate your XML instead. Those languages are much more expressive, making them easier to read and more powerful in the kinds of things you can validate. (Be sure to skip XML Schema, which is widely considered to be a mess.)
You can use the following link to generate a schema online by providing just the XML data: http://www.xmlforasp.net/codebank/system_xml_schema/buildschema/buildxmlschema.aspx
You can download JetBrains IDEA Community Edition, which is free. It has built-in tools for generating DTDs and schemas:
http://www.jetbrains.com/idea/webhelp/generating-dtd.html
Maybe not perfect but it is something.
Here is the program that worked for me: DTDGenerator. You need to compile it with Java, but it works well. I am surprised by the lack of free software for a language that has been around for so long, but this one is free under the Mozilla Public License Version 1.0.
Altova's XMLSpy has a DTD/XML Schema generator.
The generated DTD/XML Schema usually requires a little tweaking. For example, the tool may enumerate a list of attributes or elements, when you "meant" for it to allow any value. You're only giving it a sample of your problem space, and it has to go from specific to general, though. For that reason, I don't get too bent out of shape when it fails to read my mind.
I consider the generated dtd or schema a starting point. It's better than rolling it by hand from zero. Er, if you're starting with existing XML documents, that is.
Even if you're not going to use the generated dtd, it's a pretty good way to get your head around the structure of a set of unfamiliar XML documents.
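To make the "specific to general" step concrete, here is a minimal sketch in Python of how a naive DTD inferrer might work. It is not how XMLSpy or any other tool mentioned in this thread is implemented; the function name `infer_dtd` and the sample documents are made up for illustration. It assumes the samples fit in memory and use no namespaces, and it always emits the loosest content model, `(a | b)*`, ignoring child order and cardinality.

```python
import xml.etree.ElementTree as ET


def infer_dtd(samples):
    """Infer a deliberately loose DTD from an iterable of XML strings."""
    children = {}  # element name -> child names, in order of first appearance
    has_text = {}  # element name -> True if any occurrence had non-blank text
    attrs = {}     # element name -> {attribute name: occurrences carrying it}
    counts = {}    # element name -> total number of occurrences

    for xml_text in samples:
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            name = elem.tag
            counts[name] = counts.get(name, 0) + 1
            kids = children.setdefault(name, [])
            seen = attrs.setdefault(name, {})
            text = (elem.text or "").strip()
            for child in elem:
                if child.tag not in kids:
                    kids.append(child.tag)
                # Text after a child element still belongs to the parent.
                text = text or (child.tail or "").strip()
            has_text[name] = has_text.get(name, False) or bool(text)
            for attr in elem.attrib:
                seen[attr] = seen.get(attr, 0) + 1

    lines = []
    for name, kids in children.items():
        if has_text[name]:
            # Mixed content must be declared as (#PCDATA | a | b)* in a DTD.
            model = "(" + " | ".join(["#PCDATA"] + kids) + ")" + ("*" if kids else "")
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr, n in sorted(attrs[name].items()):
            # An attribute is only "required" if every sampled occurrence had it.
            default = "#REQUIRED" if n == counts[name] else "#IMPLIED"
            lines.append(f"<!ATTLIST {name} {attr} CDATA {default}>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(infer_dtd([
        '<order id="1"><item sku="a">Widget</item><note/></order>',
        '<order id="2" rush="yes"><item sku="b">Gadget</item></order>',
    ]))
```

With those two samples, `note` comes out as `EMPTY` and `id` as `#REQUIRED`, simply because no sample showed otherwise. That is exactly the kind of over-specific guess you would tweak by hand afterwards.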
XMLMax editor will create an XSD from an XML file. The free trial (no registration, small download) will do this for you. If you want to do this in code, the .NET Framework has an XmlSchemaInference class that automatically creates an XSD from an XML file.
Just used http://www.freeformatter.com/xsd-generator.html to generate an xsd from an xml file. It also has a lot of other formatting possibilities!
You may want to try Trang or Instance to Schema Tool (part of XMLBeans).
I tested both with a 1 GB XML file. Here are the results:
Trang: max memory 98,480 kB, time 0:24 (MM:SS)
Instance to Schema Tool: max memory 5,993,240 kB, time 7:36 (MM:SS)