How do you build an XML parser?
Can anyone direct me to a good tutorial on building an XML parser? I realize most languages already have libraries to do this task, but I'm interested in learning about the grammar of XML and the theory behind how parsers work. I've tried searching for something that explains this but have been unable to find anything.
Just to make it clear, you should NEVER EVER try to write an XML parser to be used in production. This is
- way too complex for most people and really, really hard to get right, and
- a solved problem in just about any language.
To get an overview of XML, I suggest you read "XML In A Nutshell" from O'Reilly and just try to do stuff with XML and XML transformations. For general parser building, Parsing Techniques looks really promising. But actually parsing XML is rather hard, so you should probably start by building up knowledge through using it. Also, documentation is much less sparse in that area...
I think there isn't enough demand for people to write such tutorials; and as I commented, I don't think general parser techniques are of much help. XML parsing is not something the usual lex+yacc approach works well for (the lexer part more so than the parser, for what that's worth).
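To illustrate the lexer point: one likely reason is that XML tokenization depends on context. The same characters mean different things inside a tag and in character data, which a single flat set of token rules handles poorly. Below is a minimal Python sketch of a mode-switching tokenizer. The `tokenize` function, the token names, and the simplified `NAME` pattern are all made up for illustration, and it ignores comments, CDATA sections, entities, processing instructions and everything else a real parser must handle.

```python
import re

# Simplified assumption: ASCII-only names, NOT the spec's full Name production.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")
PUNCT = {"/": "SLASH", "=": "EQ"}


def tokenize(src):
    """Split src into (kind, text) tokens, switching rules on '<' and '>'."""
    tokens = []
    i = 0
    in_tag = False  # the lexer mode: inside markup or inside character data
    while i < len(src):
        ch = src[i]
        if not in_tag:
            if ch == "<":
                tokens.append(("TAG_OPEN", ch))
                in_tag = True
                i += 1
            else:
                # Character data: everything up to the next '<' is one token.
                end = src.find("<", i)
                if end == -1:
                    end = len(src)
                tokens.append(("TEXT", src[i:end]))
                i = end
        elif ch.isspace():
            i += 1  # whitespace between tokens inside a tag is skipped
        elif ch == ">":
            tokens.append(("TAG_CLOSE", ch))
            in_tag = False
            i += 1
        elif ch in PUNCT:
            tokens.append((PUNCT[ch], ch))
            i += 1
        elif ch in "\"'":
            # Quoted attribute value: runs to the matching quote character.
            end = src.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated attribute value")
            tokens.append(("STRING", src[i + 1:end]))
            i = end + 1
        else:
            match = NAME.match(src, i)
            if match is None:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
            tokens.append(("NAME", match.group()))
            i = match.end()
    return tokens
```

For example, `tokenize('<a x="1">x="1"</a>')` yields the `x="1"` inside the tag as `NAME`, `EQ`, `STRING`, but the identical characters in the element content as a single `TEXT` token.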
I know most production-ready XML parsers are beasts, but you might be best off starting by reading one. Java has a few examples, and xmlpull might be amongst the simplest proper parsers. Woodstox and Xerces are the most compliant ("full") parsers, with large codebases, so definitely not light reading. But they handle everything an XML parser should, so they might be educational too. But beware of half-baked fake parsers that skip checks the XML specification mandates (Javolution, for example, checks very few things: none of the character validity checks, nor duplicate attribute names).
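As a rough idea of what those two checks involve, here is a small Python sketch. The character ranges are the `Char` production from the XML 1.0 specification, and rejecting repeated attribute names on one tag is one of its well-formedness constraints. The function names are hypothetical and not taken from any of the parsers mentioned here.

```python
def is_xml_char(ch):
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD      # excludes surrogates, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_invalid_char(text):
    """Return the index of the first disallowed character, or -1 if none."""
    for index, ch in enumerate(text):
        if not is_xml_char(ch):
            return index
    return -1


def find_duplicate_attribute(names):
    """Return the first attribute name that repeats within one tag, else None."""
    seen = set()
    for name in names:
        if name in seen:
            return name
        seen.add(name)
    return None
```

A parser that skips these will happily accept input such as a NUL byte in text content or `<a id="1" id="2"/>`, both of which a conforming parser has to reject.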
Another thing to read is obviously the XML specification. It is one of the most well-written specifications IMO: accurate and complete, even if not exactly light reading. And considering all it covers, it's actually not all that long.
If you're a student of computer science and fancy writing an XML parser as an academic exercise, then fine: it's a good way to spend a wet weekend, and you don't need to ask the question because you have access to a library of textbooks on how to write parsers, and if you have specific XML-related problems then you can always look into the code of various open-source parsers to see how experts have tackled the problem.
If you're not a student of computer science then I would suggest you become one - the theory of how to write parsers for different classes of grammar is part of the foundation of the subject.