
How does a SAX parser validate against a DTD?

I have an XML file and a DTD defined for it. My understanding of a SAX parser is that it processes events instead of storing the entire XML document in memory (as DOM does). Say I have an XML file with content like <name> ... // some 2 million lines here </name> ... What will the SAX parser store in memory in this case? How does it know that the end tag for name will occur? And now the real question: how does a SAX parser validate against a DTD? I am not looking for an in-depth explanation, just the general idea of how validation occurs.


Typically the DTD is converted into a set of finite state automata (FSAs). Each content model in a DTD is essentially a regular expression over child element names, and there's a standard algorithm for converting a grammar of this kind to a deterministic FSA, which is found in compiler textbooks such as Aho and Ullman. This produces one FSA for the content model of each element. The current state of parsing/validation is thus represented by a stack holding one FSA (with its current state) for each open element. When the parser encounters a start tag, it checks whether that start tag represents a valid transition in the topmost FSA, and changes the current state in that FSA by making this transition; it also pushes a new FSA onto the stack, namely the FSA for the content model of the new element. When it sees an end tag, it checks whether the current state of the topmost FSA is a final state, and pops this FSA off the stack.
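
The stack-of-automata idea can be sketched in a few lines of Python on top of the standard xml.sax module. This is only an illustration, not how any particular parser is implemented: the automata in CONTENT_MODELS are written by hand for a small made-up DTD rather than generated from one, and the names StackValidator, ValidationError and is_valid are hypothetical. Attributes, character data and the root element declaration are not checked.

```python
import xml.sax

# Hand-built automata (an assumption for this sketch) for a made-up DTD:
#   <!ELEMENT note (to, from, body*)>
#   <!ELEMENT to (#PCDATA)>   and likewise for from, body
LEAF = {"start": 0, "finals": {0}, "trans": {}}  # no child elements allowed
CONTENT_MODELS = {
    "note": {
        "start": 0,
        "finals": {2},
        # (current state, child element name) -> next state
        "trans": {(0, "to"): 1, (1, "from"): 2, (2, "body"): 2},
    },
    "to": LEAF,
    "from": LEAF,
    "body": LEAF,
}


class ValidationError(Exception):
    pass


class StackValidator(xml.sax.ContentHandler):
    def __init__(self, models):
        super().__init__()
        self.models = models
        self.stack = []  # one [element name, current FSA state] per open element

    def startElement(self, name, attrs):
        if name not in self.models:
            raise ValidationError(f"undeclared element <{name}>")
        if self.stack:
            # The start tag must be a valid transition in the topmost FSA.
            parent, state = self.stack[-1]
            nxt = self.models[parent]["trans"].get((state, name))
            if nxt is None:
                raise ValidationError(f"<{name}> not allowed here in <{parent}>")
            self.stack[-1][1] = nxt
        # Push the FSA for the new element, in its start state.
        self.stack.append([name, self.models[name]["start"]])

    def endElement(self, name):
        # The end tag is only valid if the topmost FSA is in a final state.
        name, state = self.stack.pop()
        if state not in self.models[name]["finals"]:
            raise ValidationError(f"<{name}> ended with required content missing")


def is_valid(xml_text):
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), StackValidator(CONTENT_MODELS))
        return True
    except ValidationError:
        return False
```

For example, is_valid("<note><to>a</to><from>b</from></note>") accepts the document, while is_valid("<note><from>b</from></note>") rejects it because from is not a valid transition from the start state of note. The stack in this sketch only ever holds one entry per currently open element, so its size depends on nesting depth, not on how many lines sit between a start tag and its end tag.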
