Simple XML parser in bison/flex
I would like to create simple xml parser using bison/flex. I don't need validation, comments, arguments, only <tag>value</tag>
, where value can be number, string or other <tag>value</tag>
.
So for example:
<div>
<mul>
<num>20</num>
<add>
<num>1</num>
<num>5</num>
</add>
</mul>
<id>test</id>
</div>
If it helps, I know the names of all tags that may occur. I know how many sub-tag can be hold by given tag. Is it possible to create bison parser that would do something like that:
- new Tag("num", 1) // tag1
- new Tag("num", 5) // tag2
- new Tag("add", tag1, tag2) // tag3
- new Tag("num", 20) // tag4
- new Tag("mul", tag4, tag3)
...
- root = top_tag
Tag & number of sub-tags:
- num: 1 (only value)
- str: 1 (only value)
- add | sub | mul | div: 2 (num | str | tag, num | str | tag)
Could you help me with grammar to be able to create AST like开发者_StackOverflow中文版 given above?
For your requirements, I think the yax system would work well. From the README:
The goal of the yax project is to allow the use of YACC (Gnu Bison actually) to parse/process XML documents.
The key piece of software for achieving the above goal is to provide a library that can produce an XML lexical token stream from an XML document.
This stream can be wrapped to create an instance of yylex() to feed tokens to a Bison grammar to parse and process the XML document.
Using the stream plus a Bison grammar, it is possible to carry at least the following kinds of activities.
- Validate XML documents,
- Directly parse XML documents to create internal data structures,
- Construct DOM trees.
I do not think that it's the best tool to use to create a xml parser. If I have to do this job, I'll do it by hand.
Flex code will contains : NUM match integer in this example. STR match match any string which does not contains a '<' or '>'. STOP match all closing tags. START match starting tags.
<\?.*\?> { ;}
<[a-z]+> { return START; }
</[a-z]+> { return STOP; }
[0-9]+ { return NUM; }
[^><]+ { return STR; }
Bison code will look like
%token START, STOP, STR, NUM
%%
simple_xml : START value STOP
;
value : simple_xml
| STR
| NUM
| value simple_xml
;
精彩评论