Bi-directional Text Parsing Recommendations
I'm looking at the feasibility of implementing a bi-directional text parsing framework to allow formatted text to be processed using a combination of common paradigms such as Markdown, BBCode, DokuWiki, and so on. Practically speaking, this means that each implementation must be able to translate to and from a common format. That could be HTML, but more realistically an intermediate (more easily parsable) format like XML or YAML.
This will probably utilize a tokenizer to break the document into its relevant components. Does this sound like the best approach, and can you foresee any significant roadblocks?
Lastly, is anyone aware of any existing implementations (or attempts)?
Note that this is focused on PHP, but other solutions are welcome.
Have a look at the source of an HTML parser such as Nokogiri, Hpricot, BeautifulSoup etc. They will give you some food for thought on constructing a structured text parser.
There's probably no need to translate to an intermediate format, since your tokenised object tree is going to be all you need to build all the output formats.
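As a rough illustration of that idea, here is a minimal sketch in Python (the question is PHP-focused but open to other solutions, and the structure ports directly). It assumes a tiny subset of each syntax, only bold and italic, and every name in it (Node, parse_bbcode, parse_markdown, render, WRAP) is made up for the example rather than taken from any existing library.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "root", "bold", "italic" or "text"
    text: str = ""                 # only used when kind == "text"
    children: list = field(default_factory=list)


BB_TAGS = {"b": "bold", "i": "italic"}
MD_MARKS = {"**": "bold", "*": "italic"}

# Opening and closing markers per output format (hypothetical table).
WRAP = {
    "bbcode": {"bold": ("[b]", "[/b]"), "italic": ("[i]", "[/i]")},
    "markdown": {"bold": ("**", "**"), "italic": ("*", "*")},
    "html": {"bold": ("<strong>", "</strong>"), "italic": ("<em>", "</em>")},
}


def _build(events):
    """Turn (op, value) events into a tree; op is "open", "close" or "text"."""
    root = Node("root")
    stack = [root]
    for op, value in events:
        if op == "text":
            stack[-1].children.append(Node("text", value))
        elif op == "open":
            node = Node(value)
            stack[-1].children.append(node)
            stack.append(node)
        elif op == "close" and len(stack) > 1 and stack[-1].kind == value:
            stack.pop()
        # A close that does not match the innermost open node is ignored.
    return root


def parse_bbcode(src):
    events = []
    for tok in re.split(r"(\[/?[bi]\])", src):
        if not tok:
            continue
        m = re.fullmatch(r"\[(/?)([bi])\]", tok)
        if m:
            op = "close" if m.group(1) else "open"
            events.append((op, BB_TAGS[m.group(2)]))
        else:
            events.append(("text", tok))
    return _build(events)


def parse_markdown(src):
    events, open_kinds = [], []
    for tok in re.split(r"(\*\*|\*)", src):
        if not tok:
            continue
        kind = MD_MARKS.get(tok)
        if kind is None:
            events.append(("text", tok))
        elif open_kinds and open_kinds[-1] == kind:
            # Same marker as the innermost open one: treat it as a close.
            open_kinds.pop()
            events.append(("close", kind))
        else:
            open_kinds.append(kind)
            events.append(("open", kind))
    return _build(events)


def render(node, fmt):
    """Serialise the shared tree to "bbcode", "markdown" or "html"."""
    if node.kind == "text":
        return html.escape(node.text) if fmt == "html" else node.text
    inner = "".join(render(child, fmt) for child in node.children)
    before, after = WRAP[fmt].get(node.kind, ("", ""))
    return before + inner + after


tree = parse_bbcode("[b]bold [i]nested[/i] text[/b] and plain")
print(render(tree, "markdown"))  # **bold *nested* text** and plain
print(render(tree, "html"))
```

Each syntax only needs a parser into the tree and a renderer out of it, so adding a format does not touch the others. Even this toy version hits one of the roadblocks asked about: in the Markdown parser, adjacent markers such as `***` are ambiguous, so input that nests cleanly in BBCode does not always survive a round trip through Markdown.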
If you have specific implementation questions, you should post them too.