Wikitext to XML
Is there a way to convert wikitext data into simple XML in a Java application?
Input example:
== A section ==
this is some text...
{{MyTemplate
|attr1=some value
|attr2=some other value
...
Output example:
<section title='A section'>this is some text...</section>
<ValueDescription attr1='some value' attr2='some other value' ...>
It seems like a trivial task but I couldn't find a library to do it in Java.
Mulone
XML has a tree structure; wikitext, for the most part, does not. For example, this is fully legal:
== A section {{DoubleEqual{{echo|Sign}}}}
The template syntax is hierarchical, and MediaWiki itself transforms it to XML (you can use Special:ExpandTemplates to check it out), but the rest of the syntax is much too loose for XML or for other formal descriptions such as a context-free grammar.
There is a rewrite effort going on to turn wikitext into a standard, parseable language, but don't expect it to end anytime soon.
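That said, for the restricted subset shown in the question (level-2 headings and flat templates with one parameter per line), a line-based converter is feasible. Below is a minimal sketch, not a real wikitext parser: the class name WikitextToXml, the <page> root and the <template>/<param> element names are all made up for illustration, and it assumes templates are not nested and are closed by a line containing only }}. Parameters are emitted as child elements rather than attributes because wikitext parameter names are not guaranteed to be valid XML names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: handles only "== Heading ==" lines and flat,
// multi-line templates. Nested templates, inline templates, links and
// all other wikitext constructs are NOT supported.
class WikitextToXml {
    // "== Title ==" (level-2 headings only)
    private static final Pattern HEADING =
            Pattern.compile("==\\s*([^=].*?)\\s*==");
    // "{{TemplateName" on a line of its own
    private static final Pattern TEMPLATE_START =
            Pattern.compile("\\{\\{\\s*([^|{}]+?)\\s*");
    // "|name=value"
    private static final Pattern PARAM =
            Pattern.compile("\\|\\s*([^=]+?)\\s*=\\s*(.*?)\\s*");

    // Escape characters that are special in XML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&apos;");
    }

    static String convert(String wikitext) {
        StringBuilder out = new StringBuilder("<page>\n");
        boolean inSection = false;
        boolean inTemplate = false;

        for (String raw : wikitext.split("\\R")) {
            String line = raw.trim();

            if (inTemplate) {
                Matcher p = PARAM.matcher(line);
                if (line.equals("}}")) {
                    out.append("</template>\n");
                    inTemplate = false;
                } else if (p.matches()) {
                    out.append("<param name='").append(escape(p.group(1)))
                       .append("'>").append(escape(p.group(2)))
                       .append("</param>\n");
                }
                // Any other line inside a template is skipped in this sketch.
                continue;
            }

            Matcher h = HEADING.matcher(line);
            Matcher t = TEMPLATE_START.matcher(line);
            if (h.matches()) {
                if (inSection) {
                    out.append("</section>\n"); // close the previous section
                }
                out.append("<section title='").append(escape(h.group(1)))
                   .append("'>\n");
                inSection = true;
            } else if (t.matches()) {
                out.append("<template name='").append(escape(t.group(1)))
                   .append("'>\n");
                inTemplate = true;
            } else if (!line.isEmpty()) {
                out.append(escape(line)).append('\n'); // plain text
            }
        }

        // Close anything still open so the output stays well-formed.
        if (inTemplate) {
            out.append("</template>\n");
        }
        if (inSection) {
            out.append("</section>\n");
        }
        return out.append("</page>\n").toString();
    }

    public static void main(String[] args) {
        String input = "== A section ==\nthis is some text...\n"
                + "{{MyTemplate\n|attr1=some value\n|attr2=some other value\n}}";
        System.out.print(convert(input));
    }
}
```

Anything outside this subset, such as the heading example above where a template produces part of the heading markup, will come out as plain text or be dropped, which is exactly the looseness described in this answer.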
The Sweble project has a properly done parser: http://sweble.org/wiki/Wikitext-parser/. However, I think there is no XML output for the AST yet.
@Tgr: Syntactically it is not really compatible with a tree, but semantically it is.
And yes, handling Wikitext is a huge mess.