Find MathML fragments in a String using Java
I have a big string that contains multiple MathML fragments, and I want to extract all of them into a String array. I am using a regex to find them, but something is missing in the regex, so it doesn't give any output.
What is the regex for matching MathML?
Example string
Find sum of «math xmlns=\"http://www.w3.org/1998/Math/MathML\"»«mroot»«mrow»«mi»#«/mi»«mi»a«/mi»«/mrow»«mn»3«/mn»«/mroot»«mo»=«/mo»«mroot»«mrow»«mi»#«/mi»«mi»b«/mi»«/mrow»«mn»3«/mn»«/mroot»«/math» and «math xmlns=\"http://www.w3.org/1998/Math/MathML\"»«mo»=«/mo»«msup»«mfenced»«mrow»«mi»#«/mi»«mi»b«/mi»«/mrow»«/mfenced»«mfrac»«mn»1«/mn»«mn»3«/mn»«/mfrac»«/msup»«/math»
From this I want to get the 2 MathML fragments.
You can't do that with Java's regex engine since this is valid input:
<math>
<apply>
<plus/>
<apply>
<times/>
<ci>a</ci>
<apply>
<power/>
<ci>x</ci>
<cn>2</cn>
</apply>
</apply>
<apply>
<times/>
<ci>b</ci>
<ci>x</ci>
</apply>
<ci>c</ci>
</apply>
</math>
That is, tags can be nested arbitrarily deep, and Java's regex engine has no ability to match recursive patterns. You will have to resort to a parser to handle MathML input.
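As a rough illustration of the parser route, here is a minimal sketch using the JDK's built-in DOM parser. It assumes the surrounding text becomes well-formed XML once it is wrapped in a dummy root element (so no stray `&` or unbalanced tags), which may not hold for your real input. The class name `MathMlParserSketch`, the helper `extractMath`, and the `<root>` wrapper are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MathMlParserSketch {

    // Hypothetical helper: serializes every <math> element found in the text.
    static List<String> extractMath(String text) throws Exception {
        // Assumption: wrapping the text in a dummy root yields well-formed XML.
        String wrapped = "<root>" + text + "</root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));

        // Serializer that writes a node back to text without an XML declaration.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // The default factory is not namespace-aware, so "math" is matched by tag name.
        NodeList nodes = doc.getElementsByTagName("math");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
            result.add(out.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String text = "Find <math><mi>a</mi></math> and <math><mi>b</mi></math>";
        for (String fragment : extractMath(text)) {
            System.out.println(fragment);
        }
    }
}
```

The serialized fragments may differ slightly from the input text (attribute quoting, empty-element form), since they are rebuilt from the parsed tree rather than copied.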
EDIT
Can I treat the entire thing as a string and search for a pattern that matches? That is what I am trying to do. There are not going to be any recursive tags inside another tag; they will all be at the same level.
In that case, try this pattern:
<math[>\s](?s).*?</math>
or as a String literal:
"<math[>\\s](?s).*?</math>"
which means:
<math[>\s] # match `<math` followed by a space or `>`
(?s).*? # reluctantly match zero or more chars (`(?s)` causes `\r`
# and `\n` also to be matched)
</math> # match `</math>`
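To collect the matches into a String array, something along these lines should work. It is only a sketch: the class name `MathMlRegexSketch` and the helper `findMath` are invented for the example, and the sample input uses plain `<` and `>` delimiters. If your real string uses `«` and `»` as in the question's example, the delimiters in the pattern would have to be changed to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MathMlRegexSketch {

    // Same pattern as above; (?s) lets '.' also match line terminators.
    private static final Pattern MATH =
            Pattern.compile("<math[>\\s](?s).*?</math>");

    // Hypothetical helper: returns every non-nested <math>...</math> block.
    static String[] findMath(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MATH.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "Find sum of "
                + "<math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mn>1</mn></math>"
                + " and <math>\n<mn>2</mn>\n</math>";
        for (String fragment : findMath(text)) {
            System.out.println(fragment);
        }
    }
}
```

Because `.*?` is reluctant, each match stops at the first `</math>` it meets, which is why this only works when `<math>` elements are never nested inside each other.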