regex to detect MathML in textarea in JavaScript
I'm using CodeMirror and trying to create my own version of the mode-changing demo. I have a <textarea> on which I listen for changes, and when there is a change, I want to look at the value in the <textarea> and determine whether it is MathML.
So I just need a very crude/hackish way to detect whether the value in the <textarea> is MathML; it doesn't have to be perfect. I'm thinking that I can run a regex when the <textarea> changes and look for any of the following tags:
<mfrac>
<msup>
<msub>
<msqrt>
<mroot>
<mfenced>
<msubsup>
<munderover>
<munder>
<mtable>
<mtr>
<mtd>
<mrow>
<mi>
<mo>
I need to take the string from the <textarea> and check whether any of these tags appears in it as a substring. How would I write this regex?
/<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/
will identify the start of any such tag.
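A minimal sketch of how that regex could serve as the crude check the question asks for, assuming the text is the <textarea>'s current value. The names MATHML_START_TAG and looksLikeMathML are hypothetical, and hooking the function up to the change event (or to CodeMirror) is left out.

```javascript
// Matches the start of any of the listed MathML element names.
// The \b stops e.g. "<mi" from matching inside "<mixin".
const MATHML_START_TAG =
  /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/;

// Hypothetical helper: true if the text contains at least one such start tag.
function looksLikeMathML(text) {
  return MATHML_START_TAG.test(text);
}

console.log(looksLikeMathML("<math><mi>x</mi></math>")); // true
console.log(looksLikeMathML("<div>plain HTML</div>"));    // false
console.log(looksLikeMathML("<mixin>"));                  // false
```

The pattern deliberately has no g flag: with g, repeated test() calls on the same regex object resume from its lastIndex, which can give surprising results when the same textarea value is checked again.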
To find a whole well-formed tag, you need to account for attributes before the closing >, which is tougher. Something like
/<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b[^>]*>/
is not guaranteed to match a whole tag, but it will make sure there is a > after the start of the tag.
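A short sketch of what the longer regex does and does not guarantee. The attribute values below are made-up examples; MATHML_WHOLE_START_TAG is an illustrative name.

```javascript
const MATHML_WHOLE_START_TAG =
  /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b[^>]*>/;

// With attributes: [^>]* runs up to the first ">".
const ok = '<mi mathvariant="bold">x</mi>'.match(MATHML_WHOLE_START_TAG);
console.log(ok[0]); // <mi mathvariant="bold">

// A ">" inside an attribute value ends the match early,
// which is why this is not guaranteed to match a whole tag.
const cut = '<mo title="a>b">+</mo>'.match(MATHML_WHOLE_START_TAG);
console.log(cut[0]); // <mo title="a>

// No ">" after the name at all: no match.
console.log("<mrow".match(MATHML_WHOLE_START_TAG)); // null
```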
EDIT:
In reply to a comment asking what the pieces of the regex mean:
The regular expression has to be inside /.../ the same way a string has to be inside quotes: that is how the JavaScript parser tells a regular expression literal apart from a string, a number, or any other kind of token.
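The slashes are only the literal syntax; the standard RegExp constructor builds the same pattern from an ordinary string. As a sketch (TAGS, fromString and literal are illustrative names, not from the question), the pattern can be generated from the question's tag list. Note that inside a string literal the backslash of \b has to be doubled.

```javascript
// Tag names from the question, without the leading "m".
const TAGS = ["frac", "sup", "sub", "sqrt", "root", "fenced", "subsup",
  "underover", "under", "table", "tr", "td", "row", "i", "o"];

// In a string, "\\b" is needed to produce the regex token \b.
const fromString = new RegExp("<m(?:" + TAGS.join("|") + ")\\b");

// The equivalent regex literal, delimited by slashes.
const literal =
  /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/;

console.log(fromString.source === literal.source); // true
console.log(fromString.test("<msqrt>"));           // true
```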
The <m matches the first two characters of any MathML tag. The (?: and ) form a non-capturing group: a group that, unlike plain (...), does not save what it matched as a capture. It works like parentheses in an arithmetic expression: just as you need parentheses in (a + b) * (c + d), I use parentheses above to distinguish <m(?:frac|sup) from <mfrac|sup. The latter would match either "<mfrac" or a bare "sup" with no <m before it.
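A small demonstration of the difference, using made-up input strings:

```javascript
const grouped = /<m(?:frac|sup)/; // "<m" followed by either "frac" or "sup"
const ungrouped = /<mfrac|sup/;   // either "<mfrac" or "sup", anywhere

console.log(grouped.test("<msup>"));        // true
console.log(ungrouped.test("<msup>"));      // true, but only because "sup" appears
console.log(grouped.test("superscript"));   // false
console.log(ungrouped.test("superscript")); // true: matched "sup" with no "<m"

// Non-capturing vs. capturing: only (...) adds an entry to the match array.
console.log("<mfrac>".match(/<m(?:frac|sup)/).length); // 1
console.log("<mfrac>".match(/<m(frac|sup)/)[1]);       // frac
```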
The \b at the end is a word boundary. It says that there must not be another word character right after the name. So <msub\b matches "<msub" but not "<msubmarine".
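A quick check of how \b behaves, with made-up inputs. The last line shows why <msubsup> has to be listed separately even though it starts with <msub.

```javascript
const withBoundary = /<msub\b/;
const withoutBoundary = /<msub/;

console.log(withBoundary.test("<msub>"));                   // true: ">" is not a word character
console.log(withBoundary.test('<msub mathcolor="red">'));   // true: neither is a space
console.log(withBoundary.test("<msubmarine>"));             // false
console.log(withoutBoundary.test("<msubmarine>"));          // true
console.log(withBoundary.test("<msubsup>"));                // false
```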
The [^>]* bit matches any number of characters other than '>'. The [...] is a character set, so [a-z] matches any lower-case Roman letter. The ^ negates it, so [^a-z] matches any character that is not a lower-case Roman letter.
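A few illustrative checks of character sets, with made-up inputs. One detail that may matter for a <textarea>: a negated set such as [^>] also matches newlines (unlike . without the s flag), so a start tag split across lines is still found.

```javascript
// [a-z]: one lower-case letter; [^a-z]: one character that is NOT one.
console.log(/[a-z]/.test("Q"));  // false
console.log(/[^a-z]/.test("Q")); // true

// [^>]*: zero or more characters other than ">", so it stops at the first ">".
const m = 'mi class="v">x'.match(/^[^>]*/);
console.log(m[0]); // mi class="v"

// [^>] also matches "\n", so attributes on the next line are fine.
console.log(/<mi\b[^>]*>/.test('<mi\n  mathvariant="bold">')); // true
```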