
regex to detect MathML in textarea in JavaScript

I'm using CodeMirror and trying to create my own version of the mode-changing demo. I have a <textarea> on which I listen for changes and when there is a change, I want to look at the value in the <textarea> and determine if it is in the form of MathML.

So I just need a very crude/hackish way to detect whether the value in the <textarea> is MathML; it doesn't have to be perfect. I'm thinking that I can run a regex when the <textarea> changes and look for any of the following tags:

<mfrac>
<msup>
<msub>
<msqrt>
<mroot>
<mfenced>
<msubsup>
<munderover>
<munder>
<mtable>
<mtr>
<mtd>
<mrow>
<mi>
<mo>

I need to take the string from the <textarea> and check whether any of these tags appears in it as a substring. How would I write this regex?


/<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/

will identify the start of any such tag.
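As a minimal sketch of how that pattern could be used for the crude check described in the question: the helper name `looksLikeMathML` and the constant name are made up for illustration, and the function only takes a string, so wiring it to the <textarea> or to CodeMirror is left out.

```javascript
// Pattern from the answer: the start of any of the listed MathML tags.
// No "g" flag, so test() keeps no state between calls.
const MATHML_TAG_START = /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/;

// Hypothetical helper: true if the text contains the start of a listed tag.
function looksLikeMathML(text) {
  return MATHML_TAG_START.test(text);
}

console.log(looksLikeMathML("<mrow><mi>x</mi></mrow>")); // true
console.log(looksLikeMathML("<p>plain HTML</p>"));       // false
console.log(looksLikeMathML("<menu>not MathML</menu>")); // false
```

In the change handler you would pass the current value of the <textarea> to this function and switch modes when it returns true.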

To find a whole well-formed tag, you also need to allow for attributes before the closing >, which is tougher. Something like

/<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b[^>]*>/

is not guaranteed to match a whole tag, but will make sure there is a > after the start of the tag.
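A short sketch of what "not guaranteed to match a whole tag" means in practice. The constant name and the sample strings are made up for illustration; the last case shows a > inside a quoted attribute value cutting the match short.

```javascript
// Pattern from the answer: tag start, then anything up to the next ">".
const MATHML_OPEN_TAG = /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b[^>]*>/;

// A tag with an ordinary attribute is matched in full.
const full = '<mi mathvariant="bold">x</mi>'.match(MATHML_OPEN_TAG);
console.log(full[0]); // <mi mathvariant="bold">

// No ">" anywhere after the tag start: no match at all.
console.log(MATHML_OPEN_TAG.test("<mi mathvariant=")); // false

// A ">" inside an attribute value ends the match too early.
const cut = '<mi title="a>b">x</mi>'.match(MATHML_OPEN_TAG);
console.log(cut[0]); // <mi title="a>
```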

EDIT:

... what is /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b[^>]*>/?

The regular expression has to be inside /.../ the same way a string has to be inside quotes because that is how the JavaScript interpreter tells a regular expression literal from a string or a number or any other kind of token.
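To illustrate the difference between a regex literal and a string, here is a small sketch that builds the same pattern both ways. The variable names are made up; the point is that inside a string the backslash has to be doubled, which is one reason the literal form is easier to read.

```javascript
// Regex literal: the slashes delimit the pattern, like quotes around a string.
const literal = /<msub\b/;

// The same pattern built from a string; "\\b" is needed because "\b"
// inside a string literal is a backspace character, not a word boundary.
const fromString = new RegExp("<msub\\b");

console.log(literal instanceof RegExp);            // true
console.log(literal.source === fromString.source); // true
console.log(fromString.test("<msub>"));            // true
```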

The <m matches the first two characters of any MathML tag. The (?: and ) form a non-capturing group, which works like parentheses in an arithmetic expression. In the same way you have to use parentheses in (a + b) * (c + d), I use parentheses above to distinguish <m(?:frac|sup) from <mfrac|sup. The latter would match either "<mfrac" or a bare "sup" with no <m before it.
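The effect of the grouping can be checked directly. In this sketch the two patterns are shortened to two alternatives and the test words are made up for illustration.

```javascript
// With the group, "<m" is required before either alternative.
const grouped = /<m(?:frac|sup)/;

// Without it, the alternatives are "<mfrac" and a bare "sup".
const ungrouped = /<mfrac|sup/;

console.log(grouped.test("<msup>"));   // true
console.log(grouped.test("supper"));   // false
console.log(ungrouped.test("supper")); // true, because "sup" alone matches
```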

The \b at the end is a word boundary. It says that there must not be another word character (a letter, digit, or underscore) right after the name. So <msub\b matches "<msub" in "<msub>" but does not match in "<msubmarine".
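A quick sketch of the word boundary at work, using made-up inputs. It also shows why listing sub before subsup in the alternation is harmless: when \b fails after "sub", the regex engine backtracks and tries the longer alternative.

```javascript
const subOnly = /<msub\b/;
console.log(subOnly.test("<msub>"));         // true: ">" is not a word character
console.log(subOnly.test('<msub id="a">'));  // true: a space is not a word character
console.log(subOnly.test("<msubmarine>"));   // false: "m" follows "sub"

// Pattern from the answer; "sub" is tried first, fails at \b, then "subsup" matches.
const tagStart = /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/;
console.log("<msubsup>".match(tagStart)[0]); // <msubsup
```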

The [^>]* bit matches any number of characters other than '>'. The [...] is a character set, so [a-z] matches any lower-case roman letter. The ^ negates it, so [^a-z] matches any character that is not a lower-case roman letter.
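The character sets can be tried in isolation as well. This is only a sketch with made-up sample strings.

```javascript
// [a-z] needs at least one lower-case letter somewhere in the string.
console.log(/[a-z]/.test("ABC"));  // false
console.log(/[a-z]/.test("ABc"));  // true

// [^a-z] needs at least one character that is NOT a lower-case letter.
console.log(/[^a-z]/.test("abc")); // false
console.log(/[^a-z]/.test("abC")); // true

// [^>]* consumes everything up to, but not including, the first ">".
const beforeClose = '<mi id="x">rest'.match(/[^>]*/)[0];
console.log(beforeClose); // <mi id="x"
```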
