regex to detect MathML in textarea in JavaScript

2023-03-19 07:38

I'm using CodeMirror and trying to create my own version of the mode-changing demo. I have a <textarea> whose changes I listen for, and when it changes, I want to look at its value and determine whether it is MathML.

So I just need a very crude, hackish way to detect whether the value in the <textarea> is MathML; it doesn't have to be perfect. I'm thinking I can run a regex whenever the <textarea> changes and look for any of the following tags:

<mfrac>
<msup>
<msub>
<msqrt>
<mroot>
<mfenced>
<msubsup>
<munderover>
<munder>
<mtable>
<mtr>
<mtd>
<mrow>
<mi>
<mo>

I need to take the string from the <textarea> and check whether any of these tags appears in it as a substring. How would I write this regex?

/<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/

will identify the start of any such tag.
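A minimal sketch of how that regex could be used for the crude check the question describes. The tag list is the one from the question; the element id "source" is a hypothetical placeholder, and the DOM wiring only runs in a browser. What to do with the result (for example, switching the CodeMirror mode) is left to your own code.

```javascript
// Matches the start of any of the MathML tags listed in the question.
// No "g" flag, so test() keeps no lastIndex state between calls.
const MATHML_TAG_START =
  /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/;

// Crude heuristic: true if at least one listed tag appears anywhere in the text.
function looksLikeMathML(text) {
  return MATHML_TAG_START.test(text);
}

// Browser-only wiring; "source" is a hypothetical id for the <textarea>.
if (typeof document !== "undefined") {
  const textarea = document.getElementById("source");
  if (textarea) {
    // "input" fires on every edit; "change" only fires when the textarea loses focus.
    textarea.addEventListener("input", () => {
      console.log(looksLikeMathML(textarea.value) ? "MathML detected" : "not MathML");
    });
  }
}

console.log(looksLikeMathML("<math><mfrac><mi>a</mi><mn>2</mn></mfrac></math>")); // true
console.log(looksLikeMathML("<p>plain HTML</p>")); // false
console.log(looksLikeMathML("<msubmarine>")); // false
```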

To match a whole well-formed tag, you also need to allow for attributes before the closing >, which is tougher. Something like

/<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b[^>]*>/

is not guaranteed to match a whole tag, but will make sure there is a > after the start of the tag.
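A small comparison of the two patterns, to show what each one actually matches. The sample strings are made up for illustration; the last line demonstrates the "not guaranteed" caveat, where a > inside an attribute value ends the match early.

```javascript
const TAG_START =
  /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/;
const WHOLE_TAG =
  /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b[^>]*>/;

const sample = '<mi mathvariant="bold">x</mi>';
console.log(sample.match(TAG_START)[0]); // prints: <mi
console.log(sample.match(WHOLE_TAG)[0]); // prints: <mi mathvariant="bold">

// No closing ">" yet (e.g. while the user is still typing):
console.log(TAG_START.test("<mfrac")); // true
console.log(WHOLE_TAG.test("<mfrac")); // false

// Caveat: a ">" inside an attribute value stops [^>]* early.
console.log('<mi title="a>b">x</mi>'.match(WHOLE_TAG)[0]); // prints: <mi title="a>
```

For a change-listener heuristic, the start-of-tag pattern is usually enough, since it already fires before the user has typed the closing >.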

EDIT:

... what is /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b[^>]*>/?

The regular expression has to be inside /.../ the same way a string has to be inside quotes, because that is how the JavaScript interpreter tells a regular expression literal apart from a string, a number, or any other kind of token.
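To make the literal-versus-string distinction concrete, here is a sketch comparing the /.../ literal with the same pattern built from a string via the RegExp constructor. Building it from an array of the question's tag names is just one illustrative option (the names TAG_NAMES, literal, built and broken are made up here); the key detail is that inside a string the backslash must be doubled.

```javascript
// The tag names from the question, without the leading "m".
const TAG_NAMES = ["frac", "sup", "sub", "sqrt", "root", "fenced", "subsup",
  "underover", "under", "table", "tr", "td", "row", "i", "o"];

// Literal form: the slashes delimit the pattern, like quotes delimit a string.
const literal =
  /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/;

// Constructor form: the pattern is an ordinary string, so \b is written "\\b".
const built = new RegExp("<m(?:" + TAG_NAMES.join("|") + ")\\b");

console.log(literal.source === built.source); // true
console.log(built.test("<mrow>"));            // true

// Pitfall: in a string literal "\b" is a backspace character (U+0008),
// not a word boundary, so this pattern does not match normal markup.
const broken = new RegExp("<mi\b");
console.log(broken.test("<mi>")); // false
```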

The <m matches the first two characters of any MathML tag. The (?: and ) form a non-capturing group. It works like parentheses in an arithmetic expression: just as you need parentheses in (a + b) * (c + d), I use them above to distinguish <m(?:frac|sup) from <mfrac|sup. The latter would match either "<mfrac" or a bare "sup" with no <m before it.
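A quick check of the grouping point above, using the two-alternative patterns from the explanation. The last two lines also show what "non-capturing" means: (?:...) groups without adding an entry to the match array, while plain (...) both groups and captures.

```javascript
const grouped = /<m(?:frac|sup)/; // "<m" followed by "frac" or "sup"
const ungrouped = /<mfrac|sup/;   // "<mfrac", OR a bare "sup" anywhere

console.log(grouped.test("superscript"));   // false
console.log(ungrouped.test("superscript")); // true: "sup" needs no "<m"
console.log(ungrouped.exec("<msup>")[0]);   // prints: sup (the "<m" is not part of the match)

// (?:...) does not capture; (...) does.
console.log("<mfrac>".match(/<m(?:frac|sup)/).length); // 1 (whole match only)
console.log("<mfrac>".match(/<m(frac|sup)/)[1]);       // prints: frac
```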

The \b at the end is a word boundary. It asserts that no other word character follows the name. So <msub\b matches "<msub" but not "<msubmarine".
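A few test strings showing what \b does, plus one consequence for the full pattern: because "sub" is listed before "subsup", the engine first tries "sub", the \b fails, and it backtracks to the longer alternative. The <mover> example is a real MathML element that is simply not in the question's list.

```javascript
const withBoundary = /<msub\b/;
const withoutBoundary = /<msub/;

console.log(withBoundary.test("<msub>"));          // true: ">" is not a word character
console.log(withBoundary.test('<msub id="s1">'));  // true: neither is a space
console.log(withBoundary.test("<msubmarine>"));    // false
console.log(withoutBoundary.test("<msubmarine>")); // true

const LISTED_TAG_START =
  /<m(?:frac|sup|sub|sqrt|root|fenced|subsup|underover|under|table|tr|td|row|i|o)\b/;

// After "<msub", \b fails (next char "s" is a word character),
// so the engine backtracks and tries the later "subsup" alternative.
console.log("<msubsup>".match(LISTED_TAG_START)[0]); // prints: <msubsup
// "<mo" is followed by "v", so the "o" alternative fails at \b.
console.log(LISTED_TAG_START.test("<mover>"));       // false
```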

The [^>]* bit matches any number of characters other than '>'. The [...] is a character set, so [a-z] matches any lower-case roman letter. The ^ negates it, so [^a-z] matches any character that is not a lower-case roman letter.
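A short sketch of the character-set behaviour described above. One detail worth knowing for a <textarea>: unlike ".", a negated set such as [^>] also matches newlines, so a tag whose attributes are split across lines is still matched.

```javascript
console.log(/[a-z]/.test("Q"));  // false: uppercase is outside the range
console.log(/[^a-z]/.test("Q")); // true: ^ negates the set

const MI_TAG = /<mi\b[^>]*>/;

// [^>]* may match zero characters, so a bare tag still works.
console.log("<mi>".match(MI_TAG)[0]);                 // prints: <mi>
// It consumes everything up to the first ">".
console.log('<mi class="v">x</mi>'.match(MI_TAG)[0]); // prints: <mi class="v">
// Newlines are not ">", so multi-line tags match too.
console.log(MI_TAG.test('<mi\n  mathvariant="bold">x</mi>')); // true
```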
