
Regex To Match &entity; or &#0-9; And Capture &

I'm trying to do a replace on the following string prototype: "I'm singing & dancing in the rain." The following regular expression matches the instance properly, but also captures the character following the instance of &amp. "(&)[#?a-zA-Z0-9;]" captures the following string from the above prototype: "&l".

How can I limit it to only capture the &?

Edit: I should add that I don't want to match "&" by itself.


Look for (this copes with named, decimal and hexadecimal entities):

&amp;([A-Za-z]+|#x[\dA-Fa-f]+|#\d+);

replace with

&$1;
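A minimal sketch of that replacement in JavaScript, assuming the input was entity-encoded twice so that each entity starts with a literal `&amp;`. The helper name `undoDoubleEncoding` and the sample string are made up for illustration.

```javascript
// Hypothetical helper: turns "&amp;lt;" back into "&lt;" (one level only).
// Assumes the text was double-encoded; a lone "&amp;" is left untouched.
function undoDoubleEncoding(text) {
  // "&amp;" followed by a named, hexadecimal or decimal entity body and ";"
  const doubleEncoded = /&amp;([A-Za-z]+|#x[\dA-Fa-f]+|#\d+);/g;
  return text.replace(doubleEncoded, "&$1;");
}

const doubleEncodedSample =
  "singing &amp;amp; dancing &amp;#8216;in&amp;#x2019; the rain, R&amp;D";
const singleEncoded = undoDoubleEncoding(doubleEncodedSample);
console.log(singleEncoded);
// -> singing &amp; dancing &#8216;in&#x2019; the rain, R&amp;D
```

The trailing `R&amp;D` stays as it is because nothing after that `&amp;` forms a complete entity ending in `;`.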

Be warned: this has a real chance of going wrong. I recommend using an HTML parser to decode the text. You can decode it twice if it was double-encoded. HTML and regex don't play well together, even on a small scale.

Since you are in JavaScript, I expect you are in a browser. If you are, you have a nice DOM parser at hand. Create a new element, assign the string to its innerHTML property and read out the text value. Done.
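A rough sketch of the DOM approach. The function names `decodeEntities` and `decodeTwice` are hypothetical, and the choice of a `<textarea>` as the scratch element is my assumption: its content is treated as text, so character references get decoded while tags in the input are not turned into elements.

```javascript
// Sketch only: lets the browser's HTML parser decode the entities.
// `doc` defaults to the global `document`, so this needs a browser
// (or a DOM implementation passed in explicitly).
function decodeEntities(encoded, doc = document) {
  const scratch = doc.createElement("textarea");
  scratch.innerHTML = encoded; // parsed as text with character references
  return scratch.value;        // the decoded text
}

// Double-encoded input can simply be decoded twice.
function decodeTwice(encoded, doc = document) {
  return decodeEntities(decodeEntities(encoded, doc), doc);
}
```

In a browser, `decodeTwice("&amp;lt;")` should go from `&amp;lt;` to `&lt;` to `<`.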


I gather that you want to match &, but only if it is followed by an alphanumeric character or certain punctuation. That calls for lookahead. This regular expression should match what you want without capturing or consuming any additional characters.

(&)(?=[#?a-zA-Z0-9;])
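To illustrate, here is a small sketch using that lookahead pattern in JavaScript. The sample string and the `[AMP]` placeholder are made up for the demonstration.

```javascript
// The lookahead (?=...) checks the next character without making it
// part of the match.
const ampBeforeEntityChar = /(&)(?=[#?a-zA-Z0-9;])/g;

const lookaheadInput = "singing &lt; dancing & rain";

// Only the "&" in "&lt;" qualifies; the one followed by a space does not.
const lookaheadMatches = lookaheadInput.match(ampBeforeEntityChar);

// Replacing the match touches the "&" alone and keeps "lt;" in place.
const lookaheadReplaced = lookaheadInput.replace(ampBeforeEntityChar, "[AMP]");

console.log(lookaheadMatches);  // [ '&' ]
console.log(lookaheadReplaced); // singing [AMP]lt; dancing & rain
```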


Actually, your regex matches the string &l, but only the & is captured. This is because the character class after the capture group matches one additional character.
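The difference between the whole match and the capture group can be seen in a quick sketch (the input string is just an example):

```javascript
// The question's pattern: a captured "&" followed by one more character.
const questionPattern = /(&)[#?a-zA-Z0-9;]/;

const matchResult = "singing &lt; dancing".match(questionPattern);
const wholeMatch = matchResult[0]; // "&l": everything the regex consumed
const firstGroup = matchResult[1]; // "&":  only what the parentheses captured

// Because the "l" is part of the match, replacing with "$1" drops it.
const lossy = "singing &lt; dancing".replace(questionPattern, "$1");

console.log(wholeMatch, firstGroup, lossy); // &l & singing &t; dancing
```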

But your original regex is a little flawed to begin with anyway. A (not optimal) replacement might be:

&(#[0-9]+|#x[0-9a-fA-F]+|[a-zA-Z]+);

which will match the complete entity or character reference. Note that the group captures the part between & and ;, not the & itself.
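As a sketch of how such a pattern behaves, the snippet below lists every entity it finds in a sample string. The hexadecimal class is restricted to `[0-9a-fA-F]` here, and the input string and variable names are invented for illustration.

```javascript
// Matches a complete decimal, hexadecimal or named character reference.
const entityPattern = /&(#[0-9]+|#x[0-9a-fA-F]+|[a-zA-Z]+);/g;

const entityInput = "R&D &lt;b&gt; &#169; &#xA9; & done";

// m[0] is the whole entity, m[1] is the part between "&" and ";".
const foundEntities = Array.from(entityInput.matchAll(entityPattern), (m) => ({
  whole: m[0],
  body: m[1],
}));

console.log(foundEntities.map((e) => e.whole)); // [ '&lt;', '&gt;', '&#169;', '&#xA9;' ]
console.log(foundEntities.map((e) => e.body));  // [ 'lt', 'gt', '#169', '#xA9' ]
```

The `&` in `R&D` and the standalone `&` are skipped because neither is followed by a complete reference ending in `;`.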


If you only want to match &, why did you include the character class [#?a-zA-Z0-9;] as well?

In English, your expression would be "Match & followed by a character that is #, ?, a lowercase letter, an uppercase letter, a digit or ;".

Just use (&)


You probably meant:

"&([#a-zA-Z0-9]+;)"