Finding the XML-escaped dash in JavaScript
I want to use a regular expression to find dashes in an HTML page in JavaScript. The dashes in HTML pages may sometimes be XML-escaped as the string &ndash;. However, using a regular expression to find this string is not working for some reason.
var html = document.getElementsByTagName('html').item(0).innerHTML;
var escapedDash = /&ndash;/ig;
var foundEscapedDash = html.match(escapedDash);
alert(foundEscapedDash);
The regular expression /&ndash;/ig does not produce any matches. Nor does the regular expression /-/i find the escaped dash &ndash;.
Does anyone know of a regular expression that can find the escaped dash?
When you set innerHTML to a string containing an entity, the browser converts the entity to the literal character. For example:
var div = document.createElement('div');
div.innerHTML = '&ndash;';
alert(div.innerHTML.length); // 1, not 7 as may be expected
So you need to match the actual character rather than the entity text, and to do that you can use its Unicode escape sequence. For "–", it's \u2013.
div.innerHTML.match(/\u2013/ig)
By the way, assuming the dash is the first character of the string, you can find the hex number 0x2013 for yourself with div.innerHTML.charCodeAt(0).toString(16).
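To try this outside a browser, here is a minimal sketch that uses a plain string as a stand-in for what innerHTML returns once the entity has been decoded. The sample text and the variable names are made up for illustration.

```javascript
// Stand-in for a string read back from innerHTML: the entity is already
// decoded, so it contains the literal en dash (U+2013), not "&ndash;".
const decoded = 'pages 10\u201320 and 30\u201340';

// Match every literal en dash using its Unicode escape.
const found = decoded.match(/\u2013/g);

// Look up the hex code point of the first en dash in the string.
const firstDashIndex = decoded.indexOf('\u2013');
const hex = decoded.charCodeAt(firstDashIndex).toString(16);

console.log(found.length); // number of en dashes matched
console.log(hex);          // hex code point of the matched character
```

The i flag is dropped here because the en dash has no upper or lower case form.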
Try this:
var str = '&ndash;hello world &ndash;';
// Group the whole entity so that + repeats "&ndash;" and not just the ";"
var escapedDash = /(?:&ndash;)+/ig;
var foundEscapedDash = str.match(escapedDash);
alert(foundEscapedDash);
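Whether the dash appears as the decoded character or as raw entity text depends on where the string came from, so one option is to match both forms at once. This is only a sketch: findEnDashes is a hypothetical helper name, and the pattern assumes the numeric references are written as &#8211; or &#x2013; without leading zeros.

```javascript
// Hypothetical helper: returns every en dash found in `text`, whether it is
// the literal character (U+2013) or one of its common escaped forms.
function findEnDashes(text) {
  // Literal character, named entity, decimal reference, hex reference.
  const pattern = /\u2013|&ndash;|&#8211;|&#[xX]2013;/g;
  // match() returns null when nothing is found, so fall back to an empty array.
  return text.match(pattern) || [];
}

const sample = 'a \u2013 b &ndash; c &#8211; d &#x2013; e - f';
console.log(findEnDashes(sample).length);
```

The plain hyphen at the end of the sample is intentionally not matched, since it is a different character from the en dash.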