Regular Expression to Replace & with &amp;
I have a string that contains & characters, like the one below.
"This R&M & Exapmle . It is very big &amp; Complicated &146; example."
I want to replace every bare & with &amp;, but when I use $str =~ s/&/&amp;/ig;
it gives the following output.
"This R&amp;M &amp; Company . It is very big &amp;amp; CMM Level3 &amp;146; Organization."
I am expecting this instead.
"This R&amp;M &amp; Company . It is very big &amp; CMM Level3 &146; Organization."
Please help me; I don't have any idea how to fix it.
You can use a negative look-ahead assertion:
$str =~ s/&(?!\w+;)/&amp;/g;
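For readers working in JavaScript, the same negative look-ahead could be sketched roughly as follows. The helper name escapeBareAmpersands is made up for illustration, and the replacement text &amp; is assumed from the question.

```javascript
// Hypothetical helper: escape only ampersands that do not already start
// something shaped like an entity (word characters followed by ";").
function escapeBareAmpersands(str) {
    return str.replace(/&(?!\w+;)/g, '&amp;');
}

var input = 'This R&M & Company . It is very big &amp; CMM Level3 &146; Organization.';
console.log(escapeBareAmpersands(input));
// prints: This R&amp;M &amp; Company . It is very big &amp; CMM Level3 &146; Organization.
```

Note that \w does not match #, so a numeric reference such as &#146; would still be escaped by this pattern; using #?\w+ inside the look-ahead is one way to cover that case.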
use HTML::Entities;
encode_entities decode_entities "This R&M & Exapmle . It is very big &amp; Complicated &146; example."
# returns: "This R&amp;M &amp; Exapmle . It is very big &amp; Complicated &amp;146; example."
&146; is written incorrectly; it was presumably meant to be an entity for ’ (such as &rsquo; or &#8217;). If you have more mistakes of this kind, filter or substitute them before the round-trip encoding.
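The round-trip idea (decode whatever is already encoded, then encode everything) can also be sketched in JavaScript. This is not HTML::Entities; it is a toy with a hypothetical four-entry table and made-up function names, meant only to show why the round trip avoids double encoding and why a malformed reference such as &146; survives as text.

```javascript
// Toy tables (assumption: only these four entities matter).
var NAME_TO_CHAR = { amp: '&', lt: '<', gt: '>', quot: '"' };
var CHAR_TO_ENTITY = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

// Decode references we know; leave anything else (e.g. "&146;") untouched.
function decodeKnown(s) {
    return s.replace(/&(\w+);/g, function (match, name) {
        return NAME_TO_CHAR.hasOwnProperty(name) ? NAME_TO_CHAR[name] : match;
    });
}

// Encode every special character, now that none of them is part of an entity.
function encodeAll(s) {
    return s.replace(/[&<>"]/g, function (ch) {
        return CHAR_TO_ENTITY[ch];
    });
}

function roundTrip(s) {
    return encodeAll(decodeKnown(s));
}

console.log(roundTrip('R&M &amp; Co &146;'));
// prints: R&amp;M &amp; Co &amp;146;
```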
I found a good answer to this a while ago and adapted the posted code into my own version, but I can't seem to find that post anywhere.
Either way, here is the solution that I built from it.
Right now the encoder only supports &nbsp;, &amp;, &quot;, &lt;, and &gt;, but it is really easy to add support for more HTML entities.
First of all, here is the Encoder:
var Encoder = {
    // encode(): replaces entity references such as &amp; with the characters they stand for.
    encode: (function() {
        var translate_re = /&(nbsp|amp|quot|lt|gt);/g,
            translate = {
                'nbsp': String.fromCharCode(160),
                'amp' : '&',
                'quot': '"',
                'lt'  : '<',
                'gt'  : '>'
            },
            translator = function($0, $1) {
                return translate[$1];
            };
        return function(s) {
            if (typeof s === 'string')
                return s.replace(translate_re, translator);
            else
                return s;
        };
    })(),
    // decode(): replaces the characters with their entity references.
    decode: (function() {
        var reg_str = '(<|>|"|&|' + String.fromCharCode(160) + ')';
        var translate_re = new RegExp(reg_str, 'g');
        var translate = {
            '&' : '&amp;',
            '"' : '&quot;',
            '<' : '&lt;',
            '>' : '&gt;'
        };
        translate[String.fromCharCode(160)] = '&nbsp;';
        var translator = function($0, $1) {
            return translate[$1];
        };
        return function(s) {
            if (typeof s === 'string')
                return s.replace(translate_re, translator);
            else
                return s;
        };
    })()
};
// Here is our string with HTML entities in it
var str = 'Non-Breaking Space: "&nbsp;", Ampersand: "&amp;", Quote: "&quot;", Less-Than: "&lt;", Greater-Than: "&gt;"';
// Get our divs
var output_not_endcoded = document.getElementById("output_not_endcoded");
var output_endcoded = document.getElementById("output_endcoded");
// If this div exists, add the string with the HTML entities as is
if (output_not_endcoded)
    output_not_endcoded.innerHTML = str;
// If the other div exists, run the string through Encoder.decode and set the result as its contents
if (output_endcoded)
    output_endcoded.innerHTML = Encoder.decode(str);
* {
font: 13.2px "Courier New", Arial, sans-serif;
}
body {
font-size: 100%;
}
.row {
width:100%;
height:auto;
padding: 8px 6px;
}
With HTML Entities:
<div id="output_not_endcoded" class="row" ></div>
<br/>
With HTML Entities Decoded:
<div id="output_endcoded" class="row" ></div>
It is really easy to add support for other HTML entities.
Looking at the encoder, you will see the translation section: one part contains the regex and the other contains the translation table.
Regex:
var translate_re = /&(nbsp|amp|quot|lt|gt);/g
Translations:
translate = {
'nbsp': String.fromCharCode(160),
'amp' : '&',
'quot': '"',
'lt' : '<',
'gt' : '>'
}
Let's say that you wanted to add support for the copyright symbol "©". The entity for this symbol is &copy;.
To add the support for this symbol, simply add it to the regex and translation:
Regex:
var translate_re = /&(nbsp|amp|quot|lt|gt|copy);/g
Translations:
translate = {
'nbsp': String.fromCharCode(160),
'amp' : '&',
'quot': '"',
'lt' : '<',
'gt' : '>',
'copy': '©'
}
You will need to make sure that you add the support to both the encode and decode functions if you want full support back and forth for the encoding and decoding.
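One way to keep the two directions in sync is to derive both from a single table, so a new entity only has to be added once. The sketch below is an alternative arrangement, not the Encoder object above; the names ENTITIES, CHARS, entitiesToChars and charsToEntities are hypothetical.

```javascript
// Single source of truth: entity name -> character.
var ENTITIES = {
    nbsp: String.fromCharCode(160),
    amp: '&',
    quot: '"',
    lt: '<',
    gt: '>',
    copy: String.fromCharCode(169) // the copyright sign
};

// Derive the reverse table (character -> "&name;") from the same data.
var names = Object.keys(ENTITIES);
var CHARS = {};
names.forEach(function (name) {
    CHARS[ENTITIES[name]] = '&' + name + ';';
});

// Build both regexes from the tables.
// Assumption: none of the characters is special inside a regex character class.
var entityRe = new RegExp('&(' + names.join('|') + ');', 'g');
var charRe = new RegExp('[' + Object.keys(CHARS).join('') + ']', 'g');

function entitiesToChars(s) {
    return s.replace(entityRe, function (match, name) {
        return ENTITIES[name];
    });
}

function charsToEntities(s) {
    return s.replace(charRe, function (ch) {
        return CHARS[ch];
    });
}

console.log(charsToEntities('A&B <b>'));
// prints: A&amp;B &lt;b&gt;
```

With this layout, supporting another entity means adding one line to ENTITIES; both regexes and the reverse table follow automatically.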
And that's it! I hope that helped!
Update: a regex that matches only bare ampersands, using a negative look-ahead to avoid changing existing HTML entities:
&(?!(#[0-9]{2,4}|[A-Za-z]{2,6});)
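A short JavaScript sketch of how this stricter look-ahead behaves. It assumes the letter class is written as [A-Za-z], because the range A-z also covers the characters [ \ ] ^ _ ` that sit between Z and a in ASCII; the function name escapeAmp and the &amp; replacement are illustrative.

```javascript
// Skip "&" only when it starts a numeric reference (&#146;) or a short
// alphabetic name (&amp;); escape every other ampersand.
var bareAmp = /&(?!(#[0-9]{2,4}|[A-Za-z]{2,6});)/g;

function escapeAmp(s) {
    return s.replace(bareAmp, '&amp;');
}

console.log(escapeAmp('R&M &amp; &#146; &146; &__;'));
// prints: R&amp;M &amp; &#146; &amp;146; &amp;__;
```

Unlike the \w+ look-ahead, this pattern leaves &#146; alone and escapes the malformed &146;. The {2,6} length limit means longer entity names, or names containing digits, would be treated as bare ampersands.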