JavaScript regular expression that ignores a substring
Background:
I found similar S.O. posts on this topic, but I couldn't make them work for my scenario. Apologies in advance if this is a dupe.
My Intent:
Take every English word in a string and convert it to an HTML hyperlink. This logic needs to ignore only the following markup: <br/>, <b>, </b>
Here's what I have so far. It converts English words to hyperlinks as I expect, but it has no logic to ignore HTML tags (that's where I need your help):
text = text.replace(/\b([A-Z\-a-z]+)\b/g, "<a href=\"?q=$1\">$1</a>");
Example Input / Output:
Sample Input:
this <b>is</b> a test
Expected Output:
<a href="?q=this">this</a> <b><a href="?q=is">is</a></b> <a href="?q=a">a</a> <a href="?q=test">test</a>
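For comparison, running the current regex on the sample input shows the problem: the tag name `b` inside `<b>` and `</b>` is matched and wrapped along with the real words. This is a standalone check, so `text` is declared locally here.

```javascript
// Naive version from the question: every run of letters/hyphens is linked,
// including the "b" inside the <b> and </b> tags.
var text = "this <b>is</b> a test";
var naive = text.replace(/\b([A-Z\-a-z]+)\b/g, "<a href=\"?q=$1\">$1</a>");
console.log(naive);
// <a href="?q=this">this</a> <<a href="?q=b">b</a>><a href="?q=is">is</a></<a href="?q=b">b</a>> <a href="?q=a">a</a> <a href="?q=test">test</a>
```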
Thank you.
Issues with regexing HTML aside, the way I'd do this is in two steps:
- First and foremost, one way or another, extract the text outside the tags
- Then apply the transform only to that text, and leave everything else untouched
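A minimal sketch of these two steps, assuming the input only contains simple tags with no `>` inside attribute values. It relies on the fact that `String.prototype.split` keeps the separators in the result when the separator regex has a capturing group, so text and tags alternate. The helper name `linkWordsOutsideTags` is hypothetical; the word pattern reuses the question's character class.

```javascript
// Hypothetical helper: wrap each word in a link, but only in the text that
// lies outside HTML tags.
function linkWordsOutsideTags(html) {
  // Step 1: split on tags. The capturing group keeps each tag in the result,
  // so even indices are text between tags and odd indices are the tags.
  var parts = html.split(/(<[^>]*>)/);

  // Step 2: transform only the text parts; tags pass through untouched.
  return parts.map(function (part, i) {
    if (i % 2 === 1) {
      return part; // a tag such as <b>, </b> or <br/>
    }
    return part.replace(/\b[A-Za-z-]+\b/g, '<a href="?q=$&">$&</a>');
  }).join('');
}

console.log(linkWordsOutsideTags("this <b>is</b> a test"));
// <a href="?q=this">this</a> <b><a href="?q=is">is</a></b> <a href="?q=a">a</a> <a href="?q=test">test</a>
```

For the sample input this reproduces the expected output from the question.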
Related questions
- Regex replace string but not inside html tag
- RegEx: Matching a especific string that is not inside in HTML tag
- regex - match not in tag
- RegEx to ignore / skip everything in html tags
- Text Extraction from HTML Java
Here's a hybrid solution that gives you the performance gain of innerHTML and the luxury of not having to mess with HTML strings when looking for the matches:
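function findMatchAndReplace(node, regex, replacement) {
  var parent,
      temp = document.createElement('div'),
      next;

  if (node.nodeType === Node.TEXT_NODE) {
    // Text node: apply the regex to its text, parse the result as HTML in a
    // scratch <div>, then move the new nodes in place of the old text node.
    // Note: node.data is not HTML-escaped first, so literal "<" or "&"
    // characters in the text may be reinterpreted as markup or entities.
    parent = node.parentNode;
    temp.innerHTML = node.data.replace(regex, replacement);
    while (temp.firstChild) {
      parent.insertBefore(temp.firstChild, node);
    }
    parent.removeChild(node);
  } else if (node.nodeType === Node.ELEMENT_NODE) {
    // Element node: recurse into the children. Save nextSibling before the
    // recursive call, because that call may replace the current child.
    node = node.firstChild;
    while (node) {
      next = node.nextSibling;
      findMatchAndReplace(node, regex, replacement);
      node = next;
    }
  }
}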
Input:
<div id="foo">
this <b>is</b> a test
</div>
Process:
findMatchAndReplace(
document.getElementById('foo'),
/\b\w+\b/g,
'<a href="?q=$&">$&</a>'
);
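The replacement string relies on `$&`, which `String.prototype.replace` expands to the whole matched substring, so the pattern needs no capturing group. The substitution can be checked on plain strings without a DOM. One caveat worth a quick check: `\w` is `[A-Za-z0-9_]`, so it also matches digits and underscores, whereas a letters-only class like the question's `[A-Za-z-]` stays closer to "English words".

```javascript
// $& in a replacement string stands for the entire match.
var withWholeMatch = "this is".replace(/\b\w+\b/g, '<a href="?q=$&">$&</a>');

// Equivalent form using an explicit capturing group and $1.
var withGroup = "this is".replace(/\b(\w+)\b/g, '<a href="?q=$1">$1</a>');

console.log(withWholeMatch === withGroup); // true

// \w also covers digits and "_", so snake_case tokens are linked as one word;
// the letters-only class skips them entirely.
var wordChars = "page_2 ok".replace(/\b\w+\b/g, '[$&]');
var lettersOnly = "page_2 ok".replace(/\b[A-Za-z-]+\b/g, '[$&]');
console.log(wordChars);   // [page_2] [ok]
console.log(lettersOnly); // page_2 [ok]
```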
Output (whitespace added for clarity):
<div id="foo">
<a href="?q=this">this</a>
<b><a href="?q=is">is</a></b>
<a href="?q=a">a</a>
<a href="?q=test">test</a>
</div>
Here's another JavaScript method.
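var StrWith_WELL_FORMED_TAGS = "This <b>is</b> a test, <br> Mr. O'Leary! <!-- What about comments? -->";

// Splitting on "<" and ">" alternates text and tag contents. If the string
// starts with "<", split() yields an empty first element, so even indices are
// always text outside tags and odd indices are always tag contents.
// (Assumes no "<" or ">" appears inside attribute values or plain text.)
var SplitAtTags = StrWith_WELL_FORMED_TAGS.split(/[<>]/);
var ArrayLen = SplitAtTags.length;
var OutputStr = '';

for (var J = 0; J < ArrayLen; J++) {
  var bWeAreInsideTag = (J % 2) === 1;

  if (bWeAreInsideTag) {
    // Re-wrap the tag contents in angle brackets, unchanged.
    OutputStr += '<' + SplitAtTags[J] + '>';
  } else {
    // Plain text: turn each word (letters and apostrophes) into a link.
    OutputStr += SplitAtTags[J].replace(/([a-z']+)/gi, '<a href="?q=$1">$1</a>');
  }
}

//-- console.log works in the browser dev tools and in Node.js.
console.log(OutputStr);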
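The same text-versus-tag alternation can also be folded into a single `replace()` call with a callback: the pattern matches either a complete tag or a word, tags are returned unchanged, and only words are wrapped. This is a sketch under the same assumption as the loop (no `>` inside attribute values or comments); the function name `linkWords` is hypothetical. For the sample string it should produce the same output as the loop above.

```javascript
// Hypothetical helper: one regex with two alternatives.
//   <[^>]*>   a whole tag (including comments like <!-- ... -->), kept as-is
//   [a-z']+   a word (letters and apostrophes; case-insensitive via the i flag)
// At each position the tag alternative is tried first, so tag names are
// consumed as part of the tag and never reach the word branch.
function linkWords(html) {
  return html.replace(/<[^>]*>|[a-z']+/gi, function (match) {
    if (match.charAt(0) === '<') {
      return match; // tags are never rewritten
    }
    return '<a href="?q=' + match + '">' + match + '</a>';
  });
}

console.log(linkWords("This <b>is</b> a test, <br> Mr. O'Leary! <!-- What about comments? -->"));
```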