JavaScript regex: Not starting with
I want to replace all the occurrences of a string that doesn't start with "<pre>" and doesn't end in "</pre>".
So let's say I wanted to find new-line characters and replace them with "<p/>". I can get the "not followed by" part:
var revisedHtml = html.replace(/[\n](?![<][/]pre[>])/g, "<p/>");
But I don't know the "not starting with" part to put at the front.
Any help please? :)
Here's how Steve Levithan's first lookbehind-alternative can be applied to your problem:
var output = s.replace(/(<pre>[\s\S]*?<\/pre>)|\n/g, function($0, $1){
    // $1 is set only when a whole <pre> element matched; put it back unchanged
    return $1 ? $1 : '<p/>';
});
When it reaches a <pre> element, it captures the whole thing and plugs it right back into the output. It never really sees the newlines inside the element, just gobbles them up along with all other content. Thus, when the \n in the regex does match a newline, you know it's not inside a <pre> element, and should be replaced with a <p/>.
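To see the capture-and-restore behaviour on a concrete string, here is a minimal sketch. The function name newlinesToParagraphs and the sample input are made up for illustration; the regex is the one from the answer.

```javascript
// Hypothetical helper name; the regex is the one shown in the answer.
function newlinesToParagraphs(html) {
  return html.replace(/(<pre>[\s\S]*?<\/pre>)|\n/g, function (match, pre) {
    // `pre` is defined only when the first alternative matched a whole <pre> element.
    return pre ? pre : '<p/>';
  });
}

var sample = 'one\ntwo<pre>a\nb</pre>three\n';
console.log(newlinesToParagraphs(sample));
// one<p/>two<pre>a
// b</pre>three<p/>
```

The newline between "a" and "b" survives because it was consumed as part of the <pre> match, while the two newlines outside the element each become a <p/>.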
But don't make the mistake of regarding this technique as a hack or a workaround; I would recommend this approach even if lookbehinds were available. With the lookaround approach, the regex has to examine every single newline and apply the lookarounds each time to see if it should be replaced. That's a lot of unnecessary work it has to do, plus the regex is a lot more complicated and less maintainable.
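For comparison, this is roughly what a lookahead-only version could look like. It is a sketch and not part of the original answer: the lookahead pattern is just one possible formulation, it assumes <pre> elements are not nested, and both function names are hypothetical.

```javascript
// Both function names are hypothetical, used only for this comparison.

// Capture-and-restore, as in the answer above.
function withCallback(html) {
  return html.replace(/(<pre>[\s\S]*?<\/pre>)|\n/g, function (match, pre) {
    return pre ? pre : '<p/>';
  });
}

// Lookahead only: replace a newline unless the next <pre> or </pre> tag
// found after it is a closing </pre> (meaning the newline is inside one).
function withLookahead(html) {
  return html.replace(/\n(?![^<]*(?:<(?!\/?pre>)[^<]*)*<\/pre>)/g, '<p/>');
}

var sample = 'one\ntwo<pre>a\nb</pre>three\n';
console.log(withCallback(sample) === withLookahead(sample)); // true
```

Both give the same result on this input, but the second regex rescans the text ahead of every newline and is noticeably harder to read.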
As always when using regexes on HTML, I'm ignoring a lot of factors that can affect the result, like SGML comments, CDATA sections, angle brackets in attribute values, etc. You'll have to determine which among those factors you have to deal with in your case, and which ones you can ignore. When it comes to processing HTML with regexes, there's no such thing as a general solution.
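As one example of such a factor, the regex above only recognises a bare, lowercase <pre> tag. If your input may contain attributes or uppercase tags, a loosened pattern along these lines might help. This is a sketch under that assumption; the function name is hypothetical, and the pattern still breaks if an attribute value contains a ">" character.

```javascript
// Hypothetical variant: tolerates attributes on <pre> and ignores tag case.
function newlinesToParagraphsLoose(html) {
  // <pre\b[^>]*>  opening tag with optional attributes (no ">" inside values)
  // <\/pre\s*>    closing tag, allowing whitespace before ">"
  var re = /(<pre\b[^>]*>[\s\S]*?<\/pre\s*>)|\n/gi;
  return html.replace(re, function (match, pre) {
    return pre ? pre : '<p/>';
  });
}

console.log(newlinesToParagraphsLoose('<PRE class="x">a\nb</PRE>\nc'));
// <PRE class="x">a
// b</PRE><p/>c
```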
Why not do the reverse? Look for all the substrings enclosed in <pre> tags. Then you know which parts of your string are not enclosed in <pre>.
EDIT: More elegant solution: use split() and use the <pre> HTML as the delimiters. This gives you the HTML outside the <pre> blocks.
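var s = "blah blah<pre>formatted</pre>blah blah<pre>another formatted</pre>end";
// [\s\S] instead of . so that <pre> blocks spanning several lines are matched too
var rgx = /<pre>[\s\S]*?<\/pre>/g;
var nonPreStrings = s.split(rgx);
// Iterate by index; for...in over an array can also visit inherited properties
for (var idx = 0; idx < nonPreStrings.length; idx++) {
    alert(nonPreStrings[idx]);
}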
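The split() above discards the <pre> blocks, so on its own it cannot rebuild the document. One way to finish the job, sketched here with a hypothetical function name, is to wrap the delimiter in a capturing group: split() then keeps the matched <pre> blocks in the result at the odd indices, and only the even-indexed pieces need the newline replacement.

```javascript
// Hypothetical helper: replaces newlines only in the text outside <pre> blocks.
function replaceOutsidePre(html) {
  // With a capturing group, split() returns [outside, pre, outside, pre, ...].
  var parts = html.split(/(<pre>[\s\S]*?<\/pre>)/);
  for (var i = 0; i < parts.length; i += 2) {
    // Even indices hold the text between <pre> blocks.
    parts[i] = parts[i].replace(/\n/g, '<p/>');
  }
  return parts.join('');
}

console.log(replaceOutsidePre('x\n<pre>a\nb</pre>\ny'));
// x<p/><pre>a
// b</pre><p/>y
```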