PHP/Perl Regular expression help!
I have a string:
$string = 'This is my big <span class="big-string">string</span>';
I cannot figure out how to write a regular expression that will replace the 'b' in 'big' without replacing the 'b' in 'big-string'. I need to replace all occurrences of a substring except when that substring appears inside an HTML tag.
Any help is appreciated!
Edit
Maybe some more info will help. I'm working on an autocomplete feature that highlights whatever you're searching for in the current result set. Currently, if you have typed 'aut' in the search dialog, the results look like this: <b>aut</b>omotive
The problem appears when I search for 'auto b'. First I replace all occurrences of 'auto' with '<b>auto</b>', then I replace all occurrences of 'b' with '<b>b</b>'. Unfortunately, this second sweep changes '<b>auto</b>' to '<<b>b</b>>auto</<b>b</b>>'.
Please do not try to parse HTML using regular expressions. Just load the HTML into a DOM, walk over the text nodes, and do a simple str_replace. You'll thank me around debugging time.
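To make the DOM suggestion concrete, here is a minimal sketch for the highlighting case from the question. The function name highlightTerm and the wrapper div with id "wrap" are made up for illustration, and it uses preg_split rather than str_replace so that each match can be wrapped in a real <b> element. It assumes a small ASCII fragment and that the DOM extension is enabled; non-ASCII input would need extra encoding handling.

```php
<?php
// Hypothetical helper (not from the original answer): wraps each
// case-insensitive occurrence of $term in <b>, touching text nodes only.
// Assumes $html is a small ASCII fragment and ext-dom is available.
function highlightTerm(string $html, string $term): string
{
    if ($term === '') {
        return $html;
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $pattern = '/(' . preg_quote($term, '/') . ')/i';

    // Copy the node list to an array before changing the tree.
    $textNodes = iterator_to_array($xpath->query('//div[@id="wrap"]//text()'));
    foreach ($textNodes as $node) {
        // With one capture group, odd indexes hold the matched text.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $bold = $doc->createElement('b');
                $bold->appendChild($doc->createTextNode($part));
                $fragment->appendChild($bold);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// The 'auto b' search from the question, one term per pass.
$html = 'automotive b';
foreach (['auto', 'b'] as $term) {
    $html = highlightTerm($html, $term);
}
echo $html, "\n"; // <b>auto</b>motive <b>b</b>
```

Because the second pass only ever sees the text nodes 'auto' and 'motive b', the <b> tags added by the first pass are never touched.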
Is there a guarantee that 'big' won't be immediately preceded by "? If so, then s/([^"])b/$1foo/ should replace the b in question with foo.
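Since the question is tagged PHP as well, here is roughly how that substitution might look with preg_replace. This is only a sketch under the same assumption (the unwanted 'b' always sits directly after a double quote); the function names are invented for the example. The second variant uses a negative lookbehind, which also handles a 'b' at the very start of the string, something the capture-group version cannot match because it needs a preceding character.

```php
<?php
// Hypothetical helpers illustrating the substitution above in PHP.

// Capture-group version: ([^"]) grabs the character before 'b'
// and ${1} puts it back in front of the replacement.
function replaceWithCapture(string $subject): string
{
    return preg_replace('/([^"])b/', '${1}foo', $subject);
}

// Lookbehind version: (?<!") asserts that the previous character
// is not a double quote without consuming anything.
function replaceWithLookbehind(string $subject): string
{
    return preg_replace('/(?<!")b/', 'foo', $subject);
}

$string = 'This is my big <span class="big-string">string</span>';
echo replaceWithCapture($string), "\n";
echo replaceWithLookbehind($string), "\n";
// Both print: This is my fooig <span class="big-string">string</span>
```

Writing ${1} instead of $1 avoids any ambiguity when the replacement text starts with a digit.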
If you insist upon using a regex, this one will do a pretty decent job:
$re = '/# (Crudely) match a sub-string NOT in an HTML tag.
    big            # The sub-string to be matched.
    (?=            # Assert we are not inside an HTML tag.
      [^<>]*       # Consume all non-<> up to...
      (?:<\/?\w+   # either an HTML start or end tag,
      | $          # or the end of string.
      )            # End group of valid alternatives.
    )              # End "not-in-html-tag" lookahead assertion.
    /ix';
Caveats: This regex has very real limitations. The HTML must not have any angle brackets in the tag attributes. This regex also finds the target substring inside other parts of the HTML file such as comments, scripts and stylesheets, and this may not be desirable.
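For the autocomplete case, the same lookahead idea could be wrapped up along these lines. Treat it as a sketch with the caveats above still in force: highlightOutsideTags is a made-up name, the search term is escaped with preg_quote, and in this version the lookahead accepts '<' followed by an optional '/' and a tag name, so text sitting directly before a closing tag is matched too.

```php
<?php
// Hypothetical helper: wrap $term in <b> wherever it occurs outside a tag.
// Assumes tag attributes contain no angle brackets.
function highlightOutsideTags(string $html, string $term): string
{
    if ($term === '') {
        return $html;
    }
    // After the term, only non-<> characters may follow until the next
    // tag opens (start or end tag) or the string ends.
    $re = '/' . preg_quote($term, '/') . '(?=[^<>]*(?:<\/?\w+|$))/i';
    return preg_replace($re, '<b>$0</b>', $html);
}

// The 'auto b' search from the question, one term per pass.
$html = 'auto body';
foreach (['auto', 'b'] as $term) {
    $html = highlightOutsideTags($html, $term);
}
echo $html, "\n"; // <b>auto</b> <b>b</b>ody
```

The second pass skips the 'b' in '<b>' and '</b>' because the very next character is '>', which the lookahead refuses to cross.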