Convert text link to HTML with context considered
I want to convert links such as http://google.com/ to HTML. However, if they're already part of an HTML link, either in the href="" attribute or in the link text, I don't want to convert them.
I found this in another question:
preg_replace('@(https?:\/\/([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $text);
However if I have something such as:
<a href="http://google.com/">http://google.com/</a>
already in the target text, the replacement creates two more links inside that HTML. I can't figure out a pattern that detects whether the URL comes before a closing </a> or sits inside quotes (" ").
Do not use regular expressions for (X)HTML parsing. Use DOM instead!
The XPath //text()[not(ancestor::a) and contains(., 'http://')][1]
should find the first text node that contains at least one HTTP URL and is not itself inside an anchor element. You may naively replace that text node with three nodes: a text node holding the preceding text, an anchor element node with an href attribute and the URL as its text node, and a text node holding the remaining text. Repeat until the XPath matches no more text nodes.
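The DOM approach described above could be sketched roughly as follows. This is only an illustration under a few assumptions: linkifyHtml is a made-up helper name, the URL pattern is a deliberately simple one rather than the regex from the question, https:// is matched as well, and character encoding is not handled (non-ASCII input would need extra care with DOMDocument). Instead of re-running the XPath after every replacement, it takes a snapshot of the matching text nodes once, which avoids looping forever on text that contains "http://" but no usable URL.

```php
<?php
// Hypothetical helper: wraps bare http(s) URLs in <a> elements and skips
// text that already sits inside an anchor. Attribute values are never
// touched, because only text nodes are visited.
function linkifyHtml(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);            // tolerate sloppy markup
    $doc->loadHTML('<div>' . $html . '</div>');  // wrapper gives the fragment one root
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = "//text()[not(ancestor::a) and "
           . "(contains(., 'http://') or contains(., 'https://'))]";

    // Snapshot the node list first, because the loop mutates the tree.
    foreach (iterator_to_array($xpath->query($query)) as $textNode) {
        // Simplified URL pattern (an assumption, not the regex from the question).
        $parts = preg_split(
            '@(https?://[^\s<>"]+)@',
            $textNode->nodeValue,
            -1,
            PREG_SPLIT_DELIM_CAPTURE
        );

        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {                  // odd indexes hold the captured URLs
                $a = $doc->createElement('a');
                $a->setAttribute('href', $part);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } else {                             // even indexes hold the surrounding text
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $xpath->query('//body/div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkifyHtml('See http://example.com/ and <a href="http://google.com/">http://google.com/</a>');
```

With this shape, the bare URL gets wrapped while the existing anchor is left alone, since its text node has an a ancestor and its href is an attribute rather than a text node. The simple pattern will also swallow trailing punctuation such as a full stop, so a real implementation would likely want a stricter URL regex.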
Based on mario's comment to my original post:
preg_replace('@(?<!href="|src="|">)(https?:\/\/([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1">$1</a>', $text);
Works perfectly for replacing bbpress's unknown pasta salad.
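For anyone trying the lookbehind pattern above, here is a small check of what it does and does not protect. The pattern is copied from the post; the sample inputs are made up for illustration. The negative lookbehind only rejects a URL that directly follows href=", src=" or ">, so anything else in between (a space, a single quote, a nested tag) is not covered.

```php
<?php
// Pattern copied from the post above; only the sample inputs are invented.
$pattern = '@(?<!href="|src="|">)(https?:\/\/([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';
$replacement = '<a href="$1">$1</a>';

// A bare URL in plain text gets wrapped in an anchor.
$plain = preg_replace($pattern, $replacement, 'Visit http://google.com/ today');

// An existing link is left alone: both URLs directly follow href=" or ">.
$linked = '<a href="http://google.com/">http://google.com/</a>';
$kept = preg_replace($pattern, $replacement, $linked);

// Limitation: a space after "> defeats the lookbehind, so this input is altered.
$spaced = '<a href="http://google.com/"> http://google.com/</a>';
$altered = preg_replace($pattern, $replacement, $spaced);

echo $plain, "\n";
var_dump($kept === $linked);     // bool(true)
var_dump($altered !== $spaced);  // bool(true)
```

So the regex is fine for markup whose shape is known in advance, but for arbitrary HTML the DOM route is the safer choice.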