Stack Overflow-Style URL Auto-Linking with a Regex
I am using the function below to find URLs in text and convert them to hyperlinks. First of all, is it correct? It appears to work, but do you know of a (perhaps malformed) URL that would break it?
My main question is whether it can also be made to support port numbers. For example, stackoverflow.com:80/index
will not be converted, because the port is not recognized as a valid part of the URL.
In summary, I am looking for Stack Overflow-style URL recognition, which I believe is a custom addition to Markdown.
/**
* Search for and create links from urls
*/
static public function autoLink($text) {
    $pattern = "/(((http[s]?:\/\/)|(www\.))(([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+(\.[a-z]{2,2})?)\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1})/is";
    $text = preg_replace($pattern, " <a href='$1'>$1</a>", $text);
    // fix URLs without protocols
    $text = preg_replace("/href='www/", "href='http://www", $text);
    return $text;
}
Thanks for your time.
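On the port question, one way to let the question's pattern accept a port is to allow an optional colon-plus-digits group right after the host name. The sketch below assumes a standalone function with the hypothetical name autoLinkWithPort() instead of the class method, and otherwise keeps the original pattern and replacements unchanged. The port check is deliberately loose (one to five digits, no range check).

```php
<?php
/**
 * Hypothetical variant of the question's autoLink(). The only change to the
 * pattern is the optional port group (:[0-9]{1,5})? placed after the host.
 */
function autoLinkWithPort(string $text): string
{
    $pattern = "/(((http[s]?:\/\/)|(www\.))(([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+(\.[a-z]{2,2})?)(:[0-9]{1,5})?\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1})/is";
    // $1 is still the whole URL, because the new port group is nested inside group 1
    $text = preg_replace($pattern, " <a href='$1'>$1</a>", $text);
    // fix URLs without protocols (unchanged from the question)
    return preg_replace("/href='www/", "href='http://www", $text);
}

echo autoLinkWithPort('See http://stackoverflow.com:80/index for details'), "\n";
echo autoLinkWithPort('Or www.example.com:8080/path'), "\n";
```

Both calls should link the whole URL, port included. The pattern still requires an http://, https:// or www. prefix, so a bare stackoverflow.com:80/index is left as plain text.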
You should also look at the answers to this question: How to mimic StackOverflow Auto-Link Behavior
I ended up combining the answers I got here on Stack Overflow with suggestions from colleagues. The code below is the best we could come up with.
/**
* Search for and create links from urls
*/
static public function autoLink($text) {
    $pattern = "/\b((?P<protocol>(https?)|(ftp)):\/\/)?(?P<domain>[-A-Z0-9\\.]+)[.][A-Z]{2,7}(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,\\.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,\\.;]*)?/is";
    // preg_replace_callback replaces the old /e modifier, which was removed in PHP 7.0
    return preg_replace_callback($pattern, function ($m) {
        $url = $m[0];
        // fix URLs without protocols
        $href = preg_match('#^(https?|ftp)://#i', $url) ? $url : 'http://' . $url;
        // escape both the attribute value and the link text
        return ' <a href="' . htmlspecialchars($href, ENT_QUOTES) . '">' . htmlspecialchars($url) . '</a>';
    }, $text);
}
Rather than writing your own auto-linking routine, which is essentially the beginning of a custom markup engine, you might want to use an open-source markup engine, since it is less likely to be vulnerable to cross-site scripting (XSS) attacks. One example for PHP is PHP Markdown, which can autolink URLs and uses essentially the same Markdown syntax as Stack Overflow.
One note: you should always escape HTML special characters with htmlspecialchars() before inserting text into attributes or into the inner text of elements.
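To illustrate the escaping advice, here is a minimal sketch of a hypothetical makeLink() helper (not taken from any answer here) that passes an already-matched URL through htmlspecialchars() with ENT_QUOTES before using it as both the href value and the link text.

```php
<?php
/**
 * Hypothetical helper: build an anchor tag from an already-matched URL,
 * escaping it for the attribute value and for the link text.
 */
function makeLink(string $url): string
{
    // ENT_QUOTES escapes both ' and ", so the value cannot close the href attribute
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $safe . '">' . $safe . '</a>';
}

echo makeLink('http://example.com/?a=1&b="><script>alert(1)</script>'), "\n";
```

Quotes and angle brackets in the input come out as entities, so they cannot end the attribute or open a new tag. Escaping does not check the URL scheme, though; that is still up to the matching pattern.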
$pattern = "/\b(?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/i";
This pattern will match, for example:
http://www.scroogle.org/index.html
http://www.scroogle.org:80/index.html?source=library
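To see what this pattern captures for those two URLs, the sketch below wraps it in a hypothetical urlParts() function. Because the port group has no name, the digits are read from numbered group 5 (PCRE numbers groups by their opening parenthesis, named groups included).

```php
<?php
/**
 * Hypothetical helper: run the pattern above and report the pieces it captured.
 * Returns null when the text contains no match.
 */
function urlParts(string $url): ?array
{
    $pattern = "/\b(?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/i";
    if (!preg_match($pattern, $url, $m)) {
        return null;
    }
    // preg_match omits trailing groups that did not take part in the match,
    // so missing pieces fall back to an empty string
    return [
        'match'      => $m[0],
        'protocol'   => $m['protocol'],
        'domain'     => $m['domain'],
        'port'       => $m[5] ?? '',
        'file'       => $m['file'] ?? '',
        'parameters' => $m['parameters'] ?? '',
    ];
}

print_r(urlParts('http://www.scroogle.org/index.html'));
print_r(urlParts('http://www.scroogle.org:80/index.html?source=library'));
```

Giving the port its own named group, for instance (?::(?P<port>[0-9]+))?, would avoid relying on the group number; that is a suggested tweak, not part of the original pattern.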