Regex to find URLs within text and turn them into links
I am trying to output a string as HTML, and the string contains URLs. I want to turn these URLs into actual links. My test string is "https://www.google.com http://yahoo.com www.msn.com www.google.com". My code:
Dim oRegEx As New Regex("((https?:\/\/|www\.)([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?)", RegexOptions.IgnoreCase)
Dim matches As MatchCollection = oRegEx.Matches(sTextToConvert)
For Each match As Match In matches
    If (match.Value.StartsWith("www.")) Then
        sTextToConvert = sTextToConvert.Replace(match.Value, "<a href='http://" & match.Value & "' target=""_blank"">" & match.Value & "</a>")
    Else
        sTextToConvert = sTextToConvert.Replace(match.Value, "<a href='" & match.Value & "' target=""_blank"">" & match.Value & "</a>")
    End If
Next
Return sTextToConvert
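My issue is that www.google.com appears twice in the string, once on its own and once inside "https://www.google.com". String.Replace replaces every occurrence of the matched text, so when the loop reaches the standalone www.google.com it also rewrites the copies inside the anchor that was already generated for "https://www.google.com".
Here is what I get after the replace: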
<a href='https://<a href='http://www.google.com' target="_blank">www.google.com</a>' target="_blank">https://<a href='http://www.google.com' target="_blank">www.google.com</a></a> <a href='http://yahoo.com' target="_blank">http://yahoo.com</a> <a href='http://www.msn.com' target="_blank">www.msn.com</a> <a href='http://www.google.com' target="_blank">www.google.com</a>
Found a killer solution
I just use Regex.Replace and it takes care of all the links.
Return Regex.Replace(sTextToConvert, "((https?:\/\/|www\.)([-\w.]+)+(:\d+)?(\/([\w\/_.]*(\?\S+)?)?)?)", "<a href='$0' target=""_blank"">$0</a>")
This is not a trivial task!
In fact the overlord of this site wrote a blog post on this subject. See: The Problem With URLs. (But to get the gist and scope of the problem, you really need to read the entire comment thread.) Here's the comment I made there (too late) which is applicable here:
I've been working diligently on this (interesting and challenging) problem and have come up with a pretty decent single-regex solution (PHP and JavaScript). It correctly handles delimited URLs (in (parentheses), [square brackets], <angle brackets>, {curly braces}, 'single quotes' and "double quotes"), skips over already linked URLs (in both HTML and BBCode syntaxes), properly excludes trailing punctuation (even when mixed with quotes), and is written using no complex regex constructs (i.e. no lookbehind, so it works in JavaScript). It also properly handles delimiters in HTML entity form.
I've released the JavaScript and PHP scripts as open source, and anyone interested can download them from GitHub: "LinkifyURL". Here is a link to the JavaScript test page that demonstrates the JavaScript version and provides a detailed commented listing of the regex used by both scripts: URL Linkification (HTTP/FTP)
The regex is rather complex (but so is this problem, as it turns out). A RegexBuddy library file is included as part of the GitHub project if you are into that.
Also take a look at John Gruber's "An Improved Liberal, Accurate Regex Pattern for Matching URLs". His regex is pretty darn good, but it does suffer from catastrophic backtracking under certain conditions, i.e. when a URL has nested parentheses and the inner parentheses are empty.
Matches is only intended to retrieve parts of a string. Use Replace instead. One of the arguments it takes is a function (a MatchEvaluator delegate) that transforms a matched string into a replacement string (see the example there).
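As a rough sketch of that approach, the code below passes a MatchEvaluator to Regex.Replace, so each URL is rewritten exactly once, at the position where it was matched. The pattern is the one from the question; the module name LinkifyExample and the helpers Linkify and ToAnchor are hypothetical names chosen for illustration. It does not HTML-encode the surrounding text or trim trailing punctuation, so treat it as a starting point rather than a complete linkifier.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LinkifyExample
    ' Pattern taken from the question (assumed good enough for the test string).
    Private ReadOnly UrlPattern As New Regex(
        "((https?:\/\/|www\.)([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?)",
        RegexOptions.IgnoreCase)

    ' Hypothetical helper: wraps every URL found in the text in an anchor tag.
    ' Regex.Replace scans the input once, so text produced by an earlier
    ' replacement is never matched again.
    Function Linkify(ByVal sTextToConvert As String) As String
        Return UrlPattern.Replace(sTextToConvert, AddressOf ToAnchor)
    End Function

    ' Called once per match; returns the replacement text for that match.
    Private Function ToAnchor(ByVal match As Match) As String
        Dim href As String = match.Value
        ' URLs that start with "www." have no scheme, so add one for the href.
        If href.StartsWith("www.", StringComparison.OrdinalIgnoreCase) Then
            href = "http://" & href
        End If
        Return "<a href='" & href & "' target=""_blank"">" & match.Value & "</a>"
    End Function

    Sub Main()
        Console.WriteLine(Linkify("https://www.google.com http://yahoo.com www.msn.com www.google.com"))
    End Sub
End Module
```

Because the evaluator receives each Match individually, the "www." case can be handled per URL without a second pass over the string, which is what caused the nested anchors in the question.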