Regex: absolute url to relative url (C#)
I need a regex to run against strings like the one below that will convert absolute paths to relative paths under certain conditions.
<p>This website is <strong>really great</strong> and people love it <img alt="" src="http://localhost:1379/Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif" /></p>
Rules:
If the url contains "/Content/" I would like to get the relative path
If the url does not contain "/Content/", it is an external file, and the absolute path should remain
Regex unfortunately is not my forte, and this is too advanced for me at this point. If anyone can offer some tips I'd appreciate it.
Thanks in advance.
UPDATE: To answer questions in the comments:
- At the time the regex is applied, all URLs will begin with "http://"
- This should be applied to the src attribute of both img and a tags, not to text outside of tags.
You should consider using the Uri.MakeRelativeUri method. Your current algorithm depends on external files never containing "/Content/" in their path, which seems risky to me. MakeRelativeUri will determine whether a relative path can be made from the current Uri to the src or href, regardless of changes you or the external file store make down the road.
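A minimal sketch of that suggestion, assuming the site root is known up front. The helper name ToSiteRelative and the example.com URL are made up for illustration; Uri.IsBaseOf and Uri.MakeRelativeUri are the real framework calls.

```csharp
using System;

// Hypothetical helper: returns a root-relative path when `url` lives under
// `siteRoot`, and returns `url` unchanged otherwise.
static string ToSiteRelative(Uri siteRoot, string url)
{
    var target = new Uri(url, UriKind.Absolute);

    // External file: keep the absolute URL.
    if (!siteRoot.IsBaseOf(target))
        return url;

    // MakeRelativeUri yields a path without a leading slash,
    // so "/" is prepended to make it root-relative.
    return "/" + siteRoot.MakeRelativeUri(target);
}

var root = new Uri("http://localhost:1379/");

// prints /Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif
Console.WriteLine(ToSiteRelative(root,
    "http://localhost:1379/Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif"));

// prints http://example.com/Content/logo.gif
Console.WriteLine(ToSiteRelative(root, "http://example.com/Content/logo.gif"));
```

Note that this decides "internal vs. external" by host and port rather than by the presence of "/Content/", which is the point of the suggestion.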
Unless I'm missing the point here, if you replace
^(.*)([Cc]ontent.*)
With
/$2
You will end up with
/Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif
This will only happen if "content" is found, so in case you have a URL such as:
http://localhost:1379/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif
nothing will be replaced.
Hope it helps, and that I didn't miss anything.
UPDATE
Obviously, this assumes you are using an HTML parser to find the URL inside the a href (which you should, in case you're not :-))
Cheers
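For reference, the replacement described in the answer above could be sketched in C# roughly as follows. This is a slightly tightened variation rather than the exact pattern: it anchors on "/Content/" with both slashes and uses a lazy prefix so the first occurrence wins. The helper name StripToContent is hypothetical, and it expects a single URL, not a whole HTML fragment.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: drops everything before the first "/Content/"
// (or "/content/"). URLs without that segment are returned unchanged,
// because Regex.Replace leaves the input alone when nothing matches.
static string StripToContent(string url) =>
    Regex.Replace(url, @"^.*?(/[Cc]ontent/.*)$", "$1");

// prints /Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif
Console.WriteLine(StripToContent(
    "http://localhost:1379/Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif"));

// prints the URL unchanged, since it has no /Content/ segment
Console.WriteLine(StripToContent(
    "http://localhost:1379/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif"));
```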
This is for Perl; I do not know C#:
s@(<(img|a)\b[^>]*?\s(src|href)=)(["'])http://[^'"]*?(/Content/[^'"]*?)\4@$1$4$5$4@g
If C# has Perl-like regexes, it will be easy to port.
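.NET regexes are close enough to Perl's that a port is short. The following is a sketch, not a tested drop-in: the method name RelativizeContentUrls is made up, named groups replace the numbered ones for readability, and the replacement writes the closing quote back explicitly, since the match consumes it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites http://.../Content/... URLs found in the
// src or href attribute of img and a tags to root-relative paths.
static string RelativizeContentUrls(string html)
{
    // Group 1: the tag text up to and including src= or href=.
    // "q":     the opening quote character (matched again by \k<q>).
    // "path":  everything from /Content/ up to the closing quote.
    var pattern = new Regex(
        @"(<(?:img|a)\b[^>]*?\s(?:src|href)=)(?<q>[""'])http://[^'""]*?(?<path>/Content/[^'""]*?)\k<q>");

    return pattern.Replace(html, "$1${q}${path}${q}");
}

var html = @"<p>love it <img alt="""" src=""http://localhost:1379/Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif"" /></p>";
Console.WriteLine(RelativizeContentUrls(html));
```

URLs that do not contain "/Content/" are not matched and stay absolute. Like any regex over HTML, this will miss unquoted attribute values and other unusual markup.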
This function converts all the hyperlinks and image sources inside your HTML to absolute URLs, and you can easily adapt it for CSS and JavaScript files as well:
' Requires: Imports System.Text.RegularExpressions
' Rewrites every href="..." and src="..." value in html to an absolute URL,
' resolving relative values against PageURL.
Private Function ConvertALLrelativeLinksToAbsoluteUri(ByVal html As String, ByVal PageURL As String) As String
    Dim MainURL As New Uri(PageURL)
    ' Group 1 is the attribute name (href or src), group 2 the quoted value.
    Dim XpAttr As New Regex("\b(href|src)=""(.*?)""", RegexOptions.IgnoreCase)
    ' A single pass with a match evaluator avoids re-running Matches() on
    ' every iteration and handles upper-case HREF= / SRC= correctly.
    Return XpAttr.Replace(html,
        Function(m As Match) As String
            ' Uri(base, relative) leaves already-absolute URLs untouched.
            Dim NEWURL As New Uri(MainURL, m.Groups(2).Value)
            Return m.Groups(1).Value & "=""" & NEWURL.AbsoluteUri & """"
        End Function)
End Function