RegEx Remove Internal Intranet Links from String .NET
I'm looking for a way to remove all references to internal intranet sites in a string while retaining the label.
For instance:
Dim str As String = Nothing
str &= "<a href=""http://intranet/somepage.asp"">Internal Page</a>"
str &= "<a href=""http://www.external.com"">External Page</a>"
Anything that references http://intranet should be considered internal, and its link should be stripped with a regex so that only the label remains.
I appreciate your help.
Thanks
While it's not a regex solution, it's just as simple. Given your two examples above, you could do the following:
Private Function IntranetCheck(ByVal link As String) As String
If link.ToLower().Contains("http://intranet/") Then
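' Keep only the text between the opening tag's ">" and the closing tag's "<".
' Char literals (">"c) keep this compiling under Option Strict On.
Return link.Split(">"c)(1).Split("<"c)(0)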
Else
Return link
End If
End Function
Usage:
Dim str As String = Nothing
str &= IntranetCheck("<a href=""http://intranet/somepage.asp"">Internal Page</a>")
str &= IntranetCheck("<a href=""http://www.external.com"">External Page</a>")
This checks whether the passed-in string contains the intranet address and, if it does, splits the string so that only the inner text of the element is returned. Note that it expects each call to receive exactly one anchor element, as in the usage above.
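If you do want the regex approach the question asks about, a minimal sketch might look like the following. The pattern, the module name and the StripIntranetLinks function are illustrative assumptions, not anything from the original post. It assumes double-quoted href values and no nested anchors, and it treats any href beginning with http://intranet as internal.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IntranetLinkDemo
    ' Matches an <a> element whose href starts with http://intranet
    ' and captures the label (the element's inner text) in group 1.
    ' Assumption: href values are double-quoted and anchors are not nested.
    Private ReadOnly IntranetAnchor As New Regex(
        "<a\s[^>]*?href\s*=\s*""http://intranet[^""]*""[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Hypothetical helper: replaces each internal anchor with its label only.
    Function StripIntranetLinks(ByVal html As String) As String
        Return IntranetAnchor.Replace(html, "$1")
    End Function

    Sub Main()
        Dim str As String = "<a href=""http://intranet/somepage.asp"">Internal Page</a>" &
                            "<a href=""http://www.external.com"">External Page</a>"
        ' Prints: Internal Page<a href="http://www.external.com">External Page</a>
        Console.WriteLine(StripIntranetLinks(str))
    End Sub
End Module
```

Unlike IntranetCheck, this works on a string that holds several anchors at once, but it is still pattern matching on markup and will miss unusual formatting such as single-quoted attributes.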
I highly recommend using the Html Agility Pack for this kind of processing. Using this tool, you can do something like this:
// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(fileName);
foreach(HtmlNode anchor in doc.DocumentNode.Descendants("a").Where(n => n.GetAttributeValue("href", string.Empty).Contains("intranet")))
{
// Change your href attribute here
string newHref = anchor.GetAttributeValue("href", string.Empty).Replace("intranet", "somethingelse");
anchor.SetAttributeValue("href", newHref);
}
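The loop above rewrites the href. To drop the link and keep only the label, as the question asks, one option is to swap each matching anchor for a text node. Here is a rough sketch in VB.NET (to match the question), assuming the HTML arrives as a string rather than a file. StripIntranetAnchors is a hypothetical name, and the prefix check on http://intranet is an assumption about what counts as internal.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module HapIntranetDemo
    ' Hypothetical helper: replaces every internal anchor with its inner text.
    Function StripIntranetAnchors(ByVal html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Materialize the matches first so the tree is not modified
        ' while it is still being enumerated.
        Dim anchors = doc.DocumentNode.Descendants("a").
            Where(Function(n) n.GetAttributeValue("href", String.Empty).
                      StartsWith("http://intranet", StringComparison.OrdinalIgnoreCase)).
            ToList()

        For Each anchor In anchors
            ' Swap the <a> element for a plain text node holding its label.
            anchor.ParentNode.ReplaceChild(doc.CreateTextNode(anchor.InnerText), anchor)
        Next

        Return doc.DocumentNode.OuterHtml
    End Function
End Module
```

Calling StripIntranetAnchors on the two-anchor string from the question should leave the external link untouched and reduce the internal one to its label.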