
RegEx Remove Internal Intranet Links from String .NET

I'm looking for a way to remove all references to internal intranet sites in a string while retaining the label.

For instance:

Dim str As String = Nothing
str &= "<a href=""http://intranet/somepage.asp"">Internal Page</a>"
str &= "<a href=""http://www.external.com"">External Page</a>"

Anything that references http://intranet is considered internal: the link needs to be stripped with a regex, leaving only its label text.

I appreciate your help.

Thanks


While it's not a regex solution, it's just as simple. Given your two examples above, you could do the following:

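Private Function IntranetCheck(ByVal link As String) As String
    ' Case-insensitive test for the intranet host.
    If link.IndexOf("http://intranet/", StringComparison.OrdinalIgnoreCase) >= 0 Then
        ' Keep only the text between the first ">" and the following "<".
        Return link.Split(">"c)(1).Split("<"c)(0)
    Else
        Return link
    End If
End Function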

Usage:

Dim str As String = Nothing
str &= IntranetCheck("<a href=""http://intranet/somepage.asp"">Internal Page</a>") 
str &= IntranetCheck("<a href=""http://www.external.com"">External Page</a>")

This checks whether the passed-in string contains the intranet address; if it does, it splits the string and returns only the inner text of the element. Note that it assumes each call receives exactly one anchor tag.
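
If the links arrive already concatenated into one string, as in the question, the same idea can be expressed with Regex.Replace. The following is only a minimal sketch: the module name IntranetLinks and the function StripIntranetLinks are made up for illustration, and the pattern assumes simple, non-nested anchors whose href is quoted and starts with http://intranet.

```vbnet
Imports System.Text.RegularExpressions

Module IntranetLinks
    ' Assumption: anchors are simple (no nested <a>, quoted href value).
    ' Group 1 captures the label between the opening and closing tags.
    Private ReadOnly IntranetAnchor As New Regex(
        "<a\b[^>]*\bhref\s*=\s*[""']http://intranet[^""']*[""'][^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Hypothetical helper: replaces every intranet anchor with its label.
    Public Function StripIntranetLinks(ByVal html As String) As String
        Return IntranetAnchor.Replace(html, "$1")
    End Function
End Module
```

Called on the two-link string from the question, StripIntranetLinks(str) should leave "Internal Page" as plain text and keep the external anchor untouched. Because the pattern matches any host that begins with "intranet", tighten it if hosts such as http://intranet-demo.example.com must be treated as external.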


I highly recommend using the Html Agility Pack for this kind of processing. Using this tool, you can do something like this:

// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(fileName);
foreach (HtmlNode anchor in doc.DocumentNode.Descendants("a").Where(n => n.GetAttributeValue("href", string.Empty).Contains("intranet")))
{
    // Change your href attribute here
    string newHref = anchor.GetAttributeValue("href", string.Empty).Replace("intranet", "somethingelse");
    anchor.SetAttributeValue("href", newHref);
}
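
The loop above rewrites the href, whereas the question asks to drop the link and keep its label. A rough VB.NET sketch of that variant follows; the module and function names are hypothetical, and it assumes the HTML is available as a string and that internal links start with http://intranet.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module IntranetLinkRemover
    ' Hypothetical helper: swaps each intranet anchor for its inner text.
    Public Function RemoveIntranetLinks(ByVal html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' ToList() takes a snapshot so the tree can be modified while looping.
        For Each anchor As HtmlNode In doc.DocumentNode.Descendants("a").ToList()
            Dim href As String = anchor.GetAttributeValue("href", String.Empty)
            If href.StartsWith("http://intranet", StringComparison.OrdinalIgnoreCase) Then
                ' Replace the whole <a> element with a text node holding its label.
                Dim label As HtmlNode = doc.CreateTextNode(anchor.InnerText)
                anchor.ParentNode.ReplaceChild(label, anchor)
            End If
        Next

        Return doc.DocumentNode.OuterHtml
    End Function
End Module
```

Using a parser this way avoids the edge cases a regex tends to miss, such as unusual attribute order or whitespace inside the tag.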
