C# replace multiple href values

I have a block of HTML that looks something like this:

<p><a href="docs/123.pdf">33</a></p>

There are hundreds of anchor links whose href I need to replace based on the anchor text. For example, I need to replace the link above with something like:

<a href="33.html">33</a>

I will need to take the value 33 and do a lookup on my database to find the new link to replace the href with.

I need to keep it all in the original html as above!

How can I do this? Help!


Although this doesn't answer your question, the HTML Agility Pack is a great tool for manipulating and working with HTML: http://html-agility-pack.net

It could at least make grabbing the values you need and doing the replaces a little easier.

This related question contains links on using the HTML Agility Pack: "How to use HTML Agility pack".
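As a rough sketch of what that could look like for this question (assuming the HtmlAgilityPack NuGet package is referenced): the HrefRewriter class and the Links dictionary below are made up for illustration, and the dictionary only stands in for the database lookup.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

static class HrefRewriter
{
    // Hypothetical stand-in for the database lookup (anchor text -> new href).
    static readonly Dictionary<string, string> Links = new Dictionary<string, string>
    {
        { "33", "33.html" }
    };

    public static string Rewrite(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode a in anchors)
            {
                string newHref;
                // Anchors with no known replacement keep their original href.
                if (Links.TryGetValue(a.InnerText.Trim(), out newHref))
                    a.SetAttributeValue("href", newHref);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling HrefRewriter.Rewrite on the whole page returns the same markup with only the matched href values changed, so the rest of the original HTML stays in place.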


Slurp your HTML into an XmlDocument (your markup is valid, isn't it?) Then use XPath to find all the <a> tags with an href attribute. Apply the transform and assign the new value to the href attribute. Then write the XmlDocument out.

Easy!
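A minimal sketch of those steps, assuming the markup is well-formed XML with a single root element and no XHTML namespace (otherwise the XPath needs an XmlNamespaceManager). The XmlHrefRewriter class and the Links dictionary are hypothetical; the dictionary stands in for the database lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XmlHrefRewriter
{
    // Hypothetical stand-in for the database lookup (anchor text -> new href).
    static readonly Dictionary<string, string> Links = new Dictionary<string, string>
    {
        { "33", "33.html" }
    };

    public static string Rewrite(string xhtml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the original whitespace between tags
        doc.LoadXml(xhtml);            // throws XmlException if the markup is not well-formed

        var anchors = doc.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            // The XPath only selects elements, so the cast to XmlElement is safe.
            foreach (XmlElement a in anchors)
            {
                string newHref;
                if (Links.TryGetValue(a.InnerText.Trim(), out newHref))
                    a.SetAttribute("href", newHref);
            }
        }
        return doc.OuterXml;
    }
}
```

Be aware that real-world HTML often fails at LoadXml (unclosed tags, entities such as &nbsp;), in which case a tolerant HTML parser is the safer route.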


Use a regexp to find the values and replace them. A regexp like "<p><a href=\"[^\"]+\">([^<]+)</a></p>" will match each link and capture the anchor text.
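To show just the match-and-capture part, here is a small sketch. It assumes every link has exactly the <p><a href="...">text</a></p> shape from the question; the AnchorFinder class and the named groups are my own additions for readability.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AnchorFinder
{
    // Named groups capture the current href and the anchor text.
    static readonly Regex Anchor = new Regex(
        @"<p><a href=""(?<href>[^""]+)"">(?<text>[^<]+)</a></p>");

    // Returns (anchor text, current href) pairs in document order.
    public static List<KeyValuePair<string, string>> Find(string html)
    {
        var found = new List<KeyValuePair<string, string>>();
        foreach (Match m in Anchor.Matches(html))
        {
            found.Add(new KeyValuePair<string, string>(
                m.Groups["text"].Value, m.Groups["href"].Value));
        }
        return found;
    }
}
```

Each pair gives you the value to look up and the href it currently points at. Links with extra attributes or different spacing will not match this pattern.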


Consider using the following rough algorithm.

using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class Program
{
  static void Main ()
  {
    string html = "<p><a href=\"docs/123.pdf\">33</a></p>"; // read the whole html file into this string.
    StringBuilder newHtml = new StringBuilder (html);
    Regex r = new Regex (@"<a href=""([^""]+)"">([^<]+)"); // group 1: the href value to replace; group 2: the anchor text to look up
    foreach (var match in r.Matches(html).Cast<Match>().OrderByDescending(m => m.Index))
    {
       string text = match.Groups[2].Value;
       string newHref = DBTranslate (text);
       newHtml.Remove (match.Groups[1].Index, match.Groups[1].Length);
       newHtml.Insert (match.Groups[1].Index, newHref);
    }

    Console.WriteLine (newHtml);
  }

  static string DBTranslate(string s)
  {
    return "junk_" + s; // placeholder: the real database lookup goes here
  }
}

(The OrderByDescending makes sure the indexes don't change as you modify the StringBuilder.)


So, what you want to do is generate the replacement string based on the contents of the match. Consider using one of the Regex.Replace overloads that take a MatchEvaluator. Example:

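using System;
using System.Text.RegularExpressions;

static class Program
{
  static void Main()
  {
    // Group 1 captures the anchor text; the old href is matched but discarded.
    Regex r = new Regex(@"<a href=""[^""]+"">([^<]+)");

    string s0 = @"<p><a href=""docs/123.pdf"">33</a></p>";
    string s1 = r.Replace(s0, m => GetNewLink(m));

    Console.WriteLine(s1);
  }

  static string GetNewLink(Match m)
  {
    // Rebuild the opening tag and anchor text; the database lookup would go here.
    return string.Format(@"<a href=""{0}.html"">{0}", m.Groups[1]);
  }
}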

I've actually taken it a step further and used a lambda expression instead of explicitly creating a delegate method.
