Find images that have a certain HTML class name

I have some markup that contains HTML image tags with the class featured. I need to find all of those images, wrap each one in an anchor tag, set the anchor's href attribute to the image's src value (the image path), and finally replace the image's src value with a new value (returned by a method I call).

<p>Some text here <img src="/my/path/image.png" alt="image description" class="featured" />. Some more text and another image that should not be modified <img src="/my/path/image2.png" alt="image description" /></p>

It should become:

<p>Some text here <a href="/my/path/image.png"><img src="/new/path/from/method.png" alt="image description" class="featured" /></a>. Some more text and another image that should not be modified <img src="/my/path/image2.png" alt="image description" /></p>


Don't use RegEx to parse HTML. See this classic SO answer for the reasons.

Use the HTML Agility Pack instead - you can use XPath to query your HTML.
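
As a rough illustration of the XPath approach, here is a minimal sketch rather than a definitive implementation. The class name FeaturedImageLinker, the method name Wrap and the getNewSrc delegate are made up for this example; getNewSrc stands in for whatever method returns the new image path. SelectNodes, CreateElement, ReplaceChild, AppendChild and SetAttributeValue are HTML Agility Pack calls.

```csharp
using System;
using HtmlAgilityPack;

public static class FeaturedImageLinker
{
    // getNewSrc is a hypothetical stand-in for the method that maps the
    // original image path to the replacement path.
    public static string Wrap(string html, Func<string, string> getNewSrc)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the class value with spaces so that "featured" only matches as a
        // whole class name, not as part of a longer one such as "unfeatured".
        var images = doc.DocumentNode.SelectNodes(
            "//img[contains(concat(' ', normalize-space(@class), ' '), ' featured ')]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (images == null)
        {
            return html;
        }

        foreach (HtmlNode img in images)
        {
            string originalSrc = img.GetAttributeValue("src", string.Empty);

            HtmlNode anchor = doc.CreateElement("a");
            anchor.SetAttributeValue("href", originalSrc);

            // Put the anchor where the image was, then move the image into it,
            // so alt, class and any other attributes stay on the image.
            img.ParentNode.ReplaceChild(anchor, img);
            anchor.AppendChild(img);

            img.SetAttributeValue("src", getNewSrc(originalSrc));
        }

        return doc.DocumentNode.WriteTo();
    }
}
```

It could be called as FeaturedImageLinker.Wrap(html, src => "/new/path/from/method.png"), with the lambda replaced by the real method. Images without the featured class are not selected by the query, so they are left untouched.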


Ended up with this code.

using System;
using System.Linq;
using System.Reflection;
using HtmlAgilityPack;
using log4net;

namespace Company.Web.Util
{
    public static class HtmlParser
    {
        private static readonly ILog _log = LogManager.GetLogger(MethodBase.GetCurrentMethod().DeclaringType);

        public static string Parse(string input)
        {
            // A local document (instead of a static field) keeps Parse safe
            // to call from several threads at once.
            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(input);
            ParseNode(htmlDocument.DocumentNode);

            return htmlDocument.DocumentNode.WriteTo().Trim();
        }

        private static void ParseChildren(HtmlNode parentNode)
        {
            // Iterate backwards because ParseNode may replace a child.
            for (int i = parentNode.ChildNodes.Count - 1; i >= 0; i--)
            {
                ParseNode(parentNode.ChildNodes[i]);
            }
        }

        private static void ParseNode(HtmlNode node)
        {
            if (node.NodeType == HtmlNodeType.Element && node.Name == "img" && HasClass(node, "featured"))
            {
                try
                {
                    string originalImagePath = node.Attributes["src"].Value;

                    // GetImageThumbnail is the application's own method (not shown here).
                    string imageThumbnailPath = GetImageThumbnail(originalImagePath);

                    var anchorNode = node.OwnerDocument.CreateElement("a");
                    anchorNode.SetAttributeValue("href", originalImagePath);

                    // Swap the image for the anchor, then move the image inside it,
                    // so alt, class and any other attributes are preserved.
                    node.ParentNode.ReplaceChild(anchorNode, node);
                    anchorNode.AppendChild(node);
                    node.SetAttributeValue("src", imageThumbnailPath);
                }
                catch (Exception exception)
                {
                    // Check the level that is actually being logged.
                    if (_log.IsWarnEnabled)
                    {
                        _log.WarnFormat("Some message: {0}", exception);
                    }
                }

                return;
            }

            if (node.HasChildNodes)
            {
                ParseChildren(node);
            }
        }

        // True when the class attribute contains className as a whole token,
        // so "featured" does not match "unfeatured".
        private static bool HasClass(HtmlNode node, string className)
        {
            string classes = node.GetAttributeValue("class", string.Empty);
            return classes.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                .Any(c => string.Equals(c, className, StringComparison.OrdinalIgnoreCase));
        }
    }
}
