Html Agility Pack InnerHtml returns incorrect string with textboxes

The following test code:

[Test]
public void PossibleHtmlAgilityPackBug()
{
    const string html = @"<input type=""text"" name=""shouldNotTrim"" />";
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    Assert.That(doc.DocumentNode.InnerHtml, Is.EqualTo(html));
}

Results in:

Expected string length 42 but was 40. Strings differ at index 39.
Expected: "<input type="text" name="shouldNotTrim" />"
But was:  "<input type="text" name="shouldNotTrim">"
--------------------------------------------------^

Is this a bug? Or is there a config that I can change to output that extra "/" I need?

Thanks,

Chi


This is not a bug. The parser treats INPUT as an "empty" element (see, for example, the related thread on the subject, "HTMLAgilityPack don't preserves original empty tags on the empty elements"), and by default such elements are written without the closing "/".

The reasons are historical and go back to HTML 3.2. Back then, INPUT was not required to be closed, although this looks like a bug today.

This will fix your problem:

public void PossibleHtmlAgilityPackBug()
{
    const string html = @"<input type=""text"" name=""shouldNotTrim"" />";
    var doc = new HtmlDocument();
    // Write empty elements such as INPUT in self-closed form.
    doc.OptionWriteEmptyNodes = true;
    doc.LoadHtml(html);

    Assert.That(doc.DocumentNode.InnerHtml, Is.EqualTo(html));
}

As a side note, Html Agility Pack will not always reproduce the original HTML text exactly, but it always tries to rebuild markup that renders the same way. Browsers handle an unclosed INPUT without a problem.
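
Since the output is not guaranteed to match the input character for character, one option for tests is to compare normalized markup rather than raw strings. The following is only a sketch: HtmlNormalizer and Normalize are hypothetical names that are not part of Html Agility Pack, and it assumes that passing both strings through the same parser and writer is an acceptable definition of "equivalent" for your test.

```csharp
using HtmlAgilityPack;

// Hypothetical helper (not part of Html Agility Pack): round-trips markup
// through the parser so both sides of a comparison are written the same way.
public static class HtmlNormalizer
{
    public static string Normalize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerHtml of the document node is the re-serialized markup,
        // written with the document's default options.
        return doc.DocumentNode.InnerHtml;
    }
}
```

A test could then assert on HtmlNormalizer.Normalize(actual) against HtmlNormalizer.Normalize(expected), so that a difference such as the trailing "/" on INPUT no longer causes a failure.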
