HtmlAgilityPack closes the form tag automatically
I am trying to parse an HTML file that contains this markup:
<div><form>...</div>...</form>
The problem is that HtmlAgilityPack automatically closes the form tag before the div's closing tag:
<div><form>...</form></div>...</form>
So when I parse the form, some of its elements are missing. (I get only the elements before the automatically added closing tag.)
I already tried:
htmlDoc.OptionFixNestedTags = false;
htmlDoc.OptionAutoCloseOnEnd = false;
htmlDoc.OptionCheckSyntax = false;
HtmlNode.ElementsFlags.Remove("form");
HtmlNode.ElementsFlags.Add("form", HtmlElementFlag.CanOverlap);
HtmlNode.ElementsFlags.Add("div", HtmlElementFlag.CanOverlap);
But nothing helps!
Thanks for your help!
The following seems to work for me:
HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("form");
_document = new HtmlDocument();
_document.OptionAutoCloseOnEnd = true;
_document.LoadHtml(content);
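For anyone who wants to try this end to end, here is a minimal sketch of the same approach as a small console program. The sample markup, the class name and the input names are made up for illustration and are not from the question. It only prints whatever inputs the parser ends up placing under the form, so you can check the result against your own HTML.

```csharp
using System;
using HtmlAgilityPack;

class FormParseSketch
{
    static void Main()
    {
        // Hypothetical input modelled on the question: the form overlaps the div.
        string content =
            "<div><form><input name=\"a\"></div><input name=\"b\"></form>";

        // ElementsFlags is a static dictionary, so change it before loading a document.
        HtmlNode.ElementsFlags.Remove("form");

        var document = new HtmlDocument();
        document.OptionAutoCloseOnEnd = true;
        document.LoadHtml(content);

        // Print the name of every input that the parser placed under the first form.
        HtmlNode form = document.DocumentNode.SelectSingleNode("//form");
        if (form != null)
        {
            foreach (HtmlNode input in form.Descendants("input"))
            {
                Console.WriteLine(input.GetAttributeValue("name", "(no name)"));
            }
        }
    }
}
```

Because ElementsFlags is static, removing the "form" entry affects every HtmlDocument loaded afterwards in the same process.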
It depends on what you want to do programmatically after the text has been parsed. If you don't want to do anything special with it, the following code:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<div><form>form and div</div>form</form>");
doc.Save(Console.Out);
will display exactly the same string, that is:
<div><form>form and div</div>form</form>
This is because the library was designed from the ground up to preserve the original HTML as much as possible.
But in terms of how this is represented in the DOM, and in terms of errors, it is another story. You can't have all three at the same time: 1) overlapping elements, 2) an XML-like DOM (which does not support overlaps), and 3) no errors.
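To see the error side of this trade-off, you can list what the parser recorded while loading the overlapping markup. This is only a sketch: the class name is made up, and which errors (if any) are reported depends on the library version and on the options you set.

```csharp
using System;
using HtmlAgilityPack;

class ParseErrorSketch
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><form>form and div</div>form</form>");

        // ParseErrors collects the problems the parser noticed while building the DOM.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine(
                $"{error.Code} at line {error.Line}, column {error.LinePosition}: {error.Reason}");
        }
    }
}
```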
So it depends on what you want to do after parsing.