HtmlAgilityPack: how to create indented HTML?
I am generating HTML using HtmlAgilityPack and it works perfectly, but the HTML text is not indented. I can get indented XML, but I need HTML. Is there a way?
HtmlDocument doc = new HtmlDocument();
// gen html
HtmlNode table = doc.CreateElement("table");
table.Attributes.Add("class", "tableClass");
HtmlNode tr = doc.CreateElement("tr");
table.ChildNodes.Append(tr);
HtmlNode td = doc.CreateElement("td");
td.InnerHtml = "—";
tr.ChildNodes.Append(td);
// write text, no indent :(
using (StreamWriter sw = new StreamWriter("table.html"))
{
    table.WriteTo(sw);
}
// write xml, nicely indented but it's XML!
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.Indent = true;
settings.ConformanceLevel = ConformanceLevel.Fragment;
using (XmlWriter xw = XmlWriter.Create("table.xml", settings))
{
    table.WriteTo(xw);
}
AngleSharp is fast, reliable, pure C#, and .NET Core compatible.
You can parse the HTML with AngleSharp, which provides a way to auto-indent:
var parser = new HtmlParser();
var document = parser.ParseDocument(text);
using (var writer = new StringWriter())
{
    document.ToHtml(writer, new PrettyMarkupFormatter
    {
        Indentation = "\t",
        NewLine = "\n"
    });
    var indentedText = writer.ToString();
}
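To connect this with the question's code, here is a minimal sketch that takes a node built with HtmlAgilityPack, serializes it, and re-parses it with AngleSharp for indented output. The class and method names (IndentedHtmlExample, ToIndentedHtml) are hypothetical, and it assumes a recent AngleSharp version where HtmlParser lives in AngleSharp.Html.Parser. Note that an HTML parser normalizes the markup (for example, it inserts a tbody inside a table), so the result may not be byte-for-byte what HtmlAgilityPack produced.

```csharp
using System.IO;
using AngleSharp.Html;
using AngleSharp.Html.Parser;

public static class IndentedHtmlExample
{
    // Hypothetical helper: re-parses a HtmlAgilityPack node with AngleSharp
    // and returns its markup indented by PrettyMarkupFormatter.
    public static string ToIndentedHtml(HtmlAgilityPack.HtmlNode node)
    {
        var parser = new HtmlParser();

        // ParseDocument always builds a full document, so the fragment
        // ends up inside the generated <html><body> wrapper.
        var document = parser.ParseDocument(node.OuterHtml);

        var formatter = new PrettyMarkupFormatter
        {
            Indentation = "\t",
            NewLine = "\n"
        };

        using (var writer = new StringWriter())
        {
            // Write only the body's child elements to leave out the wrapper.
            foreach (var child in document.Body.Children)
            {
                child.ToHtml(writer, formatter);
            }
            return writer.ToString();
        }
    }
}
```

With the table from the question, ToIndentedHtml(table) would return the indented markup as a string, which can then be written to table.html.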
No, and it's a "by design" choice. There is a big difference between XML (or XHTML, which is XML, not HTML), where whitespace usually has no specific meaning, and HTML.
This is not a minor change, because changing whitespace can change the way some browsers render a given HTML chunk, especially malformed HTML (which the library generally handles well). The Html Agility Pack was designed to preserve the way the HTML is rendered, not to alter the way the markup is written.
I'm not saying it's infeasible or plain impossible. Obviously you can convert to XML and voilà (and you could write an extension method to make this easier), but in the general case the rendered output may be different.
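A minimal sketch of such an extension method, reusing the XmlWriterSettings from the question. The names HtmlNodeExtensions and ToIndentedXml are hypothetical, and the output is XML rather than HTML, so the caveat about rendering differences still applies.

```csharp
using System.Text;
using System.Xml;
using HtmlAgilityPack;

public static class HtmlNodeExtensions
{
    // Hypothetical helper: writes the node through an indenting XmlWriter
    // and returns the result as a string. The output is XML, not HTML.
    public static string ToIndentedXml(this HtmlNode node)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            Indent = true,
            // Fragment allows output without a single document root.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var sb = new StringBuilder();
        using (XmlWriter xw = XmlWriter.Create(sb, settings))
        {
            node.WriteTo(xw);
        }
        return sb.ToString();
    }
}
```

Usage would then be a single call such as table.ToIndentedXml().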
As far as I know, HtmlAgilityPack cannot do this. But you could look at the HTML tidy packages suggested in similar questions:
- Html Agility Pack: make code look neat
- Which is the best HTML tidy pack? Is there any option in HTML agility pack to make HTML webpage tidy?
I had the same experience: HtmlAgilityPack is great for reading and modifying HTML (or, in my case, ASP) files, but it cannot create readable output.
However, I ended up writing a few lines of code that work for me.
Given an HtmlDocument named "m_htmlDocument", I create my HTML file as follows:
// the using block flushes and closes the file when done
using (var file = new System.IO.StreamWriter(_sFullPath))
{
    if (m_htmlDocument.DocumentNode != null)
        foreach (var node in m_htmlDocument.DocumentNode.ChildNodes)
            WriteNode(file, node, 0);
}
and
void WriteNode(System.IO.StreamWriter _file, HtmlNode _node, int _indentLevel)
{
    // check parameters
    if (_file == null) return;
    if (_node == null) return;
    // skip whitespace-only text nodes so existing formatting does not produce blank lines
    if (_node.NodeType == HtmlNodeType.Text && string.IsNullOrWhiteSpace(_node.InnerText)) return;
    // init
    string INDENT = " ";
    string NEW_LINE = System.Environment.NewLine;
    // case: no children
    if (_node.HasChildNodes == false)
    {
        for (int i = 0; i < _indentLevel; i++)
            _file.Write(INDENT);
        _file.Write(_node.OuterHtml);
        _file.Write(NEW_LINE);
    }
    // case: node has children
    else
    {
        // indent
        for (int i = 0; i < _indentLevel; i++)
            _file.Write(INDENT);
        // open tag (attribute values are written as-is, without escaping)
        _file.Write(string.Format("<{0}", _node.Name));
        if (_node.HasAttributes)
            foreach (var attr in _node.Attributes)
                _file.Write(string.Format(" {0}=\"{1}\"", attr.Name, attr.Value));
        _file.Write(string.Format(">{0}", NEW_LINE));
        // children
        foreach (var chldNode in _node.ChildNodes)
            WriteNode(_file, chldNode, _indentLevel + 1);
        // close tag
        for (int i = 0; i < _indentLevel; i++)
            _file.Write(INDENT);
        _file.Write(string.Format("</{0}>{1}", _node.Name, NEW_LINE));
    }
}