How do I generate a table of contents for HTML text in Python?
Assume that I have some HTML code, like this (generated from Markdown or Textile or something):
<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->
How could I generate a table of contents for it using Python?
Use an HTML parser such as lxml or BeautifulSoup to find all header elements.
Here's an example using lxml and XPath.
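from lxml import etree

# Use lxml's HTML parser: the input is HTML, not well-formed XML
doc = etree.parse("test.xml", etree.HTMLParser())
# Select every heading element, in document order
for node in doc.xpath('//h1|//h2|//h3|//h4|//h5|//h6'):
    print(node.tag, node.text)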
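Once the headers are found, they still have to be turned into an actual table of contents. The following is a minimal sketch of one way to do that, not the only one. To stay dependency-free it swaps in the standard library's `html.parser` instead of lxml, and the names `HeadingCollector` and `make_toc` are made up for this illustration. It assumes headings are not nested inside each other and indents each entry by its heading level.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element, in document order."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading being read, if any
        self._parts = []     # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a heading element
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = None


def make_toc(html, indent="  "):
    """Return the table of contents as indented plain-text lines."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(
        indent * (level - 1) + "- " + text
        for level, text in collector.headings
    )


sample = """
<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
"""

print(make_toc(sample))
```

Because the text is gathered from every data event inside the heading, inline markup such as `<em>` within a header is kept in the entry. To produce links instead of plain text, the same (level, text) pairs could be rendered as nested `<ul>` elements pointing at `id` attributes on the headers.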