How do I generate a table of contents for HTML text in Python?

Assume that I have some HTML code, like this (generated from Markdown or Textile or something):

<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->

How could I generate a table of contents for it using Python?


Use an HTML parser such as lxml or BeautifulSoup to find all header elements.


Here's an example using lxml and xpath.

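from lxml import etree

# etree.parse expects well-formed XML; for real-world HTML, pass etree.HTMLParser() as the second argument.
doc = etree.parse("test.xml")
# Select every heading element in document order.
for node in doc.xpath('//h1|//h2|//h3|//h4|//h5|//h6'):
    print(node.tag, node.text)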
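Once the headers are found, they still have to be turned into a table of contents. Here is a minimal sketch of that step. It swaps in the standard library's html.parser instead of lxml so that it runs without extra dependencies; the names HeadingCollector and build_toc are made up for this example, and indenting by heading level is only one possible layout.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1..h6 element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading being read, if any
        self._parts = []     # text fragments of that heading

    def _flush(self):
        # Store the heading currently being read, if there is one.
        if self._level is not None:
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None
            self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()    # tolerate a previous heading left unclosed
            self._level = int(tag[1])

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def close(self):
        super().close()
        self._flush()        # keep a heading that was never closed


def build_toc(html):
    """Return the table of contents as indented plain-text lines."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    if not collector.headings:
        return []
    # Indent relative to the shallowest heading level that occurs.
    top = min(level for level, _ in collector.headings)
    return ["  " * (level - top) + text for level, text in collector.headings]
```

Calling print("\n".join(build_toc(html))) on the HTML from the question prints one line per header, with each h2 indented under its h1. Note that text following an unclosed heading is attributed to that heading until the next heading tag appears, so badly broken markup may need a more forgiving parser such as lxml.html or BeautifulSoup.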