开发者

Python lxml wrapping elements

I was wondering what the easiest way to wrap an element with another element using lxml and Python for example if I have a html snippet:

<h1>The cool title</h1>
<p>Something Neat</p>
<table>
<tr>
<td>aaa</td>
<td>bbb</td>
</tr>
</table>
<p>The end of the snippet</p开发者_开发知识库>

And I want to wrap the table element with a section element like this:

<h1>The cool title</h1>
<p>Something Neat</p>
<section>
<table>
<tr>
<td>aaa</td>
<td>bbb</td>
</tr>
</table>
</section>
<p>The end of the snippet</p>

Another thing I would like to do is scour the xml document for h1s with a certain attribute and then wrap all of the elements until the next h1 tag in an element for example:

<h1 class='neat'>Subject 1</h1>
<p>Here is a bunch of boring text</p>
<h2>Minor Heading</h2>
<p>Here is some more</p>
<h1 class='neat>Subject 2</h1>
<p>And Even More</p>

Converted to:

<section>
<h1 class='neat'>Subject 1</h1>
<p>Here is a bunch of boring text</p>
<h2>Minor Heading</h2>
<p>Here is some more</p>
</section>
<section>
<h1 class='neat>Subject 2</h1>
<p>And Even More</p>
</section>

Thanks for all the help, Chris


lxml's awesome for parsing well formed xml, but's not so good if you've got non-xhtml html. If that's the case then go for BeautifulSoup as suggested by systemizer.

With lxml, this is a fairly easy way to insert a section around all tables in the document:

import lxml.etree

TEST="<html><h1>...</html>"

def insert_section(root):
    tables = root.findall(".//table")
    for table in tables:
        section = ET.Element("section")
        table.addprevious(section)
        section.insert(0, table)   # this moves the table

root = ET.fromstring(TEST)
insert_section(root)
print ET.tostring(root)

You could do something similar to wrap the headings, but you would need to iterate through all the elements you want to wrap and move them to the section. element.index(child) and list slices might help here.


If you're parsing certain xml files, you can use BeautifulSoup http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a great way to represent xml as a python object. You can then write python objects to analyze the html and add/remove tags. Therefore you can have a is_h1 function which will find all the tags in the xml file. Then you can add a section tag using beautiful soup.

If you would like to return this to the browser, you can use an HttpResponse with the argument being the string representation of the finished xml product.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜