Strip all tbody tags without destroying their children
This Ruby code using Nokogiri:
doc.xpath("//tbody").remove
removes the children of each <tbody> (as well as the <tbody> elements themselves). I only want to remove the <tbody> tags from the document, leaving their children in place. How can I achieve this?
require 'nokogiri'

html = Nokogiri::HTML(DATA)
html.xpath('//table/tbody').each do |tbody|
  # Re-parent each child onto the <table>, then drop the now-empty <tbody>.
  tbody.children.each do |child|
    child.parent = tbody.parent
  end
  tbody.remove
end
puts html.xpath('//table').to_s
__END__
<table border="0" cellspacing="5" cellpadding="5"><tbody>
<tr><td>Data</td></tr>
<tr><td>Data2</td></tr>
<tr><td>Data3</td></tr>
</tbody></table>
This prints:
<table border="0" cellspacing="5" cellpadding="5">
<tr><td>Data</td></tr>
<tr><td>Data2</td></tr>
<tr><td>Data3</td></tr>
</table>
You want to replace each tbody with its children? Then that's all you need to say:
require 'nokogiri'
html = Nokogiri::HTML.fragment(DATA.read)
html.css('tbody').each { |tbody| tbody.replace(tbody.children) }
puts html
__END__
<table><tbody>
<tr><td>Data</td></tr>
<tr><td>Data2</td></tr>
</tbody><tbody>
<tr><td>Data3</td></tr>
</tbody></table>
Producing:
<table>
<tr><td>Data</td></tr>
<tr><td>Data2</td></tr>
<tr><td>Data3</td></tr>
</table>