How to get the proper values after parsing an HTML table with Ruby/Nokogiri

I have spent three days straight trying to get a data scraper to work. It seems I have successfully parsed the HTML table, which looks like this:

<tr class='ds'>
<td class='ds'>Length:</td>
<td class='ds'>1/8"</td>
</tr>
<tr class='ds'>
<td class='ds'>Width:</td>
<td class='ds'>3/4"</td>
</tr>
<tr class='ds'>
<td class='ds'>Color:</td>
<td class='ds'>Red</td>
</tr>

However, I cannot seem to get it to print to CSV correctly.

The Ruby code is as follows:

Specifications = {
:length => ['Length:','length','Length'],       
:width => ['width:','width','Width','Width:'],  
:Color => ['Color:','color'], 
.......
}.freeze

def specifications
  @specifications ||= xml.css('tr.ds').map { |row| row.css('td.ds').map { |cell| cell.children.to_s } }.map { |record|
    specification = Specifications.detect { |key, value| value.include? record.first }
    [specification.to_s.titleize, record.last]
  }
end

And the CSV output ends up in a single column (what seem to be the full arrays):

[["", nil], ["[:finishtype, [\"finish\", \"finish type:\", \"finish type\", \"finish type\", \"finish type:\"]]", "Metal"], ["", "1/4\""], ["[:length, [\"length:\", \"length\", \"length\"]]", "18\""], ["[:width, [\"width:\", \"width\", \"width\", \"width:\"]]", "1/2\""], ["[:styletype, [\"style:\", \"style\", \"style:\", \"style\"]]"........

I believe the issue is that I have not specified which values to return, but I was not successful any time I tried to specify the output. Any help would be greatly appreciated!


Try changing

[specification.to_s.titleize, record.last]

to

[specification.last.first.titleize, record.last]

detect returns the whole matching pair, e.g. [:length, ["Length:", "length", "Length"]], and to_s turns that pair into the string "[:length, [\"Length:\", \"length\", \"Length\"]]". With last.first you extract just the first alias, "Length:".
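
To make the difference concrete, here is a small sketch in plain Ruby. The SPECS hash is a shortened, hypothetical stand-in for the Specifications constant from the question, and titleize is left out because it comes from ActiveSupport rather than core Ruby.

```ruby
# Shortened, hypothetical stand-in for the Specifications hash in the question.
SPECS = {
  length: ['Length:', 'length', 'Length'],
  width:  ['width:', 'width', 'Width', 'Width:']
}.freeze

record = ['Length:', '1/8"'] # one parsed table row: [label, value]

# Hash#detect yields [key, value] pairs and returns the first matching pair.
match = SPECS.detect { |_key, aliases| aliases.include?(record.first) }

as_string   = match.to_s       # string dump of the whole pair (what ended up in the CSV)
first_alias = match.last.first # "Length:"
key_name    = match.first.to_s # "length"

puts as_string
puts first_alias
puts key_name
```

If you would rather show the hash key than the first alias, match.first.to_s is the piece to titleize instead.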

If you encounter attributes that do not match anything in Specifications, detect returns nil and specification.last raises a NoMethodError. You can simply drop those rows by changing the method body to:

    xml.css('tr.ds').map{|row| row.css('td.ds').map{|cell| cell.children.to_s } }.map{|record|  
      specification = Specifications.detect{|key, value| value.include? record.first }
      [specification.last.first.titleize, record.last] if specification 
    }.compact
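
The question does not show the code that writes the CSV, so the following is only a sketch under that assumption. Once the method returns an array of [name, value] pairs, appending each pair as its own row with Ruby's standard csv library should put the name and the value in separate columns. The rows array is made-up sample data based on the table in the question.

```ruby
require 'csv'

# Sample data in the shape the corrected method returns: [[name, value], ...].
rows = [
  ['Length:', '1/8"'],
  ['Width:',  '3/4"'],
  ['Color:',  'Red']
]

# Append each pair as its own row, so the name and the value
# land in separate columns.
csv_text = CSV.generate do |csv|
  rows.each { |row| csv << row }
end

puts csv_text
```

To write to a file instead, CSV.open('specs.csv', 'w') accepts the same kind of block (the file name is just a placeholder). The library takes care of quoting the inch marks in the values.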