HTML scraping with hpricot using Ruby 1.8.7 vs 1.9.2

Relevant snippet from test.html:

<div id="seat_31F_vacant" class="seatVacant" onclick="UpdateHost(this);Common.DoPostBack('lbtPostBack','31F');" onmouseover="Seat_onMouseOver(this)" onmouseout="Seat_onMouseOut(this)">F</div>

Now consider this Ruby code:

# Read the file contents; Hpricot() takes a string or an IO object.
doc = Hpricot(File.read("test.html"))

@vacant_seat = []
doc.search("//div[@class='seats']").each do |seat|
    @vacant_seat += seat.to_s.scan(/id="seat_(.*)_vacant/)
end

@log.info @vacant_seat.to_s

When logging @vacant_seat.to_s this is what I end up with:

[["31F"], ["31E"], ["31D"], ["31C"]] (Using 1.9.2)

31F31E31D31C (Using 1.8.7)

That means if I do @vacant_seat[0].to_s I get:

["31F"] (1.9.2) and 31F (1.8.7)

I want to end up with 31F (as I do with 1.8.7)

Any thoughts? Is there a generic way to do this that will work in both Ruby versions? I need to extract the string (e.g. 31F) located between the underscore characters (_) in the ID attributes. If there is a better way to do this, I would appreciate hearing about it.


Ruby 1.9 (which includes 1.9.2) changed to_s for Arrays. In 1.8.7 it concatenated all of the elements, so the output was 31F31E31D31C.

Now to_s returns the same text as inspect: it formats the value to look like an array, with brackets around each array and quotes around the string elements inside, so you get [["31F"], ["31E"], ["31D"], ["31C"]].

@vacant_seat is an array of arrays, because scan returns one array of captures per match when the pattern contains a group. That's why @vacant_seat[0].to_s gives you ["31F"].
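
To see where the nesting comes from, here is a minimal sketch that runs scan on a made-up HTML string (the sample markup and the variable names are assumptions for illustration, not taken from the real page). It also uses a slightly stricter pattern than the one in the question, so a match cannot run past the closing underscore:

```ruby
# Sample markup standing in for the real page (hypothetical).
html = '<div id="seat_31F_vacant">F</div><div id="seat_31E_vacant">E</div>'

# With a capture group in the pattern, scan returns one array per match.
nested = html.scan(/id="seat_([^_"]+)_vacant/)
p nested   # [["31F"], ["31E"]]

# flatten turns the array of one-element arrays into a plain array of strings.
seats = nested.flatten
p seats    # ["31F", "31E"]
```

Flattening once, right after the scan, means every later element is already a plain string, so it prints the same way in either Ruby version.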

If you just need the array of elements, it is the same array in both versions; only the way it is printed differs.

To get what to_s returned in 1.8.7, use join: @vacant_seat.join #=> 31F31E31D31C and @vacant_seat[0].join #=> 31F give you what you're looking for.
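
A short sketch of the join approach, using a literal array in place of the scraped result (the local variable names are only for illustration):

```ruby
# Same shape as the @vacant_seat value shown in the question.
vacant_seat = [["31F"], ["31E"], ["31D"], ["31C"]]

# join concatenates the elements, descending into nested arrays,
# and does not depend on how to_s formats an Array.
all_seats  = vacant_seat.join       # "31F31E31D31C"
first_seat = vacant_seat[0].join    # "31F"

# Indexing into the inner array returns the string itself.
first_alt  = vacant_seat[0][0]      # "31F"

# A separator makes the combined output easier to read in a log.
listing    = vacant_seat.join(", ") # "31F, 31E, 31D, 31C"

puts all_seats, first_seat, first_alt, listing
```

Since join does not go through Array#to_s, these lines should give the same strings on 1.8.7 and 1.9.2.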
