HTML scraping with hpricot using Ruby 1.8.7 vs 1.9.2
Relevant snippet from test.html:
<div id="seat_31F_vacant" class="seatVacant" onclick="UpdateHost(this);Common.DoPostBack('lbtPostBack','31F');" onmouseover="Seat_onMouseOver(this)" onmouseout="Seat_onMouseOut(this)">F</div>
Now consider this Ruby code:
doc = Hpricot(File.read("test.html"))
doc.search("//div[@class='seats']").each do |seat|
@vacant_seat += seat.to_s.scan(/id="seat_(.*)_vacant/)
end
@log.info @vacant_seat.to_s
When logging @vacant_seat.to_s this is what I end up with:
[["31F"], ["31E"], ["31D"], ["31C"]] (Using 1.9.2)
31F31E31D31C (Using 1.8.7)
That means if I do @vacant_seat[0].to_s I get:
["31F"] (1.9.2) and 31F (1.8.7)
I want to end up with 31F (as I do with 1.8.7).
Any thoughts? Is there a generic way to do this that will work in both Ruby versions? I need to extract the string (e.g. 31F) located between the underscore characters (_) in the ID attributes. If there is a better way to do this, I would appreciate hearing about it.
Ruby 1.9 changed Array#to_s. In 1.8 it concatenated all of the elements (the same result as join), printing 31F31E31D31C. In 1.9 it returns the same string as inspect, so you see brackets around each array and quotes around the string elements inside them: [["31F"], ["31E"], ["31D"], ["31C"]].
It looks like @vacant_seat is an array of arrays, which is why @vacant_seat[0].to_s gives you ["31F"].
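To see where the nesting comes from: String#scan returns one inner array per match whenever the pattern contains a capture group. Here is a minimal sketch, assuming a trimmed-down sample string in place of test.html. The helper name scan_seat_ids is hypothetical, and the capture is made non-greedy (.*?) so that two IDs on one line are not swallowed into a single match.

```ruby
# Hypothetical helper (not from the original post): returns the raw result
# of String#scan for a pattern with a single capture group.
def scan_seat_ids(html)
  # Non-greedy capture so each id attribute is matched separately.
  html.scan(/id="seat_(.*?)_vacant/)
end

# Assumed sample markup, shortened from the snippet in the question.
sample = '<div id="seat_31F_vacant">F</div><div id="seat_31E_vacant">E</div>'

# p prints the inspect form, which looks the same on 1.8.7 and 1.9.2.
p scan_seat_ids(sample)
```

Each match is wrapped in its own array because of the capture group, so appending the scan results with += builds up an array of one-element arrays.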
If you just need the array that has the elements, then it's the same array in both, just being printed differently.
Now you can use join to get what to_s returned in 1.8.7. Either @vacant_seat.join #=> 31F31E31D31C or @vacant_seat[0].join #=> 31F will give you what you're looking for.
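If you would rather not depend on how to_s formats arrays at all, one option is to flatten the scan results so every element is already a plain String. The following is only a sketch under assumptions: vacant_seat_ids is a hypothetical helper, the sample markup is a shortened stand-in for test.html, and the regex uses a non-greedy capture rather than the original (.*).

```ruby
# Hypothetical helper (not from the original post): returns a flat array
# of seat codes such as ["31F", "31E"].
def vacant_seat_ids(html)
  # scan yields [["31F"], ["31E"]]; flatten removes the inner arrays.
  html.scan(/id="seat_(.*?)_vacant/).flatten
end

# Assumed sample markup, shortened from the snippet in the question.
sample = '<div id="seat_31F_vacant">F</div><div id="seat_31E_vacant">E</div>'

seats = vacant_seat_ids(sample)
puts seats[0]          # the first seat code as a plain String
puts seats.join(", ")  # explicit separator, independent of Array#to_s
```

With a flat array, seats[0] is a String, and String#to_s returns the string itself on both 1.8.7 and 1.9.2, so the logged value no longer changes between versions.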