How to work around the invalid byte sequence in UTF-8 ArgumentError?
I am trying to run the following code, which uses Nokogiri to parse an XML file. I want to remove newline characters from the text contained between tags. This code used to work but no longer does, possibly because I upgraded to ruby-1.9.1.
titles = node.search('b')
titles.each do |e|
  unless e.parent.name == "h4"
    if e.children.children.first.nil? == false
      puts e.children.children.first.text.gsub("\n","")
    end
  end
end
When I run the code I get this error:
HI. You're using libxml2 version 2.6.16 which is over 4 years old and has
plenty of bugs. We suggest that for maximum HTML/XML parsing pleasure, you
upgrade your version of libxml2 and re-install nokogiri. If you like using
libxml2 version 2.6.16, but don't like this warning, please define the constant
I_KNOW_I_AM_USING_AN_OLD_AND_BUGGY_VERSION_OF_LIBXML2 before requring nokogiri.
test.rb:35:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)
You could try installing Ruby 1.9.2 via RVM:
curl -L https://get.rvm.io | bash
rvm install 1.9.2
If you want Ruby to default to your RVM 1.9.2 install, then run:
rvm use 1.9.2 --default
NOTE: Installing RVM and Ruby 1.9.2 can also be done in a single command:
curl -L https://get.rvm.io | bash -s -- --ruby=1.9.2
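If upgrading is not an option, another way around the error is to drop the invalid bytes before calling gsub. The following is only a minimal sketch: strip_newlines_safely is a hypothetical helper name, and it assumes the text is meant to be UTF-8 and that silently discarding stray bytes is acceptable for your data.

```ruby
# Hypothetical helper (not part of Nokogiri): removes bytes that are not
# valid UTF-8, then strips newline characters.
def strip_newlines_safely(text)
  # Assumption: the text is supposed to be UTF-8 but contains stray bytes.
  utf8 = text.dup.force_encoding('UTF-8')

  # Round-trip through UTF-16BE so the transcoder actually runs and
  # discards invalid or unconvertible sequences instead of raising.
  utf16 = utf8.encode('UTF-16BE', :invalid => :replace, :undef => :replace, :replace => '')
  clean = utf16.encode('UTF-8')

  clean.gsub("\n", '')
end

# "\xFF" can never appear in valid UTF-8, so it stands in for a bad byte here.
puts strip_newlines_safely("caf\xFF\nbar")
```

In the loop from the question, this would replace the direct gsub call, for example puts strip_newlines_safely(e.children.children.first.text). On Ruby 2.1 and later, String#scrub offers the same cleanup without the round-trip.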