HTML parser that is compatible with JRuby?
I'm having a difficult time locating an HTML parser that works with JRuby.
I've become fond of using Nokogiri for HTML parsing, but Nokogiri requires libxml2.dll, which I don't have on my machine, and I'm not sure I can ensure that it is available on all users' machines.
I attempted to use another favorite, Scrubyt, but that relies on Mechanize, which also requires Nokogiri.
What Ruby HTML parser do you recommend for use with JRuby?
The pure-Java version of Nokogiri does not depend on libxml2 or any other native binary. See http://wiki.github.com/tenderlove/nokogiri/pure-java-nokogiri-for-jruby.
Hpricot is a popular HTML parsing library that also has a pure-Java port. Its functionality is similar; in fact, Hpricot was the parser that popularized using CSS selectors for HTML parsing.
Why not use the pure-Java version of Nokogiri?
http://github.com/tenderlove/nokogiri/tree/java