REXML and Nokogiri XML parsing

Can someone please explain why Nokogiri and REXML produce different output for the code below?

require 'rubygems'
require 'nokogiri'
require 'rexml/document'

xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>
<yml>
<a>TM and &#xA9; 2009</a>
</yml>"

puts 'nokogiri'
doc = Nokogiri::XML(xml)
puts doc.to_s, "\n"

puts 'rexml'
doc = REXML::Document.new(xml)
puts doc.to_s

outputs:

nokogiri
<?xml version="1.0" encoding="ISO-8859-1"?>
<yml>
<a>TM and ? 2009</a>
</yml>

rexml
<?xml version='1.0' encoding='ISO-8859-1'?>
<yml>
<a>TM and &#xA9; 2009</a>
</yml>


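Sure. Nokogiri resolves the character reference &#xA9; to the © character and serializes it using the document's declared encoding, ISO-8859-1, whereas REXML just outputs the text as you put it in. The ? in the Nokogiri output is presumably your terminal being unable to display the ISO-8859-1 byte. If you change the XML declaration to UTF-8 encoding, you'll get: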

nokogiri:
<?xml version="1.0" encoding="utf-8"?>
<yml>
<a>TM and © 2009</a>
</yml>

rexml:
<?xml version='1.0' encoding='UTF-8'?>
<yml>
<a>TM and &#xA9; 2009</a>
</yml>
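
To see why the declared encoding matters, here is a minimal sketch using only core Ruby string methods (no XML library involved). It assumes the ? in the first output came from a UTF-8 terminal receiving an ISO-8859-1 byte, which is a guess about the asker's setup. The helper names `encoded_bytes` and `valid_utf8?` are made up for this illustration.

```ruby
# Return the bytes of +text+ once it is transcoded to +encoding+.
def encoded_bytes(text, encoding)
  text.encode(encoding).bytes
end

# Would a UTF-8 consumer (for example a UTF-8 terminal) accept these raw bytes?
def valid_utf8?(bytes)
  bytes.pack("C*").force_encoding("UTF-8").valid_encoding?
end

copyright = "\u00A9" # the character that &#xA9; refers to

latin1 = encoded_bytes(copyright, "ISO-8859-1") # a single byte, 0xA9
utf8   = encoded_bytes(copyright, "UTF-8")      # two bytes, 0xC2 0xA9

puts "ISO-8859-1 bytes: #{latin1.inspect}, valid as UTF-8: #{valid_utf8?(latin1)}"
puts "UTF-8 bytes:      #{utf8.inspect}, valid as UTF-8: #{valid_utf8?(utf8)}"
```

The lone 0xA9 byte is not a valid UTF-8 sequence, so a UTF-8 terminal has nothing sensible to draw for it. Once the document is declared as UTF-8, the same character is written as two bytes that the terminal can display.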