REXML and Nokogiri XML parsing
Can someone please explain why Nokogiri and REXML produce different output for the code below?
require 'rubygems'
require 'nokogiri'
require 'rexml/document'
xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>
<yml>
<a>TM and © 2009</a>
</yml>"
puts 'nokogiri'
doc = Nokogiri::XML(xml)
puts doc.to_s, "\n"
puts 'rexml'
doc = REXML::Document.new(xml)
puts doc.to_s
outputs:
nokogiri
<?xml version="1.0" encoding="ISO-8859-1"?>
<yml>
<a>TM and ? 2009</a>
</yml>
rexml
<?xml version='1.0' encoding='ISO-8859-1'?>
<yml>
<a>TM and © 2009</a>
</yml>
Nokogiri honors the encoding declared in the XML prolog and converts the text using ISO-8859-1, whereas REXML just outputs what you put in. If you change the declared encoding to UTF-8, both libraries produce the same result:
nokogiri:
<?xml version="1.0" encoding="utf-8"?>
<yml>
<a>TM and © 2009</a>
</yml>
rexml:
<?xml version='1.0' encoding='UTF-8'?>
<yml>
<a>TM and © 2009</a>
</yml>
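To see why the declared encoding matters, here is a minimal sketch that uses only Ruby's core String methods and neither parser. It assumes the script string is UTF-8 (the default for Ruby source files) and does not reproduce Nokogiri's exact output; it only shows that the © character has a different byte representation in the two encodings, so a declaration that does not match the actual bytes leads to a mismatch somewhere.

```ruby
# The copyright sign is written as an escape so the file's own encoding does not matter.
text = "TM and \u00A9 2009"

# The same character has different byte sequences in the two encodings.
copyright_utf8   = "\u00A9".encode("UTF-8").bytes       # two bytes: 0xC2 0xA9
copyright_latin1 = "\u00A9".encode("ISO-8859-1").bytes  # one byte:  0xA9

puts "UTF-8 bytes:      #{copyright_utf8.map { |b| format('%02X', b) }.join(' ')}"
puts "ISO-8859-1 bytes: #{copyright_latin1.map { |b| format('%02X', b) }.join(' ')}"

# Relabel the UTF-8 bytes as ISO-8859-1 (what a mismatched declaration amounts to),
# then convert back to UTF-8 for display. The two bytes become two characters.
misread = text.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts misread
```

Keeping the declaration and the real byte encoding in agreement, as in the UTF-8 version above, avoids this kind of mismatch.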