DOMParser fails to parse certain nodes?

I'm creating a plugin for Google Chrome and I'm trying to parse the following XML:

<?xml version="1.0" encoding="utf-8"?>
<anime>
  <entry>
    <id>9938</id>
    <title>Ikoku Meiro no Crois&Atilde;&copy;e</title>
    <english>Crois&Atilde;&copy;e in a Foreign Labyrinth ~ The Animation</english>
    <synonyms>Ikoku Meiro no Crois&Atilde;&copy;e The Animation; Ikoku Meiro No Croisee The Animation; La crois&Atilde;&copy;e dans un labyrinthe &Atilde;&copy;tranger Special</synonyms>
    <episodes>12</episodes>
    <score>7.72</score>
    <type>TV</type>
    <status>Currently Airing</status>
    <start_date>2011-07-04</start_date>
    <end_date>0000-00-00</end_date>
    <synopsis>The story takes place in the second half of the 19th century, as Japanese culture gains popularity in the West. A young Japanese girl, Yune, accompanies a French traveller, Oscar, on his journey back to France, and offers to help at the family&amp;#039;s ironwork shop in Paris. Oscar&amp;#039;s nephew and shop-owner Claude reluctantly accepts to take care of Yune, and we learn how those two, who have so little in common, get to understand each other and live together in the Paris of the 1800s.</synopsis>
    <image>http://cdn.myanimelist.net/images/anime/8/29031.jpg</image>
  </entry>
</anime>

Using this code:

var parser = new DOMParser();
var xmlText = response.value;
var doc = parser.parseFromString(xmlText, "text/xml");
var entries = doc.getElementsByTagName("entry");

for (var i = 0; i < entries.length; ++i) {
    var node = entries[i];

    var titles = node.getElementsByTagName("title");
    console.log("titles.length: " + titles.length);
    if (titles.length > 0) {
        console.log("title: " + titles[0].childNodes[0].nodeValue);
    }

    var scores = node.getElementsByTagName("score");
    console.log("scores.length: " + scores.length);
    if (scores.length > 0) {
        console.log("score: " + scores[0].childNodes[0].nodeValue);
    }

    var ids = node.getElementsByTagName("id");
    console.log("ids.length: " + ids.length);
    if (ids.length > 0) {
        console.log("id: " + ids[0].childNodes[0].nodeValue);
    }
}

Looking at the output it seems that the title node was found but not its inner text. The score node wasn't found at all:

titles.length: 1
title: 
scores.length: 0
ids.length: 1
id: 9938

Does anyone know why this happens and/or how to fix it?

Workaround

I'm currently using a workaround based on the solution from this answer:

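// Decode HTML entities by letting the browser parse the string as HTML.
// Only the first child node is read, which is enough here because the
// input contains no markup once "<" and ">" have been escaped.
function htmlDecode(input){
  var e = document.createElement('div');
  e.innerHTML = input;
  return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}

// Escape the markup characters so the HTML parser treats the whole document
// as text, then decode every entity (which also restores "<" and ">").
// Caveat: &amp; and &lt; in the data are decoded as well, so text that
// contains them would come out as malformed XML.
function xmlDecode(input){
  var result = input;
  result = result.replace(/</g,  "&lt;");
  result = result.replace(/>/g,  "&gt;");
  result = result.replace(/\n/g, "&#10;");
  return htmlDecode(result);
}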

// Usage:
var parser = new DOMParser();
var doc = parser.parseFromString(xmlDecode(xmlText), "text/xml");

I'm not sure if this is the best way to go, but at least it's getting me further.


I'm not sure whether this is the cause of your problem, but XML defines only five named entities: &amp;, &lt;, &gt;, &quot; and &apos;. Anything else, such as &Atilde; or &copy;, is an HTML entity that an XML parser does not know. Replace those entities with the characters they're meant to represent (your document is in UTF-8, so it is completely safe to use © or other such characters directly) or with numeric character references (like &#169;).
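
If you can't change the server output, the replacement can be done on the client before the text reaches DOMParser. The following is only a rough sketch: replaceNamedEntities and its lookup table are hypothetical names made up for this example, and the table covers just a few entities, so you would need to extend it for real data. It also looks as if &Atilde;&copy; in your sample is a UTF-8 "é" that was mis-decoded somewhere upstream, in which case this makes the document parse but the title would still read "Ã©".

```javascript
// Hypothetical lookup table (illustrative subset only): HTML entity name
// mapped to its Unicode code point.
var HTML_ENTITY_CODE_POINTS = {
  Atilde: 195, // Ã
  copy: 169,   // ©
  eacute: 233, // é
  nbsp: 160    // no-break space
};

// The five entities that XML predefines; these must stay untouched.
var XML_PREDEFINED = { amp: true, lt: true, gt: true, quot: true, apos: true };

// Hypothetical helper: rewrite known HTML-only named entities as numeric
// character references, which every XML parser understands.
function replaceNamedEntities(xmlText) {
  return xmlText.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (match, name) {
    if (XML_PREDEFINED.hasOwnProperty(name)) {
      return match;
    }
    if (HTML_ENTITY_CODE_POINTS.hasOwnProperty(name)) {
      return "&#" + HTML_ENTITY_CODE_POINTS[name] + ";";
    }
    // Unknown entity: leave it alone so the parse error stays visible.
    return match;
  });
}
```

With this in place, the call from the question would become parser.parseFromString(replaceNamedEntities(xmlText), "text/xml"). Unlike decoding through innerHTML, it leaves &amp; and &lt; in the data alone, so the result should remain well-formed.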

Alternatively, you may define your own entities if it would be difficult to replace them in your document:

<!DOCTYPE anime [
    <!ENTITY copy "&#169;">
]>
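
If you go the DOCTYPE route on the client side, the declaration has to be placed after the <?xml ...?> line, because the XML declaration must come first in the document. A minimal sketch, assuming the parser honours entities declared in the internal subset; withEntityDeclarations is a hypothetical helper name and the entity list passed to it is up to you:

```javascript
// Hypothetical helper: insert a DOCTYPE with an internal subset that
// declares the given entities. `entities` maps entity name -> code point.
function withEntityDeclarations(xmlText, rootName, entities) {
  var declarations = Object.keys(entities).map(function (name) {
    return '  <!ENTITY ' + name + ' "&#' + entities[name] + ';">';
  });
  var doctype = "<!DOCTYPE " + rootName + " [\n" +
                declarations.join("\n") + "\n]>\n";

  // The XML declaration, if present, must remain the very first thing.
  var match = /^<\?xml[\s\S]*?\?>/.exec(xmlText);
  if (!match) {
    return doctype + xmlText;
  }
  var xmlDecl = match[0];
  var rest = xmlText.slice(xmlDecl.length).replace(/^\s+/, "");
  return xmlDecl + "\n" + doctype + rest;
}
```

For the document in the question this could be called as withEntityDeclarations(xmlText, "anime", { Atilde: 195, copy: 169 }) before handing the result to parseFromString.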