
XML Parsing Error: undefined entity - special characters

Why does an XML parser report an error for some special characters but accept others?

For instance, the following produces an error,

<?xml version="1.0" standalone="yes"?>
<Customers>
    <Customer>
        <Name>L&ouml;ic</Name>
    </Customer>
</Customers>

but this is ok,

<?xml version="1.0" standalone="yes"?>
<Customers>
    <Customer>
        <Name>&amp;</Name>
    </Customer>
</Customers>

By the way, I convert the special characters in PHP with htmlentities('Löic', ENT_QUOTES).

How can I get around this?

Thanks.

EDIT:

I found that it works fine if I use a numeric character reference such as L&#246;ic

Now I have to find out how to use PHP to convert special characters into numeric character references!


There are five entities defined in the XML specification: &amp;, &lt;, &gt;, &apos; and &quot;.

There are lots of entities defined in the HTML DTD.

You can't use the ones from HTML in generic XML.
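To see the difference for yourself, here is a minimal sketch that asks PHP's libxml-based parser whether a string is well-formed. The helper name is_well_formed_xml() is made up for this illustration, and it assumes the SimpleXML extension is enabled (it is in a default PHP build).

```php
<?php
// Hypothetical helper: returns true if libxml can parse the string as XML.
function is_well_formed_xml(string $xml): bool
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $parsed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $parsed !== false;
}

// &ouml; comes from the HTML DTD and is not declared here.
$with_html_entity = '<?xml version="1.0" standalone="yes"?><Name>L&ouml;ic</Name>';
// &amp; is one of the five entities predefined by XML.
$with_xml_entity  = '<?xml version="1.0" standalone="yes"?><Name>&amp;</Name>';

var_dump(is_well_formed_xml($with_html_entity));
var_dump(is_well_formed_xml($with_xml_entity));
```

The first document should be rejected because of the undefined entity, while the second should parse.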

You could use numeric references, but you would probably be better off just getting your character encodings straight, which basically boils down to:

  • Set your editor to save the data in UTF-8
  • If you process the data with a programming language, make sure it is UTF-8 aware
  • If you store the data in a database, make sure it is configured for UTF-8
  • When you serve up your document, make sure the HTTP headers specify that it is UTF-8 (in the case of XML, UTF-8 is the default, so not specifying anything is almost as good)
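As a rough sketch of both options in PHP: htmlspecialchars() with ENT_XML1 escapes only the characters XML treats as special and leaves ö as plain UTF-8, while mb_encode_numericentity() can turn everything outside ASCII into numeric references. The helper name xml_text() and its $ascii_only parameter are invented for this example, and the second path assumes the mbstring extension is installed.

```php
<?php
// Hypothetical helper: escape text for use inside an XML element.
function xml_text(string $text, bool $ascii_only = false): string
{
    // Escapes only &, <, > and quotes; characters such as ö stay as UTF-8.
    $escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    if ($ascii_only) {
        // Convert every code point from U+0080 upwards into a decimal
        // reference such as &#246; (requires the mbstring extension).
        $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
        $escaped = mb_encode_numericentity($escaped, $map, 'UTF-8');
    }

    return $escaped;
}

$name = "L\u{00F6}ic & Co"; // "Löic & Co" as a UTF-8 string

echo '<?xml version="1.0" encoding="UTF-8"?>', "\n";
echo '<Name>', xml_text($name), '</Name>', "\n";       // raw UTF-8 text
echo '<Name>', xml_text($name, true), '</Name>', "\n"; // numeric references
```

Either line is valid XML. The UTF-8 form is easier to read, and the numeric form only matters if something downstream cannot handle non-ASCII bytes.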


Because &ouml; is not one of XML's built-in entities. It comes from the HTML DTD, so in generic XML it has to be declared in a DTD before you can use it.
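For illustration, here is a sketch of what such a declaration could look like, using an internal DTD subset and PHP's DOMDocument. The document is the one from the question with a DOCTYPE added. LIBXML_NOENT is passed so the parser substitutes the declared entity, and since that flag also affects external entities, you would only want it on input you trust.

```php
<?php
// The question's document, plus an internal DTD subset that declares &ouml;.
$xml = <<<'XML'
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Customers [
  <!ENTITY ouml "&#246;">
]>
<Customers>
    <Customer>
        <Name>L&ouml;ic</Name>
    </Customer>
</Customers>
XML;

$doc = new DOMDocument();
// LIBXML_NOENT asks libxml to replace entity references with their values.
$doc->loadXML($xml, LIBXML_NOENT);

$name = $doc->getElementsByTagName('Name')->item(0)->textContent;
echo $name, "\n";
```

With the declaration in place, the parser should accept &ouml; and give you the name with a real ö in it.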


TLDR Solution

You can solve this problem with html_entity_decode() (Source: PHP.net), like so...

$xml_line = '<description>' . html_entity_decode($description) . '</description>';

Full, Working Demo Online

In this demo, I use &rsquo; and a line from the Tao Te Ching to demonstrate the above use of html_entity_decode()...

$title = 'The name you can say isn&rsquo;t the real name.';
// Turn HTML entities such as &rsquo; into the actual UTF-8 characters
$xml_title = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// Re-escape the characters that are special in XML ('&' has to come first)
$xml_title = str_replace(['&', '<', '>'], ['&amp;', '&lt;', '&gt;'], $xml_title);
$xml_line = '<title>' . $xml_title . '</title>';
print($xml_line);

Don't forget to escape the &, < and > chars again afterwards, though!


How Do You Know It Worked?

Want to verify that it worked? Then head on over to the W3C RSS Feed Validator and check that it accepts the output of the above code.
