XML character encoding issues with accents

I have run into this problem a few times now while working on projects, and I would like to know if there is an elegant solution.

Problem: I am pulling tweets via XML from Twitter and storing them in my DB. However, when I output them to the screen I get characters like these:

"moved to dusseldorf.�" OR también

and if I have Russian characters then I get lots of ugly boxes in place.

What I would like is for the correct native accents to show under one encoding. I thought this was possible with UTF-8.

What I am using

  • PHP, MySQL

After reading in the XML file I am doing the following to cleanse the data:

    $data = trim($data);
    $data = htmlentities($data);
    $data = mysql_real_escape_string($data);

My Database Collation is: utf8_general_ci

Web page character set is: charset=UTF-8

I think it could have something to do with HTML entities, but I would really appreciate a solution that works across the board on projects.

Thanks in advance.


Replace this line:

    $data = htmlentities($data);

With this:

    $data = htmlentities($data, ENT_QUOTES, "UTF-8");

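That way, htmlentities() decodes the input as UTF-8 instead of ISO-8859-1 (its default before PHP 5.4), so a multibyte character such as é becomes one correct entity rather than two wrong ones. For more information see the documentation for htmlentities().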
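
To make the difference concrete, here is a small sketch. The sample word is taken from the question; the accented character is written as an escaped byte sequence so the result does not depend on how the script file itself is saved.

```php
<?php
// "también" as UTF-8 bytes; \xC3\xA9 is the two-byte UTF-8 encoding of é.
$word = "tambi\xC3\xA9n";

// Treating the bytes as ISO-8859-1 turns each byte into its own entity.
$wrong = htmlentities($word, ENT_QUOTES, 'ISO-8859-1');

// Declaring UTF-8 decodes the two bytes as a single character first.
$right = htmlentities($word, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // tambi&Atilde;&copy;n
echo $right, "\n"; // tambi&eacute;n
```

The first result is the kind of garbage that ends up in the database when the encoding argument is missing on an older PHP version.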


You need to set your connection's encoding to UTF-8 (the default is usually ISO-8859-1). See here: How can I store the '€' symbol in MySQL using PHP?
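
As a rough sketch of what setting the connection encoding can look like, here is a PDO version. The helper names, host, database name and credentials are all placeholders made up for illustration, and PDO is used instead of the mysql_* functions from the question. It assumes the pdo_mysql driver is installed.

```php
<?php
// Hypothetical helper: builds a PDO DSN that requests a UTF-8 connection.
// "utf8mb4" is MySQL's full UTF-8 charset; use "utf8" instead if the
// tables are utf8 (as with the utf8_general_ci collation in the question).
function utf8_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: opens the connection and makes errors throw.
function open_utf8_connection(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8_mysql_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Only the DSN is built here; no database is contacted.
echo utf8_mysql_dsn('localhost', 'tweets'), "\n";
```

With the legacy mysql_* functions used in the question, the equivalent step is calling mysql_set_charset('utf8') right after connecting.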

Calling htmlentities() is unnecessary when you get the encodings right. I would remove it completely: store the raw UTF-8 text in the database. You'll just have to be careful to use htmlspecialchars() when outputting the data in an HTML context.
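
A minimal sketch of escaping at output time only. The helper name e() and the sample tweet are invented for the example; the text is assumed to have been stored as raw UTF-8.

```php
<?php
// Hypothetical helper: escape text for an HTML context at output time.
// Only <, >, &, " and ' are converted; accented characters pass through.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Sample tweet as it would come back from the database (raw UTF-8).
$tweet = "moved to D\xC3\xBCsseldorf <3 & loving it";

echo '<p>', e($tweet), "</p>\n";
```

Because the page is served as UTF-8, the ü needs no entity at all; only the characters that are special in HTML get escaped.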


Make sure that you set your PHP internal encoding to UTF-8 using iconv_set_encoding (deprecated as of PHP 5.6, where the default_charset ini setting takes over this role), and that you call htmlentities with the encoding information as EdoDodo said. Also make sure that your database stores data with UTF-8 encoding, though you say that's already the case.
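
On PHP 5.6 and later, one way to get the same effect is to set default_charset, which htmlentities() falls back to when no encoding is passed. This is a sketch under that version assumption, not a replacement for passing the encoding explicitly.

```php
<?php
// Make UTF-8 the default charset for this request. On PHP 5.6+,
// htmlentities() and htmlspecialchars() use it when the encoding
// argument is omitted.
ini_set('default_charset', 'UTF-8');

// "también" as UTF-8 bytes.
$word = "tambi\xC3\xA9n";

// Encoding argument deliberately left out.
$encoded = htmlentities($word, ENT_QUOTES);

echo $encoded, "\n";
```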


You can't use htmlentities() in its default state for XML data, because this function produces HTML entities, not XML entities.

The difference is that the HTML DTD defines a bunch of entity codes which web browsers are programmed to interpret. But most XML DTDs don't define them (if the XML even has a DTD).

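The only entity codes that are available by default in XML are &lt;, &gt;, &amp;, &apos; and &quot;. All other characters need to be written literally (in an encoding the document declares, such as UTF-8) or as numeric character references such as &#233;.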

PHP doesn't have an xmlentities() function, but if you read the manual page for htmlentities(), you'll see in the comments that plenty of people have had this same issue and have posted their solutions. After a quick browse through it, I'd suggest looking at the one named philsXMLClean().
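
On PHP 5.4 or later there is also a built-in route that may be enough: htmlspecialchars() with the ENT_XML1 flag escapes only the five predefined XML entities. The sketch below is not philsXMLClean(); the function name xml_escape() is made up here, and it assumes the XML document is declared as UTF-8 so accented characters can stay literal.

```php
<?php
// Hypothetical helper: escape text for use inside an XML document.
// ENT_XML1 restricts output to XML's predefined entities, and
// ENT_QUOTES makes both quote characters get escaped as well.
function xml_escape(string $text): string
{
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// The é (bytes \xC3\xA9) is left as-is; only markup characters change.
echo xml_escape("Tom & \"Jerry\" <\xC3\xA9>"), "\n";
```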

Hope that helps.
