Best practices for parsing multi-language feeds
I'm having a problem parsing data from different feeds, some of them in English, others in Italian and others in Spanish. I'm parsing using a PHP script and saving the parsed data into my MySQL database.
The problem is that when I parse items that contain "uncommon" characters, such as "Strage di Viareggio Più", the phrase ends up stored in my database like this: "Strage di Viareggio Più".
My database can handle that kind of character, because when I enter it manually it works fine. In the original feed (RSS file) the phrase is also fine, so I think it is my PHP server that is changing the letter. How can I solve this? Thanks!
Make sure that the database uses UTF-8 (as you say it does) and that the PHP script has its internal encoding set to UTF-8, which you can achieve with iconv_set_encoding. If you're reading data over HTTP, that should be all you need, as long as the response declares its own encoding correctly.
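As a rough sketch of the PHP side of this advice, the helper below checks whether a feed string is already valid UTF-8 and converts it with iconv() otherwise. The function name feed_to_utf8 and the $declared parameter (the charset the feed announces in its XML prolog or HTTP header) are assumptions made for illustration; they are not part of the original script. Since iconv_set_encoding() is deprecated as of PHP 5.6 in favor of the default_charset ini setting, an explicit conversion like this may be the more portable route.

```php
<?php
// Hypothetical helper (not from the original script): return $raw as UTF-8,
// converting from $declared, the charset the feed claims to use, if needed.
function feed_to_utf8(string $raw, string $declared = 'UTF-8'): string
{
    // preg_match() with the /u modifier only succeeds on valid UTF-8 input.
    if (preg_match('//u', $raw) === 1) {
        return $raw; // already valid UTF-8, leave it untouched
    }

    // iconv() returns false when the input cannot be converted.
    $converted = iconv($declared, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert feed text from $declared to UTF-8");
    }
    return $converted;
}

// "Pi\xF9" is "Più" in ISO-8859-1; the result is the UTF-8 byte sequence.
$title = feed_to_utf8("Strage di Viareggio Pi\xF9", 'ISO-8859-1');
```

Checking validity first avoids converting text that is already UTF-8 a second time, which is one common way to end up with garbled accents.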
It looks like the input data is in UTF-8, but the charset/collation of the DB table is ASCII. I would suggest using UTF-8 everywhere.
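One way "UTF-8 everywhere" might look from PHP, assuming the script connects through PDO (the question does not say which MySQL extension it uses): set the connection charset in the DSN and convert the table. The helper name utf8_dsn, the table name feed_items, and the credentials in the usage comment are all placeholders.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that forces a UTF-8 connection
// charset. utf8mb4 is MySQL's full UTF-8 character set.
function utf8_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Statement that converts an existing table; "feed_items" is a made-up name.
const CONVERT_TABLE_SQL = 'ALTER TABLE feed_items CONVERT TO CHARACTER SET utf8mb4';

// Usage with placeholder credentials (not executed here):
// $pdo = new PDO(utf8_dsn('localhost', 'feeds'), 'user', 'secret');
// $pdo->exec(CONVERT_TABLE_SQL);
```

Converting the table changes how new rows are stored; rows that were already saved with garbled characters would still need to be re-imported or repaired separately.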
What you need to apply before saving to MySQL is:
http://php.net/manual/en/function.htmlentities.php
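A minimal sketch of that suggestion, assuming the feed text has already reached PHP as UTF-8. The wrapper name encode_for_storage is hypothetical. The encoding is passed explicitly because htmlentities() defaulted to ISO-8859-1 before PHP 5.4, and a mismatched encoding argument is a likely way to mangle multi-byte characters.

```php
<?php
// Hypothetical wrapper: turn non-ASCII characters into HTML entities,
// stating the input encoding explicitly instead of relying on the default.
function encode_for_storage(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

$title   = "Strage di Viareggio Pi\u{00F9}";   // "Più" written as a Unicode escape
$encoded = encode_for_storage($title);          // "Strage di Viareggio Pi&ugrave;"

// html_entity_decode() reverses the step when the text is read back.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
```

The trade-off is that the database then holds entities rather than the characters themselves, so searching and sorting on the stored text works on strings like "Pi&ugrave;" instead of "Più".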
Check these threads for more information:
- Best practices in PHP and MySQL with international strings
- htmlentities() makes Chinese characters unusable
What I find incredible is that this question has received -2 in the past 24 hours without any comments.
From the question posted:
I'm parsing using a PHP script and saving the parsed data into my MySQL database.
and
I think is my PHP server who is changing the letter. How can I solve this? Thanks!
The answers posted so far are related to the encoding and settings of MySQL. The person asking the question has clearly stated that he can insert special characters manually and is having no problems:
My database can use that kind character because when I input that manualy it works fine
My answer was meant to help him convert the characters into HTML entities, which circumvents the problem he is having with the RSS feed and answers the question as posted.