Help getting rid of HTML special chars in the database
I've migrated my site from Interspire CMS to Joomla! CMS.
I've managed to migrate the whole article database, but some articles have a weird issue: when I access the page from Joomla, the title contains HTML entities like &rsquo;.
As you can guess from the CMSs I use, I rely on PHP on the server side and MySQL for my database.
I tried to go over the article titles in the database with htmlspecialchars_decode and html_entity_decode in order to get rid of those, but it had no effect.
If I just grab an example from the DB and echo it, it looks OK: What’s Your Pleasure, Lasagna Or Pizza Manchester Style?
If I go to the article page in Joomla, it looks like this:
What&rsquo;s Your Pleasure, Lasagna Or Pizza Manchester Style?
When I go to phpMyAdmin to see directly what is in the database, this is the content of the title:
What&rsquo;s Your Pleasure, Lasagna Or Pizza Manchester Style?
I even tried to remove the symbol with:
str_replace("&rsquo;","",$title);
or to replace it like this:
str_replace('&rsquo;',"'",$title);
but nothing changed. When I tried to encode it again instead of decoding it (just to see if I'm on the right DB), it worked and encoded it again...
Please, I would be glad to have any new ideas... Thanks, Yanipan
Try setting encoding to cp1252. This worked out for me:
$decoded = html_entity_decode($your_string, ENT_QUOTES, 'cp1252');
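A minimal sketch of why the charset argument can matter here, assuming the stored title contains the literal entity &rsquo; (the sample string below is hypothetical). The right single quote has no ISO-8859-1 representation, so html_entity_decode leaves the entity untouched with that charset, while cp1252 and UTF-8 can both represent it:

```php
<?php
// Hypothetical sample title; the real value would come from the database.
$title = 'What&rsquo;s Your Pleasure?';

// ISO-8859-1 cannot represent the right single quote, so the entity is left as-is.
$latin1 = html_entity_decode($title, ENT_QUOTES, 'ISO-8859-1');

// cp1252 maps the right single quote to the single byte 0x92.
$cp1252 = html_entity_decode($title, ENT_QUOTES, 'cp1252');

// UTF-8 encodes the same character as the three bytes E2 80 99.
$utf8 = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

// Inspect the bytes that replaced the entity (it starts at offset 4).
var_dump($latin1 === $title);
var_dump(bin2hex(substr($cp1252, 4, 1)));
var_dump(bin2hex(substr($utf8, 4, 3)));
```

Which charset to pass depends on the encoding the Joomla tables and the database connection actually use; for a UTF-8 database, 'UTF-8' would be the matching choice.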
Probably your best bet is to do the search and replace within the database itself rather than trying to do it with PHP. Search and replace in MySQL is done like this:
update TABLE_NAME set FIELD_NAME = replace(FIELD_NAME, 'find this string', 'replace found string with this string');
So yours should look something like:
update ARTICLES set TITLE = replace(TITLE, '&rsquo;', '\'');
Give that a shot.
Need more info
- What is the character encoding on your database? That & or ; may be something other than typical ASCII.
- It's possible that PHP/Joomla is double-encoding your string. Look at the browser's page source and find the text in the produced HTML. Instead of What&rsquo;s, it might be one of the following:
What&rsquo&#59;s
What&#38;rsquo&#59;s
What&amp;rsquo;s
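One way to check for the double encoding described above is to decode repeatedly and count how many passes change the string. This is only a sketch: decodeFully and countEncodingLayers are hypothetical helper names, the sample strings are made up, and UTF-8 is assumed as the target charset.

```php
<?php
// Hypothetical helper: decode HTML entities repeatedly until the string stops changing.
// $maxPasses guards against looping forever on unexpected input.
function decodeFully(string $text, int $maxPasses = 5): string
{
    for ($i = 0; $i < $maxPasses; $i++) {
        $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
        if ($decoded === $text) {
            break; // nothing left to decode
        }
        $text = $decoded;
    }
    return $text;
}

// Hypothetical helper: count how many decoding passes actually change the string.
// 0 means no entities, 1 means singly encoded, 2 or more suggests double encoding.
function countEncodingLayers(string $text, int $maxPasses = 5): int
{
    $layers = 0;
    while ($layers < $maxPasses) {
        $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
        if ($decoded === $text) {
            break;
        }
        $text = $decoded;
        $layers++;
    }
    return $layers;
}

// Made-up samples: a singly encoded and a doubly encoded title fragment.
var_dump(countEncodingLayers('What&rsquo;s'));
var_dump(countEncodingLayers('What&amp;rsquo;s'));
var_dump(bin2hex(decodeFully('What&amp;rsquo;s')));
```

Decoding until the string is stable would also unescape titles that are meant to contain literal entity text, so it is safer to inspect the layer count first and only rewrite rows that report two or more.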