Replacing Word special characters with their "normal" defaults
Users copy and paste text from Word that looks like this:
“What’s the matter?” PART 2– A Review”
It ends up being:
%93What%92s the matter?%94 PART 2%96 A Review%94
I need it to be:
"What's the matter?" PART 2- A Review"
I'm looking for a PHP library that converts such text in a standardized way, because there are a lot more characters than just the ones I've listed here, e.g. the (c) copyright symbol.
You want iconv. The iconv() function has options to perform transliteration from special characters, such as curly quotes in Windows-1252 (the encoding browsers actually use when a page is labelled Latin1 / ISO 8859-1), to the appropriate character in whatever encoding you're using, such as curly quotes in UTF-8 or straight quotes (') in ASCII.
If this is a web form, the browser is likely already converting from Latin1 to UTF-8. If you want to store it in ASCII, for example, you'd use this:
$ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $utf8);
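If the text arrives as raw Windows-1252 bytes instead (which the %93 and %92 escapes in the question suggest), a conversion to UTF-8 has to happen first. The following is a minimal sketch under that assumption; word_bytes_to_utf8() is a hypothetical helper name, and the transliterated output of the second step depends on the iconv implementation installed on the server, so no particular result is promised for it.

```php
<?php
// Hypothetical helper: converts Windows-1252 (CP1252) bytes to UTF-8.
// Assumption: the input really is CP1252; this step is lossless.
function word_bytes_to_utf8(string $raw): string
{
    $utf8 = iconv('CP1252', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException('Input is not valid CP1252');
    }
    return $utf8;
}

// The bytes behind the question's example string.
$raw  = "\x93What\x92s the matter?\x94 PART 2\x96 A Review\x94";
$utf8 = word_bytes_to_utf8($raw);

// UTF-8 -> ASCII with transliteration. Which replacement is chosen for each
// character varies between glibc, libiconv and musl, so verify it locally.
$ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $utf8);

echo $utf8, "\n";
echo $ascii === false ? "transliteration not available\n" : $ascii . "\n";
```

Splitting the work into two steps keeps the lossless part (CP1252 to UTF-8) separate from the lossy part (UTF-8 to ASCII), so the UTF-8 version can be stored even if the ASCII one is only needed for display.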
Try this
function msword_conversion($str)
{
    // Map accented letters and Word's typographic punctuation to ASCII.
    // Assumes both this source file and $str are UTF-8 encoded.
    $invalid = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z',
        'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A',
        'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E',
        'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y',
        'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a',
        'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i',
        'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
        'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
        // Quotes and dashes. The low double quote now maps to a straight
        // double quote rather than a comma.
        "`" => "'", "´" => "'", "„" => "\"", "“" => "\"", "”" => "\"",
        "‘" => "'", "’" => "'", "–" => "-",
        // The original mapping also strips these two ASCII characters.
        "{" => "", "~" => "",
    );
    return str_replace(array_keys($invalid), array_values($invalid), $str);
}
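A narrower variant of the same lookup-table idea is sketched here for cases where accented letters should be left alone and only Word's typographic punctuation needs replacing. The function name normalize_word_punctuation() and the choice of characters are assumptions made for illustration, the list is not exhaustive, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: replaces a handful of typographic characters with
// plain ASCII. Assumes $text is UTF-8; extend the map as needed.
function normalize_word_punctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{00A9}" => '(c)', // copyright sign
    ];

    // strtr() with an array replaces each key in a single pass and never
    // rescans text it has already replaced.
    return strtr($text, $map);
}

echo normalize_word_punctuation(
    "\u{201C}What\u{2019}s the matter?\u{201D} PART 2\u{2013} A Review\u{201D}"
), "\n";
// prints: "What's the matter?" PART 2- A Review"
```

Writing the keys as \u{...} escapes avoids any dependence on how the PHP source file itself is encoded, which is a common source of silent mismatches with literal curly quotes.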
I think what you are looking for is urldecode().
As mentioned previously, urldecode() is the function you're looking for. The content has been percent-encoded for safe use in a URL. Be aware, however, that Word uses curly "66 and 99" style quote characters rather than the straight quote " used in most HTML content, so it may also be worthwhile running str_replace() on those characters so that you don't need to worry about character encoding when the page displays that content to the user.
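The two steps described above, decoding and then replacing the quote characters, can be sketched as follows. This assumes the decoded value is single-byte Windows-1252, as the %93 and %92 escapes in the question suggest; it would corrupt UTF-8 input, where the same byte values occur inside multi-byte sequences. decode_word_form_value() is a hypothetical name, and strtr() is used here in place of str_replace() only because it takes the mapping as one array.

```php
<?php
// Hypothetical helper: percent-decodes a value and replaces Word's CP1252
// punctuation bytes with ASCII. Do NOT use on UTF-8 input.
function decode_word_form_value(string $encoded): string
{
    // %93 becomes the single byte 0x93, and so on.
    $bytes = urldecode($encoded);

    return strtr($bytes, [
        "\x91" => "'", // left single quote
        "\x92" => "'", // right single quote / apostrophe
        "\x93" => '"', // left double quote ("66")
        "\x94" => '"', // right double quote ("99")
        "\x96" => '-', // en dash
        "\x97" => '-', // em dash
    ]);
}

echo decode_word_form_value('%93What%92s the matter?%94 PART 2%96 A Review%94'), "\n";
// prints: "What's the matter?" PART 2- A Review"
```

Note that PHP already percent-decodes values in $_GET and $_POST, so an explicit urldecode() call is only needed when working with a raw query string or a stored, still-encoded value.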