Reading a file with the right encoding
I have a .txt file in which, when I open it with a standard text editor such as Notepad or SciTE, I can read strings like these:
Artist1 – Title 1
Artist2 – Title 2
Then I open it with my PHP script and read the lines:
$tracklistFile_name=time().rand(1, 1000).".".pathinfo($_FILES['tracklistFile']['name'], PATHINFO_EXTENSION);
if(((pathinfo($tracklistFile_name, PATHINFO_EXTENSION)=='txt')) && (move_uploaded_file($_FILES['tracklistFile']['tmp_name'], 'import/'.$tracklistFile_name))) {
    $fileArray=file('import/'.$tracklistFile_name, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $fileArray=array_values(array_filter($fileArray, "trim"));
    for($i=0; $i<sizeof($fileArray); $i++) {
        echo $fileArray[$i]."<br />";
    }
}
And... wow... I get this result:
Artist1 � Title1
Artist2 � Title2
What are those symbols? I think the encoding is failing.
The symbols are so wrong that I can't insert them into the database, not even with mysql_real_escape_string(). In fact, I get this error when I try to insert them:
Incorrect string value: '\x96 Titl...' for column 'atl' at row 1
How can I resolve this problem? Suggestions?
EDIT
I tried applying utf8_encode() to these strings before inserting them: now the insert doesn't fail, but the result is:
Artist1 Title1
Artist2 Title2
So I've lost information. Why?
You should read Joel Spolsky's article on UTF-8 and encoding.
Your problem almost certainly stems from an encoding mismatch. Your first job is to figure out where the mismatch occurs, because it could be in several different places:
1) Your PHP code could be reading the input using an incorrect encoding (for example, treating it as ISO-8859-1 when the source file is encoded some other way).
2) Your PHP code could be writing the output using an incorrect encoding.
3) Whatever you are using to read the output (your browser) could be set to a different encoding than the bytes you are writing.
Once you figure out which of these three places is causing your problem, you can fix it by finding out what your source encoding is and reading/writing with that encoding instead of another one (which your system has probably set as the default).
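One way to narrow this down is to look at the raw bytes PHP actually read, rather than at what the browser renders. The snippet below is only a rough sketch: describeBytes() is a hypothetical helper name (not from the question), and it assumes the mbstring extension is available. It reports whether a string is valid UTF-8 and dumps its bytes as hex.

```php
<?php
// Hypothetical helper (not part of the original script): report whether a
// string is valid UTF-8 and show its raw bytes as hex.
function describeBytes(string $line): string
{
    $status = mb_check_encoding($line, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return $status . ': ' . bin2hex($line);
}

// "\x96" is the byte quoted in the MySQL error message from the question.
echo describeBytes("A \x96 B"), PHP_EOL;
```

If the lines returned by file() come back as "NOT valid UTF-8" with a lone 96 byte where the dash should be, the file is in some single-byte encoding, and the mismatch is on the reading side (place 1) rather than in the browser.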
EDIT: I don't know PHP well, but it looks like you could use mb_detect_encoding and possibly also mb_convert_encoding.
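As a sketch of what the conversion might look like: the \x96 byte in the error message is an en dash in Windows-1252, so the example below assumes (this is a guess from that byte, not something the question states) that the uploaded file is Windows-1252. It also shows why utf8_encode() appeared to lose the dash: that function treats its input as ISO-8859-1, where 0x96 is an invisible control character. Note that mb_detect_encoding generally cannot tell single-byte encodings apart reliably, so the source encoding is named explicitly here.

```php
<?php
// Assumption: the file is Windows-1252, where byte 0x96 is an en dash.
$raw = "Artist1 \x96 Title 1";

// Interpreting the bytes as ISO-8859-1 (what utf8_encode() does) maps 0x96
// to the invisible control character U+0096, so the dash seems to vanish.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Interpreting them as Windows-1252 maps 0x96 to U+2013, the en dash.
$asCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), PHP_EOL; // contains c296 (U+0096)
echo bin2hex($asCp1252), PHP_EOL; // contains e28093 (U+2013)
```

In the script from the question, the same call could be applied to each element of $fileArray before echoing or inserting it. The page would then also need to be served as UTF-8, and the database connection set to a UTF-8 charset, for the dash to survive the whole round trip.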
Try this: $str = str_replace('\\x', '&#', $str);