Search And Replace Special Characters PHP

2023-01-19 20:26 Q&A author:

I am trying to search and replace special characters in strings that I am parsing from a CSV file. When I open the text file with vim, it shows me the character as <95>. I can't for the life of me figure out what character this is so that I can use it with preg_replace. Any help would be appreciated.

Thanks,

Chris Edwards

0x95 is probably supposed to represent the character U+2022 Bullet (•), encoded in Windows code page 1252. You can get rid of it in a byte string using:

$line= str_replace("\x95", '', $line);

or you can use iconv to convert the character set of the data from cp1252 to UTF-8 (or whatever other encoding you want), if you've got a CSV parser that can read non-ASCII characters reliably. Otherwise, you probably want to remove all non-ASCII characters, e.g. with:

$line= preg_replace("/[\x80-\xFF]/", '', $line);
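
For the iconv route mentioned above, here is a minimal sketch, assuming the data really is Windows-1252. The sample string is made up for illustration. It also shows why the source encoding matters: treating the same byte as ISO-8859-1 yields the control character U+0095 rather than the bullet.

```php
<?php
// Sample input (made up for illustration): a byte string containing 0x95.
$line = "Item \x95 one";

// Assumption: the data is Windows-1252 (cp1252), where 0x95 is U+2022 Bullet.
// iconv() returns false if the input cannot be converted.
$utf8 = iconv('CP1252', 'UTF-8', $line);

// For comparison: ISO-8859-1 maps 0x95 to the control character U+0095.
$latin1 = iconv('ISO-8859-1', 'UTF-8', $line);

// Hex dumps make the difference visible regardless of terminal encoding.
echo bin2hex($utf8), "\n";
echo bin2hex($latin1), "\n";
```

If iconv() returns false, the file probably is not pure cp1252, and stripping the bytes as shown above may be the safer fallback.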

If your CSV parser is fgetcsv(), you've got problems. Theoretically you should be able to do this as a preprocessing step on a string before passing it to str_getcsv() (PHP 5.3) instead. Unfortunately this also means you have to read the file and split it row-by-row yourself, and this is not trivial to do given that quoted CSV values may contain newlines. By the time you've written the code to handle that properly, you've pretty much written a CSV parser. So what you actually have to do is read the file into a string, do your pre-processing changes, write it back out to a temporary file, and have fgetcsv() read that.
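
The read, pre-process, write back, then parse sequence could look roughly like the sketch below. It is not a definitive implementation: the helper name read_cp1252_csv() is hypothetical, the input is assumed to be Windows-1252 throughout, and a php://temp stream is swapped in for a named temporary file so there is nothing to clean up afterwards.

```php
<?php
// Hypothetical helper: convert a cp1252 CSV file to UTF-8 as a whole,
// then let fgetcsv() parse the converted copy from a temporary stream.
function read_cp1252_csv(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // Pre-processing step. Assumption: the whole file is Windows-1252.
    $utf8 = iconv('CP1252', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot convert $path from cp1252");
    }

    // php://temp stands in for the temporary file described above.
    $handle = fopen('php://temp', 'w+');
    fwrite($handle, $utf8);
    rewind($handle);

    $rows = [];
    // Length 0 means no line-length limit; quoted newlines are left to fgetcsv().
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

// Usage with a made-up sample file whose quoted field contains 0x95 and a newline.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,note\r\n\"Widget\",\"first \x95 line\nsecond line\"\r\n");
$rows = read_cp1252_csv($path);
unlink($path);
```

Converting the whole file before parsing keeps the quoted-newline handling inside fgetcsv(), which is the point of the workaround.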

The alternative would be to post-process each string returned by fgetcsv() individually. But that's also unpredictable, because PHP mangles the input by decoding it using the system default encoding instead of just giving you the damned bytes. And the default encoding outside of Windows is usually UTF-8, which won't read a 0x95 byte on its own as that'd be an invalid byte sequence. And whilst you could try to work around that using setlocale() to change the system default encoding, that is pretty bad practice which won't play nicely with any other apps you've got running that depend on system locale.

In summary, PHP's built-in CSV parsing stuff is pretty crap.

Following Bobince's suggestion, the following worked for me:

analyse_file() -> http://www.php.net/manual/en/function.fgetcsv.php#101238

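function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    // Already valid UTF-8: return as-is. Otherwise assume Windows-1252, where byte 0x95
    // is the bullet; ISO-8859-1 would turn it into the control character U+0095.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }
    return mb_convert_encoding($content, 'UTF-8', 'Windows-1252');
}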


if ($_FILES['file']['error'] != UPLOAD_ERR_NO_FILE) {
    foreach ($_FILES as $file) {
        $n = $file['name'];
        $s = $file['size'];
        $filename = $file['tmp_name'];
        ini_set('auto_detect_line_endings', TRUE); // in case of a Mac CSV (deprecated since PHP 8.1)
        // dealing with fgetcsv() special chars:
        // read the file into a string, do your pre-processing changes,
        // write it back out to a temporary file, and have fgetcsv() read that.
        $content = file_get_contents_utf8($filename); // separate variable, so the loop's $file is not overwritten
        $tempFile = tempnam(sys_get_temp_dir(), '');
        $handle = fopen($tempFile, "w+");
        fwrite($handle, $content);
        rewind($handle);
        $filename = $tempFile;
        // END -- dealing with fgetcsv() special chars
        $Array = analyse_file($filename, 10);
        $csvDelim = $Array['delimiter']['value'];
        // length 0 = no limit, so rows longer than 1000 bytes are not split
        while (($data = fgetcsv($handle, 0, $csvDelim)) !== FALSE) {
            // process the csv file
        }
        fclose($handle);
        unlink($tempFile); // remove the temporary copy
    } // end foreach
}
