
Is a byte order mark required here?

I am generating a CSV file with PHP to be downloaded through the browser. Do I need to insert the byte order mark bytes at the beginning, considering that the target system could be a Mac, Unix, Windows, etc.?


No, you are not required to.

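A byte order mark (BOM) is the character U+FEFF written at the very start of text in a Unicode encoding such as UTF-8, UTF-16 or UTF-32. It signals that the text is Unicode and which of these encodings is used.

In UTF-16 and UTF-32, its main job is to tell the reader the byte order (big-endian or little-endian) of the multi-byte code units.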
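Because the BOM is just U+FEFF encoded at the start of the data, each encoding and byte order yields a distinct signature: EF BB BF for UTF-8, FE FF or FF FE for UTF-16 big- or little-endian, and 00 00 FE FF or FF FE 00 00 for UTF-32. The following is a minimal PHP sketch, assuming you already have the first few bytes of the file in a string; `detect_bom()` is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper: identify a Unicode encoding from a leading BOM.
// Returns null when no known BOM is present; the bytes may still be
// UTF-8, plain ASCII or a legacy encoding, so a missing BOM proves nothing.
function detect_bom(string $bytes): ?string
{
    // Longer signatures first: the UTF-32LE BOM (FF FE 00 00) begins
    // with the UTF-16LE BOM (FF FE), so it must be tested before it.
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($boms as $encoding => $bom) {
        // PHP strings are binary-safe, so NUL bytes compare correctly.
        if (substr($bytes, 0, strlen($bom)) === $bom) {
            return $encoding;
        }
    }
    return null;
}
```

The ordering is the only subtle part: UTF-16LE text whose first character is U+0000 is indistinguishable from a UTF-32LE BOM, which is one more reason not to lean on BOM sniffing alone.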

In UTF-8, which has no byte order to indicate, it is optional but valid. However, in UTF-8 it can cause compatibility issues. To quote a well-phrased Wikipedia entry:

If compatibility with existing programs is not important, the BOM could be used to identify if a file is in UTF-8 versus a legacy encoding, but this is still problematic, due to many instances where the BOM is added or removed without actually changing the encoding, or various encodings are concatenated together. Checking if the text is valid UTF-8 is more reliable than using BOM.

I would advise against using the BOM in UTF-8 for those reasons.
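The quoted advice, validating the bytes as UTF-8 instead of relying on a BOM, could look roughly like this in PHP. It relies on the documented behaviour that `preg_match()` with the `u` modifier fails on malformed UTF-8 (PCRE is bundled with PHP); `is_valid_utf8()` and `strip_utf8_bom()` are hypothetical helper names. Stripping a BOM on read is one way for a consumer to avoid the compatibility problem of the BOM ending up inside the first CSV field.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF"; // U+FEFF encoded as UTF-8

// Hypothetical helper: preg_match() with the /u modifier returns false
// when the subject is not well-formed UTF-8; otherwise the empty
// pattern matches and it returns 1.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Hypothetical helper: drop a leading UTF-8 BOM so it cannot end up
// glued to the first field (e.g. a CSV header read as "\xEF\xBB\xBFname").
function strip_utf8_bom(string $bytes): string
{
    return substr($bytes, 0, 3) === UTF8_BOM ? substr($bytes, 3) : $bytes;
}
```

Note that plain ASCII also passes `is_valid_utf8()`, which is correct: ASCII text is valid UTF-8.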


As for the original question, it really depends on how the file is encoded when it is written. If it will be UTF-8 encoded, I'd add the BOM. If the file contains only ASCII characters, the BOM can be omitted, because there will be no multi-byte sequences. If, however, the file does contain UTF-8 multi-byte sequences, detecting the BOM is easier than walking through the whole file and checking for valid sequences. And even if you find a single valid sequence, it might still be a pair of separate single-byte characters above 0x7F in a legacy encoding.
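If you do decide to emit a BOM for UTF-8 output, as this answer suggests, one way to do it in PHP is to write the three bytes EF BB BF before the first row. A rough sketch, assuming the row data is already UTF-8 and PHP 7.4 or later (needed for the empty `$escape` argument to `fputcsv()`); `csv_to_string()`, `send_csv_download()` and their parameters are hypothetical.

```php
<?php
// Hypothetical helper: build a CSV in memory, optionally prefixed with
// a UTF-8 BOM. Assumes every field is already UTF-8 encoded.
function csv_to_string(array $rows, bool $withBom = false): string
{
    $fh = fopen('php://temp', 'r+');
    if ($withBom) {
        fwrite($fh, "\xEF\xBB\xBF"); // U+FEFF encoded as UTF-8
    }
    foreach ($rows as $row) {
        // Separator, enclosure and escape passed explicitly; an empty
        // escape disables PHP's backslash escaping, so quotes are doubled.
        fputcsv($fh, $row, ',', '"', '');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

// Hypothetical helper: send the CSV to the browser as a download.
// Must be called before any other output, since it sets headers.
function send_csv_download(array $rows, string $filename, bool $withBom = false): void
{
    header('Content-Type: text/csv; charset=utf-8');
    header('Content-Disposition: attachment; filename="' . $filename . '"');
    echo csv_to_string($rows, $withBom);
}
```

Keeping the BOM behind a single flag means you can drop it (per the first answer) or keep it (per this one) without touching the row-writing code.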
