Why is the first line longer?

2023-02-23 03:45 问答作者：

i'm using python to read a txt document with:

f = open(path,"r")
for line in f:
    line = line.decode('utf8').strip()
    length = len(line)
    firstLetter = line[:1]

it seems to work, but the first line's length is always longer by... 1

for example: the first line is "XXXX" where X denotes a chinese character then length will be 5, but not 4 and firstLetter will be nothing

but when i开发者_Python百科t goes to the second and after lines,it works properly

tks~

You have a UTF-8 BOM at the start of your file. Don't faff about inspecting the first character. Instead of the utf8 encoding, use the utf_8_sig encoding with either codecs.open() or your_byte_string.decode() ... this sucks up the BOM if it exists and you don't see it in your code.

>>> bom8 = u'\ufeff'.encode('utf8')
>>> bom8
'\xef\xbb\xbf'
>>> bom8.decode('utf8')
u'\ufeff'
>>> bom8.decode('utf_8_sig')
u'' # removes the BOM
>>> 'abcd'.decode('utf_8_sig')
u'abcd' # doesn't care if no BOM
>>>

You are probably getting the Byte Order Mark (BOM) as the first character on the first line.

Information about dealing with it is here

继续阅读：python text-files

Why is the first line longer?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？