Picasa albums title encoding. Not unicode?

2023-04-04 00:25

I wrote a simple client for Google's Picasa service. I want to create a folder named after each album's title and download the original photos from the service into that folder. If the title contains any non-Latin characters, I get an IOError:

IOError: [Errno 2] No such file or directory: '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg'

Code sample:

import gdata.photos.service
import gdata.media
import os
import urllib2

gd_client = gdata.photos.service.PhotosService()

username = 'cha.com.ua'
albums = gd_client.GetUserFeed(user=username)
for album in albums.entry:
    photos = gd_client.GetFeed(
        '/data/feed/api/user/%s/albumid/%s?kind=photo' % (
            username, album.gphoto_id.text))

    for photo in photos.entry:
        destination = os.path.join(album.title.text, photo.title.text)
        out = open(destination, 'wb')
        out.write(urllib2.urlopen(photo.content.src).read())
        out.close()

I tried to decode the title with .decode('utf-8'), but it doesn't work.

You say:

@rocksportrocker repr(album.title.text) returns str:
'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'

and

@d-k Yep, I've tried it. The result is the same.
For example repr(album.title.text.encode('utf-8')) returns str:
'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'

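These two statements cannot both be true. If the first is correct and album.title.text is a str holding those bytes, then calling .encode('utf-8') on it makes Python 2 first decode the str implicitly with the ASCII codec, which fails with: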

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
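
A minimal sketch of that conflict, written so that it runs on both Python 2 and Python 3 (an assumption; this thread itself uses Python 2). It performs explicitly the ASCII decode that Python 2 applies implicitly when .encode() is called on a byte string, then decodes the same bytes correctly as UTF-8. The names title_bytes and implicit_ascii_decode_error are illustrative only.

```python
# The byte string quoted in the first statement (UTF-8 encoded Cyrillic).
title_bytes = (b'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 '
               b'\xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0')


def implicit_ascii_decode_error(raw):
    """Return the error text from decoding raw as ASCII, or None on success."""
    try:
        raw.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# The ASCII decode fails on the very first byte (0xd0).
message = implicit_ascii_decode_error(title_bytes)

# Decoding with the right codec yields the text the bytes stand for.
title_text = title_bytes.decode('utf-8')
```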

It appears that your str object is a UTF-8 encoded Cyrillic string:

>>> foo = '\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'
>>> from unicodedata import name
>>> for uc in foo.decode('utf8'):
...     print "U+%04X" % ord(uc), name(uc)
...
U+0412 CYRILLIC CAPITAL LETTER VE
U+0438 CYRILLIC SMALL LETTER I
U+0434 CYRILLIC SMALL LETTER DE
U+0020 SPACE
U+0438 CYRILLIC SMALL LETTER I
U+0437 CYRILLIC SMALL LETTER ZE
U+0020 SPACE
U+043E CYRILLIC SMALL LETTER O
U+043A CYRILLIC SMALL LETTER KA
U+043D CYRILLIC SMALL LETTER EN
U+0430 CYRILLIC SMALL LETTER A
>>>

Also the above is quite unlike the text in the error message: '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg'

>>> bar =  '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg'
>>> for uc in bar.decode('utf8'):
...     print "U+%04X" % ord(uc), name(uc)
...
U+041E CYRILLIC CAPITAL LETTER O
U+0441 CYRILLIC SMALL LETTER ES
U+0435 CYRILLIC SMALL LETTER IE
U+043D CYRILLIC SMALL LETTER EN
U+044C CYRILLIC SMALL LETTER SOFT SIGN
U+005C REVERSE SOLIDUS
U+0041 LATIN CAPITAL LETTER A
U+0075 LATIN SMALL LETTER U
U+0074 LATIN SMALL LETTER T
# snipped the remainder

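The REVERSE SOLIDUS (backslash) indicates that you are running on Windows. Python 2 on Windows does not treat byte-string paths as UTF-8: it passes them to the ANSI file APIs, which interpret them in the system code page. Convert all your text to Unicode on input, and use Unicode objects for all paths and filenames. A simple example that works: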

>>> bar =  '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c.txt'
>>> ubar = bar.decode('utf8')
>>> print repr(ubar)
u'\u041e\u0441\u0435\u043d\u044c.txt'
>>> f = open(ubar, 'wb')
>>> f.write('hello\n')
>>> f.close()
>>> open(ubar, 'rb').read()
'hello\n'
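
Applied to the download loop in the question, the same idea might look like the sketch below. It assumes the titles arrive as UTF-8 byte strings, as the repr output above suggests. Since the question's code never creates the album folder, the sketch creates it too, on the assumption that this is the intended behaviour. The helpers to_text and prepare_destination are hypothetical names, not part of gdata.

```python
import os


def to_text(value, encoding='utf-8'):
    """Decode byte strings to text; pass text through unchanged."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value


def prepare_destination(root, album_title, photo_title):
    """Build a text path for the photo, creating the album folder if missing."""
    album_dir = os.path.join(to_text(root), to_text(album_title))
    if not os.path.isdir(album_dir):
        os.makedirs(album_dir)
    return os.path.join(album_dir, to_text(photo_title))
```

In the loop, destination = prepare_destination(u'.', album.title.text, photo.title.text) would then replace the os.path.join call, so that open() receives a Unicode path.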
