开发者

Python - Encoding string - Swedish Letters

I'm having some trouble with Python's raw_input command (Python2.6), For some reason, the raw_input does not get the converted string that swedify() produces and this giving me a encoding error which i'm aware of, that's why i made swedify() to begin with. Here's what i'm trying to do:

elif cmd in ('help', 'hjälp', 'info'):
    buffert += 'Just nu är programmet relativt begränsat,\nDe funktioner du har att använda är:\n'
    buffert += ' * historik :: skriver ut all din historik\n'
    buffert += ' * ändra <något> :: ändrar något i databasen, följande finns att ändra:\n'
    print swedify(buffert)

This works just fine, it outputs the swedish characters just as i want them to the console. But when i try to (in the same code, with same \x?? values, print this piece:

core['goalDistance'] = raw_input(swedify('Hur långt i kilometer är ditt mål: '))
core['goalTime'] = raw_input(swedify('Vad är ditt mål i minuter att springa ' +  core['goalDistance'] + 'km på: '))

Then i get this:

C:\Users\Anon>python löp.py
Traceback (most recent call last):
  File "l÷p.py", line 92, in <module>
    core['goalDistance'] = raw_input(swedify('Hur långt i kilometer är ditt mål: '))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 5: ordinal not in range(128)

Now i've googled around, found some "solutions" but none of them work, some sad that i have to create a batch script that executes chcp ??? in the beginning, but that's not a clean solution IMO.

Here is swedify:

def swedify(inp):
    try:
        return inp.decode('utf-8')
    except:
        return '(!Dec开发者_StackOverflow中文版:) ' + str(inp)

Any solutions on how to get raw_input to read my return value from swedify()? i've tried from encodings import getencoder, getdecoder and others but nothing for the better.


You mention the fact that you received an encoding error which motivated you to write swedify in the first place, and you have found solutions around chcp which is a Windows command.

On *nix systems with UTF-8 terminals, swedify is not necessary:

>>> raw_input('Hur långt i kilometer är ditt mål: ')
Hur långt i kilometer är ditt mål: 100
'100'
>>> a = raw_input('Hur långt i kilometer är ditt mål: ')
Hur långt i kilometer är ditt mål: 200
>>> a
'200'

FWIW, when I do use swedify, I get the same error you do:

>>> def swedify(inp):
...     try:
...         return inp.decode('utf-8')
...     except:
...         return '(!Dec:) ' + str(inp)
... 
>>> swedify('Hur långt i kilometer är ditt mål: ') 
u'Hur l\xe5ngt i kilometer \xe4r ditt m\xe5l: '
>>> raw_input(swedify('Hur långt i kilometer är ditt mål: '))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 5: ordinal not in range(128)

Your swedify function returns a unicode object. The built-in raw_input is just not happy with unicode objects.

>>> raw_input("å")
åeee
'eee'
>>> raw_input(u"å")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)

You might want to try this in Python 3. See this Python bug.

Also of interest: How to read Unicode input and compare Unicode strings in Python?.

UPDATE According to this blog post there is a way to set the system's default encoding. This might be worth a try.


For me it worked fine with:

#-*- coding: utf-8 -*-
import sys
import codecs
koden=sys.stdin.encoding

a=raw_input( u'Frågan är öppen? '.encode(koden))
print a

Per


On Windows, the console's native Unicode support is broken. Even the apparent UTF-8 codepage isn't a proper fix.

To read and write with Windows console you need use https://github.com/Drekin/win-unicode-console, which works directly with the underlying console API, so that multi-byte characters are read and written correctly.


Windows command prompt uses Codepage 850 when using Swedish regional settings (https://en.wikipedia.org/wiki/Code_page_850). It's probably used because of backwards compatibility with old MS-Dos programs.

You can set Windows command prompt to use UTF-8 as encoding by entering: chcp 65001 (Unicode characters in Windows command line - how?)


Try this magic comment at the very top of your script:

# -*- coding: utf-8 -*-

Here is some information about it: http://www.python.org/dev/peps/pep-0263/


Solution to a lot of problems:


Edit: C:\Python??\Lib\Site.py Replace "del sys.setdefaultencoding" with "pass"

Then,
Put this in the top of your code:

sys.setdefaultencoding('latin-1')

The holy grail of fixing the Swedish/non-UTF8 compatible characters.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜