How to convert a UTF string with Scandinavian characters to ASCII?
I would like to convert this string
foo_utf = u'nästy chäräctörs with å and co.' # unicode
into this
foo_ascii = 'nästy chäräctörs with å and co.' # ASCII
Any idea how to do this in Python (2.6)? I found the unicodedata module, but I have no idea how to do the transformation.
I don't think you can. Those "nästy chäräctörs" can't be encoded as ASCII, so you'll have to pick a different encoding (UTF-8 or Latin-1 or Windows-1252 or something).
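To make that concrete, here is a minimal sketch in Python 3 (an assumption: the question targets Python 2.6, where the literal needs the u prefix and encode() returns a str rather than bytes). It shows that the ASCII codec rejects ä, while UTF-8, Latin-1 and Windows-1252 (cp1252) all encode the string:

```python
# Python 3 sketch: str is Unicode text, str.encode() returns bytes.
s = 'nästy chäräctörs with å and co.'

# ASCII only covers code points 0-127, so ä, ö and å cannot be encoded.
try:
    s.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc  # keep a reference; 'exc' is cleared after the block
else:
    ascii_error = None

# These encodings do cover the Scandinavian letters.
encoded = {name: s.encode(name) for name in ('utf-8', 'latin-1', 'cp1252')}

print('ascii failed at position', ascii_error.start)
for name, data in encoded.items():
    # UTF-8 uses two bytes per accented letter; Latin-1 and cp1252 use one.
    print(name, len(data), data)
```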
Try the encode method of the unicode string:
>>> u'nästy chäräctörs with å and co.'.encode('latin-1')
'n\xe4sty ch\xe4r\xe4ct\xf6rs with \xe5 and co.'
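The same call in Python 3, plus the reverse step (a sketch; Python 3 is an assumption here, and there the result is a bytes object). Decoding only gives the original text back when you use the same codec:

```python
s = 'nästy chäräctörs with å and co.'

data = s.encode('latin-1')  # bytes, one byte per character for this text
print(data)  # b'n\xe4sty ch\xe4r\xe4ct\xf6rs with \xe5 and co.'

restored = data.decode('latin-1')  # same codec: original text comes back

# Wrong codec: byte 0xe4 followed by 's' is not valid UTF-8.
try:
    data.decode('utf-8')
except UnicodeDecodeError:
    wrong_codec_failed = True
else:
    wrong_codec_failed = False
```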
There are several options in the codecs module in Python's standard library, depending on how you want the extended characters handled:
>>> import codecs
>>> u = u'nästy chäräctörs with å and co.'
>>> encode = codecs.get_encoder('ascii')
>>> encode(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)
>>> encode(u, 'ignore')
('nsty chrctrs with  and co.', 31)
>>> encode(u, 'replace')
('n?sty ch?r?ct?rs with ? and co.', 31)
>>> encode(u, 'xmlcharrefreplace')
('n&#228;sty ch&#228;r&#228;ct&#246;rs with &#229; and co.', 31)
>>> encode(u, 'backslashreplace')
('n\\xe4sty ch\\xe4r\\xe4ct\\xf6rs with \\xe5 and co.', 31)
Hopefully one of those will meet your needs. There's more information available in the Python codecs module documentation.
This really is a Django question, and not a Python one.
If the string is in one of your .py files, make sure that you have the following line at the top of the file:
# -*- coding: utf-8 -*-
Furthermore, your string needs to be of type unicode (u'foobar').
Then make sure that your HTML page declares UTF-8 as its encoding:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
That should do the trick. No encoding or decoding is necessary; just make sure that everything is Unicode and you are on the safe side.
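What the coding line actually does is tell the interpreter how to decode the bytes of the .py file. A small Python 3 sketch (an assumption: Python 3 defaults to UTF-8 source, so there the line is optional, whereas Python 2 refuses non-ASCII source without a declaration) compiles the same module saved under two different declarations; the helper run_module_source is hypothetical:

```python
def run_module_source(source_bytes):
    """Hypothetical helper: compile raw .py bytes and return their namespace."""
    namespace = {}
    # compile() honours a PEP 263 coding declaration when given bytes.
    exec(compile(source_bytes, 'example.py', 'exec'), namespace)
    return namespace

body = "greeting = u'nästy chäräctörs'\n"

# A file saved as UTF-8, with the declaration from the answer.
utf8_file = ('# -*- coding: utf-8 -*-\n' + body).encode('utf-8')

# The same file saved as Latin-1 needs a matching declaration.
latin1_file = ('# -*- coding: latin-1 -*-\n' + body).encode('latin-1')

utf8_ns = run_module_source(utf8_file)
latin1_ns = run_module_source(latin1_file)
print(utf8_ns['greeting'], latin1_ns['greeting'])
```

The bytes on disk differ, but both declarations yield the same unicode string once the file is read correctly.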
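You can also use the unicodedata module (http://docs.python.org/library/unicodedata.html) provided with Python to convert a lot of Unicode characters into an ASCII-friendly variant, for example by splitting accented letters into a base letter plus a combining mark. Follow that up with the encode() method and you can completely clean up a string.
The function you mainly want out of unicodedata is normalize(). Pass it the 'NFKD' form rather than 'NFKC': NFKC keeps accented letters composed, so ä stays ä and is lost entirely when you encode, while NFKD leaves a plain a behind.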
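A minimal Python 3 sketch of the normalize-then-encode approach. It uses the decomposing 'NFKD' form; with 'NFKC' an accented letter stays a single character, so encode() drops it outright (the code computes that contrast too). Letters such as ø and æ have no Unicode decomposition at all, so the sketch pre-maps them with a small hand-written table; that table (EXTRA) and the to_ascii name are hypothetical additions, not part of unicodedata:

```python
import unicodedata

# Hypothetical table for letters that NFKD leaves untouched because
# Unicode gives them no decomposition; extend it as needed.
EXTRA = str.maketrans({'ø': 'o', 'Ø': 'O', 'æ': 'ae', 'Æ': 'AE'})

def to_ascii(text):
    """Best-effort, lossy ASCII transliteration (hypothetical helper)."""
    text = text.translate(EXTRA)
    # NFKD splits 'ä' into 'a' + U+0308 COMBINING DIAERESIS, 'å' into
    # 'a' + U+030A, and compatibility characters such as the 'ﬁ' ligature
    # into their plain parts.
    decomposed = unicodedata.normalize('NFKD', text)
    # The combining marks are non-ASCII, so 'ignore' drops just them.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

s = 'nästy chäräctörs with å and co.'
print(to_ascii(s))              # nasty charactors with a and co.
print(to_ascii('Øresund bæk'))  # Oresund baek

# Contrast: NFKC keeps 'ä' composed, so encoding loses the whole letter.
nfkc = unicodedata.normalize('NFKC', s).encode('ascii', 'ignore')
print(nfkc)                     # b'nsty chrctrs with  and co.'
```

The result is lossy by design, so keep the original Unicode string wherever you can and use the ASCII form only where ASCII is mandatory.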