How to convert a UTF string with Scandinavian characters to ASCII?

2022-12-24 08:21 Q&A author:

I would like to convert this string

foo_utf = u'nästy chäräctörs with å and co.' # unicode

into this

foo_ascii = 'nästy chäräctörs with å and co.' # ASCII

Any idea how to do this in Python (2.6)? I found the unicodedata module, but I have no idea how to do the transformation.

I don't think you can. Those "nästy chäräctörs" can't be encoded as ASCII, so you'll have to pick a different encoding (UTF-8 or Latin-1 or Windows-1252 or something).

Try the encode() method of the unicode string:

>>> u'nästy chäräctörs with å and co.'.encode('latin-1')
'n\xe4sty ch\xe4r\xe4ct\xf6rs with \xe5 and co.'
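For comparison, here is a minimal sketch of the same idea in Python 3 (an assumption: the question targets 2.6, but in Python 3 `str` is already Unicode and `encode()` returns `bytes`). It shows that Latin-1 can hold these characters as single bytes while strict ASCII cannot.

```python
# Python 3 sketch: str is Unicode text, .encode() returns bytes.
foo_utf = 'nästy chäräctörs with å and co.'

# Latin-1 maps ä, ö and å to single bytes (0xE4, 0xF6, 0xE5).
foo_latin1 = foo_utf.encode('latin-1')
print(foo_latin1)

# Decoding with the same codec gives back the original text.
assert foo_latin1.decode('latin-1') == foo_utf

# ASCII has no code points above 127, so strict encoding fails.
try:
    foo_utf.encode('ascii')
except UnicodeEncodeError as exc:
    print('ASCII failed:', exc.reason)
```

Note that the Latin-1 result is a byte string, but it is still not ASCII: the accented letters become bytes above 127.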

There are several options in the codecs module in Python's standard library, depending on how you want the extended characters handled:

>>> import codecs
>>> u = u'nästy chäräctörs with å and co.'
>>> encode = codecs.get_encoder('ascii')
>>> encode(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)
>>> encode(u, 'ignore')
('nsty chrctrs with  and co.', 31)
>>> encode(u, 'replace')
('n?sty ch?r?ct?rs with ? and co.', 31)
>>> encode(u, 'xmlcharrefreplace')
('n&#228;sty ch&#228;r&#228;ct&#246;rs with &#229; and co.', 31)
>>> encode(u, 'backslashreplace')
('n\\xe4sty ch\\xe4r\\xe4ct\\xf6rs with \\xe5 and co.', 31)

Hopefully one of those will meet your needs. There's more information available in the Python codecs module documentation.
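If you are on Python 3 (an assumption; the session above is Python 2), the same error handlers can be passed straight to `str.encode()` without going through `codecs.get_encoder()`. A minimal sketch:

```python
# Python 3 sketch: str.encode() accepts the same error handlers
# that the session above passes to the ASCII encoder.
u = 'nästy chäräctörs with å and co.'

results = {}
for handler in ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace'):
    # Each handler decides what to emit for characters ASCII cannot hold.
    results[handler] = u.encode('ascii', handler)
    print(handler, results[handler])
```

Unlike the encoder function from `codecs`, `str.encode()` returns only the encoded bytes, not a `(bytes, length consumed)` tuple.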

This really is a Django question, not a Python one. If the string is in one of your .py files, make sure you have the following line at the top of the file:

# -*- coding: utf-8 -*-

Furthermore, your string needs to be of type unicode (u'foobar').

Then make sure your HTML page declares UTF-8:

<meta http-equiv="content-type" content="text/html;charset=utf-8" />

That should do the whole trick. No explicit encoding or decoding is necessary; just make sure that everything is unicode, and you are on the safe side.
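The "keep everything unicode" advice boils down to encoding exactly once, where text leaves your program. Below is a minimal plain-Python 3 sketch of that idea, not Django's API: `render_page` is a hypothetical helper, and in Python 3 source files default to UTF-8, so the coding line is not needed there.

```python
import html


def render_page(text):
    # Hypothetical helper: builds a tiny HTML page around a str.
    page = (
        '<html><head>'
        '<meta http-equiv="content-type" content="text/html;charset=utf-8" />'
        '</head><body><p>' + html.escape(text) + '</p></body></html>'
    )
    # Encode exactly once, at the output boundary, using the same
    # charset the meta tag declares.
    return page.encode('utf-8')


body = render_page('nästy chäräctörs with å and co.')
# In UTF-8 each of ä, ö and å takes two bytes; decoding restores the text.
print(body.decode('utf-8'))
```

The important design point is that the charset in the meta tag and the codec passed to `encode()` agree; a mismatch is what produces garbled characters in the browser.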

You can also use the unicodedata module (http://docs.python.org/library/unicodedata.html), which ships with Python, to convert many Unicode characters into an ASCII-friendly variant, e.g. by separating accents from their base letters. Follow that with the encode() method and you can completely clean up a string.

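The function you mainly want from unicodedata is normalize(). Pass it the 'NFKD' form rather than 'NFKC': NFKD decomposes accented letters such as ä into a plain a followed by a combining diaeresis, which encode('ascii', 'ignore') then drops, whereas NFKC keeps ä as a single character, so the whole letter would be dropped.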
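A minimal sketch of the normalize-then-encode approach, written for Python 3 (assumption; in Python 2.6 the same calls exist but the input must be a `u'...'` string). `to_ascii` is a hypothetical helper name. It uses the decomposing NFKD form, since NFKC would keep ä as one code point that the 'ignore' handler then drops whole.

```python
import unicodedata


def to_ascii(text):
    # NFKD splits ä into 'a' + U+0308 COMBINING DIAERESIS (and å into
    # 'a' + U+030A), and also folds compatibility characters.
    decomposed = unicodedata.normalize('NFKD', text)
    # The combining marks are non-ASCII, so 'ignore' drops them and
    # leaves the base letters behind.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('nästy chäräctörs with å and co.'))
```

One caveat for Scandinavian text: letters such as ø and æ have no Unicode decomposition, so this approach removes them entirely instead of turning them into o or ae. Handling those would need an explicit replacement table.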
