Cannot remove French letters from a string returned by Python glob
I would like to rename files whose names contain French (accented) letters. I am using glob to browse the files and a function I found on the Internet to remove the accented letters. The supprime_accent function seems to work OK on its own. However, it doesn't change the file names returned by the glob function.
Does anybody know what the reason could be? Is it related to the encoding glob uses?
import glob

def supprime_accent(ligne):
    """ supprime les accents du texte source """
    accents = { 'a': ['à', 'ã', 'á', 'â'],
                'e': ['é', 'è', 'ê', 'ë'],
                'i': ['î', 'ï'],
                'u': ['ù', 'ü', 'û'],
                'o': ['ô', 'ö'] }
    for (char, accented_chars) in accents.iteritems():
        for accented_char in accented_chars:
            ligne = ligne.replace(accented_char, char)
    return ligne

for file_name in glob.glob("attachments/*.jpg"):
    print supprime_accent(file_name)
I see two potential problems here.
First, you need to use unicode strings in your source code, and you need to tell Python what encoding the source code is in. Unfortunately doing it right doubles the number of vowels in your table... :-\
# -*- coding: UTF-8 -*-
...
accents = { u'a': [u'à', u'ã', u'á', u'â'],
            u'e': [u'é', u'è', u'ê', u'ë'],
            u'i': [u'î', u'ï'],
            u'u': [u'ù', u'ü', u'û'],
            u'o': [u'ô', u'ö'] }
Second, I think you need to convert the file name returned by glob to a unicode string, because in Python 2 glob returns byte strings when it is given a byte-string pattern.
import sys
file_name = file_name.decode(sys.getfilesystemencoding())
Python 3.0 fixed both of these problems: file names don't have to be decoded, and unicode strings don't need a u prefix.
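To illustrate the Python 3 remark, here is a minimal sketch of the same table-based approach written for Python 3. It is not the original poster's code: the ACCENTS and TABLE names and the sample file name are made up for the example, and it assumes the file names use precomposed characters (a decomposed name, as some file systems return, would not match this table).

```python
# Python 3 sketch: str is already Unicode, so no u prefix and no decode() call.
# ACCENTS and TABLE are illustrative names, not part of the original code.
ACCENTS = {
    'a': 'àãáâ',
    'e': 'éèêë',
    'i': 'îï',
    'u': 'ùüû',
    'o': 'ôö',
}

# Map every accented character to its unaccented base letter.
TABLE = str.maketrans({accented: plain
                       for plain, group in ACCENTS.items()
                       for accented in group})

def supprime_accent(ligne):
    """Remove the accents listed in ACCENTS from the source text."""
    return ligne.translate(TABLE)

print(supprime_accent('attachments/été à la plage.jpg'))
# prints: attachments/ete a la plage.jpg
```

Building the table once and calling translate avoids the nested replace loops, but the result is the same for the characters listed.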
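Try the question "latin-1 to ascii" and its answers; the question itself contains the final solution I am using. Note that latin2ascii is not a built-in error handler, so it has to be registered with codecs.register_error before the call below will work.
Also, pass a unicode string to glob to get unicode file names back, e.g.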
for file_name in glob.glob(u"attachments/*.jpg"):
    print file_name.encode('ascii', 'latin2ascii')
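The error-handler name used above has to be registered before encode can find it. The following is only a rough Python 3 sketch of how such a handler could be registered; it is a hypothetical stand-in, not the handler from the linked question, and it simply strips accents via NFKD decomposition instead of using a hand-written mapping table.

```python
import codecs
import unicodedata

def latin2ascii(error):
    """Hypothetical handler: replace unencodable characters by their base letters."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    # The slice covers the run of characters that could not be encoded.
    bad = error.object[error.start:error.end]
    plain = unicodedata.normalize('NFKD', bad).encode('ascii', 'ignore').decode('ascii')
    # Return the replacement text and the position where encoding resumes.
    return plain, error.end

codecs.register_error('latin2ascii', latin2ascii)

print('été.jpg'.encode('ascii', 'latin2ascii'))
# prints: b'ete.jpg'
```

Once registered, the name can be passed as the errors argument of any encode call in the same process.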
I managed to fix the problem by converting file_name to unicode with the cp1252 encoding.
import glob
import sys
import unicodedata
for file_name in glob.glob("attachments/*.jpg"):
    file_name = file_name.decode(sys.getfilesystemencoding())
    print unicodedata.normalize('NFKD', file_name).encode('ascii','ignore')
Edit: Jason gave a better solution by replacing unicode(file_name, 'cp1252') with file_name.decode(sys.getfilesystemencoding())
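Since the original goal was to rename the files rather than print the names, here is a hedged Python 3 sketch of that last step using the same NFKD idea. The helper names strip_accents and rename_without_accents are invented for this example, and skipping a file when the target name already exists is just one possible policy.

```python
import glob
import os
import unicodedata

def strip_accents(name):
    """Decompose accented characters (NFKD) and drop everything that is not ASCII."""
    decomposed = unicodedata.normalize('NFKD', name)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

def rename_without_accents(directory, pattern='*.jpg'):
    """Rename matching files to accent-free names; return (old, new) path pairs."""
    renamed = []
    for path in glob.glob(os.path.join(directory, pattern)):
        folder, base = os.path.split(path)
        target = os.path.join(folder, strip_accents(base))
        # Skip names that need no change, and never overwrite an existing file.
        if target != path and not os.path.exists(target):
            os.rename(path, target)
            renamed.append((path, target))
    return renamed

# Example call (assumes an "attachments" directory exists):
# rename_without_accents('attachments')
```

One caveat with 'ignore': characters that have no decomposition are dropped silently, so for instance the "œ" in "cœur" disappears and the name becomes "cur". A mapping table is needed if such letters should become "oe".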