UnicodeWarning when comparing unicode strings to unicode results from os.walk command

Using Python 2.7, I'm running os.walk over these files (http://www.2shared.com/file/biSx7NI-/comer.html) and then comparing the result against a list. In the actual program this list won't be predefined. The code I'm trying to use is as follows:

# -*- coding: utf-8 -*-
import os.path
group = ['comer.txt', 'coma.txt', 'comamos.txt', 'coman.txt', 'comas.txt', 'come.txt', 'comed.txt', 'comemos.txt', 'comen.txt', 'comeremos.txt', 'comer\xc3\xa1.txt', 'comer\xc3\xa1n.txt', 'comer\xc3\xa1s.txt', 'comer\xc3\xa9.txt', 'comer\xc3\xa9is.txt', 'comer\xc3\xada.txt', 'comer\xc3\xadais.txt', 'comer\xc3\xadamos.txt', 'comer\xc3\xadan.txt', 'comer\xc3\xadas.txt', 'comes.txt', 'comido.txt', 'comiendo.txt', 'comiera.txt', 'comierais.txt', 'comieran.txt', 'comieras.txt', 'comiere.txt', 'comiereis.txt', 'comieren.txt', 'comieres.txt', 'comieron.txt', 'comimos.txt', 'comiste.txt', 'comisteis.txt', 'comi\xc3\xa9ramos.txt', 'comi\xc3\xa9remos.txt', 'comi\xc3\xb3.txt', 'como.txt', 'com\xc3\xa1is.txt', 'com\xc3\xa9is.txt', 'com\xc3\xad.txt', 'com\xc3\xada.txt', 'com\xc3\xadais.txt', 'com\xc3\xadamos.txt', 'com\xc3\xadan.txt', 'com\xc3\xadas.txt', 'comer\xc3\xa1.txt', 'comer\xc3\xa9.txt', 'comer\xc3\xada.txt', 'comer\xc3\xadais.txt']

print "********what we have*********"
i=0
for f in group:
    group[i] = os.path.basename(f)
    group[i] = unicode(group[i], "utf-8")        
    print group[i]
    i += 1

wantedResults = []
print "********what we want*********"
for(path, dirs, files) in os.walk("C:\corpus\zz-auto generated\spanish\comer"):
    wantedResults.append(files)
for f in wantedResults[0]:
    print f

print "********problems*********"
for resultWanted in wantedResults[0]:
    if resultWanted not in group:
        print "did not match our wanted results: " + resultWanted
for result in group:
    if result not in wantedResults[0]:
        print "extra file: " + result

I'm getting this error:

Warning (from warnings module):
  File "C:\Users***\Desktop\osWalkTest.py", line 26
    if result not in wantedResults[0]:
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

I could really use some help getting the predefined list and the list from os.walk to compare properly. I've looked this up on Google and have tried many combinations of encoding and decoding the two lists, but nothing seems to work. Thanks.


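Have you tried passing a unicode path (note the 'u' before the string, which makes it a unicode literal)? In Python 2, when os.walk is given a unicode path, it returns the file names as unicode too, so both sides of the comparison are unicode: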

for(path, dirs, files) in os.walk(u"C:/corpus/zz-auto generated/spanish/comer"):

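(Note that unescaped backslashes in an ordinary string literal are not a good idea, unicode or not: a sequence such as \t or \n would be read as an escape. Use forward slashes, as above, or a raw string.)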
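
If some of the names still arrive as byte strings (for example, when they are read from elsewhere in the real program), one option is to convert everything to unicode before comparing. The following is only a sketch under that assumption: the helpers to_text and compare_names are hypothetical names, not part of the original script, and the sample lists are made up for illustration. It assumes the byte strings are UTF-8 encoded, as in the predefined list in the question.

```python
# Hypothetical helpers (not from the original script): normalise every
# name to unicode text before comparing, so no byte string is ever
# compared against a unicode string.

text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_text(name, encoding="utf-8"):
    """Return name as unicode text, decoding it if it is a byte string."""
    if isinstance(name, text_type):
        return name
    return name.decode(encoding)


def compare_names(expected, found, encoding="utf-8"):
    """Return (unexpected, extra) as sorted lists of unicode names.

    unexpected: names in `found` (e.g. from os.walk) missing from `expected`
    extra:      names in `expected` that were not found
    """
    expected_set = set(to_text(n, encoding) for n in expected)
    found_set = set(to_text(n, encoding) for n in found)
    return sorted(found_set - expected_set), sorted(expected_set - found_set)


# Made-up sample data: UTF-8 byte strings on one side, unicode on the other.
group = [b"coma.txt", b"comer\xc3\xa1.txt"]
files = [u"comer\xe1.txt", u"como.txt"]

unexpected, extra = compare_names(group, files)
# unexpected == [u"como.txt"], extra == [u"coma.txt"]
```

Using sets also avoids the repeated "not in" scans over the lists. Note that byte-string names returned by os.walk for a byte-string path are in the file system's encoding, which on Windows is usually not UTF-8, so passing a unicode path as shown above remains the simpler fix for that side.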
