Encoding problems in Python x64
I'm trying to write a little script that fills a SQLite table from a list of files saved in a text file. The code so far is this:
    import os
    import _sqlite3
    import sys

    print sys.path[0]
    mydir = sys.path[0]
    print (mydir)

    def listdir(mydir):
        lis=[]
        for root, dirs, files in os.walk(mydir):
            for name in files:
                lis.append(os.path.join(root,name))
        return lis

    filename = "list.txt"
    print ("writting in %s" % filename)
    file = open(filename, 'w' )
    for i in listdir(mydir):
        file.write(i)
        file.write("\n")
    file.close()

    con = _sqlite3.connect("%s/conection"%mydir)
    c=con.cursor()
    c.execute(''' drop table files ''')
    c.execute('create table files (name text, other text)')
    file = open(filename,'r')
    for line in file :
        a = 1
        for t in [("%s"%line, "%i"%a)]:
            c.execute('insert into files values(?,?)',t)
            a=a+1
    c.execute('select * from files')
    print c.fetchall()
    con.commit()
    c.close()
When I run it, I get the following:
    Traceback (most recent call last):
      File "C:\Users\josh\FORGE.py", line 32, in <module>
        c.execute('insert into files values(?,?)',t)
    ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
I've tried the unicode() built-in function, but it still doesn't work: it says it can't decode the byte 0xed, or something like that.
I know the problem is the encoding of the strings in the list, but I can't find a way to get them right. Any ideas? Thanks in advance!
(0) Please reformat your code.
(1) After
    for line in file:
do something like
    line = line.decode('encoding-of-the-file')
where the encoding is something like utf-8 or iso-8859-1. You have to know your input encoding. If you don't know the encoding, or don't care about a clean decoding, you can guess the most probable encoding and do
    line.decode('utf-8', 'ignore')
which omits all characters that cannot be decoded. You can also use 'replace', which replaces those characters with the Unicode replacement character (\ufffd).
(2) Internally, and when communicating with the database, use only unicode objects, e.g. u'this is unicode'.
(3) Don't use file as a variable name (it shadows the built-in file type).
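To make point (1) concrete, here is a small sketch of how the three decoding choices behave on a name containing the byte 0xed from the question. The sample bytes are made up for illustration, and the snippet is written to run on Python 3; on Python 2 the same decode() calls exist on str objects.

```python
# Hypothetical file name containing the byte 0xed, which is i-acute in
# ISO-8859-1 but is not a valid UTF-8 sequence in this position.
raw = b"m\xedo.txt"

# Clean decode: works only if the guessed encoding is the right one.
latin1 = raw.decode("iso-8859-1")

# 'ignore' silently drops the byte that cannot be decoded.
ignored = raw.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD for the byte that cannot be decoded.
replaced = raw.decode("utf-8", "replace")

print(repr(latin1))
print(repr(ignored))
print(repr(replaced))
```

Note that 'ignore' and 'replace' both lose information, so the stored name will no longer match the real file name on disk.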
Also look here: Best Practices for Python UnicodeDecodeError
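Putting the points above together, the reading and inserting part of the script might look roughly like the sketch below. It is only an illustration under some assumptions: the listing is taken to be ISO-8859-1 (substitute the real encoding), a temporary stand-in for list.txt is created so the example is self-contained, and an in-memory database replaces the "conection" file. It uses io.open, which decodes while reading, instead of calling decode() on every line.

```python
import io
import os
import sqlite3
import tempfile

# Assumption: the listing was saved as ISO-8859-1; use the real encoding here.
ENCODING = "iso-8859-1"

# Build a small stand-in for list.txt that contains the byte 0xed.
tmpdir = tempfile.mkdtemp()
list_path = os.path.join(tmpdir, "list.txt")
with io.open(list_path, "wb") as out:
    out.write(b"C:\\docs\\m\xedo.txt\nC:\\docs\\plain.txt\n")

# In-memory database for the example; the question uses a file on disk.
con = sqlite3.connect(":memory:")
c = con.cursor()
c.execute("create table files (name text, other text)")

# io.open with an encoding yields unicode strings, so sqlite3 never
# sees raw 8-bit bytestrings. The handle is not named "file".
with io.open(list_path, "r", encoding=ENCODING) as listing:
    for number, line in enumerate(listing, 1):
        # Strip the trailing newline before storing the path.
        c.execute("insert into files values (?, ?)",
                  (line.rstrip("\n"), "%i" % number))
con.commit()

c.execute("select * from files")
rows = c.fetchall()
print(rows)
con.close()
```

Here enumerate replaces the manual counter from the question, so each row gets its own number rather than the counter being reset on every line.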