How can I copy data from a SQL_ASCII to a UTF8 postgresql db using SQLAlchemy?
I am writing a db data migration tool using SQLAlchemy's expression language as the basic tool.
My source database may be in UTF8, or it may be in SQL_ASCII. My target DB will always be in UTF8.
I'm using the psycopg2 driver in SQLAlchemy 0.6.6
My general migration process looks like this:
for t in target_tables:
log.info("Migrating data from %s", t.fullname)
source = self.source_md.tabl开发者_Go百科es[self.source_schema + "." + t.name]
for row in source.select().execute():
with sql_logging(logging.INFO):
conn.execute(t.insert(), row)
If I don't set anything encoding-related on the engines, I get this when I iterate over the select()
results:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
If I set use_native_unicode=True, encoding='utf-8'
on the engines, I get this when I attempt to insert the new row:
sqlalchemy.exc.DataError: (DataError) invalid byte sequence for encoding "UTF8": 0xeb6d20
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
'INSERT INTO project_ghtests_survey000005.employees (first_name, employee_id) VALUES (%(first_name)s, %(employee_id)s)' {'first_name': 'Art\xebm', 'employee_id': '1234'}
Update details
To make inquires a bit quicker, here's the software stack in play:
- source_db encoding:
SQL_ASCII
- target_db encoding:
UTF8
- python 2.7
- sqlalchemy 0.6.6
- psycopg2 2.2.2
- PostgreSQL 8.2 server
It turns out that the solution was to set the connection client_encoding to 'latin1'
I accomplished this using a PoolListener like so:
class EncodingListener(PoolListener):
def connect(self, dbapi_con, con_record):
with closing(dbapi_con.cursor()) as cur:
cur.execute('show client_encoding')
encoding = cur.fetchone()[0]
if encoding.upper() == 'UTF8':
return
dbapi_con.set_client_encoding('latin1')
Since UTF-8 is backwards compatible with UTF-8, why use SQL_ASCII
isntead of UTF8
?
I think your encoding problems are probably more along the lines of latin1
or similar encodings. Not with ASCII
to UTF8
.
精彩评论