Delphi dbExpress and Interbase: UTF8 migration steps and risks?
Currently, our database uses Win1252 as the only character encoding. We will have to support Unicode in the database tables soon, which means we have to perform this migration for four databases and around 80 Delphi applications which run in-house in a 24/7 environment. Are there recommendations for database migrations to UTF-8 (or UNICODE_FSS) for Delphi applications? Some questions listed below. Many thanks in advance for your answers!
- are there tools which help with the migration of the existing databases (sizes between 250 MB and 2 GB, no Blob fields), by dumping the data, recreating the database with UNICODE_FSS or UTF-8, and loading the data back?
- are there known problems with Delphi 2009, dbExpress and Interbase 7.5 related to Unicode character sets?
- would you recommend to upgrade the databases to Interbase 2009 first? (This upgrade is p开发者_如何学Pythonlanned but does not have a high priority)
- can we simply migrate the database and Delphi will handle the Unicode character sets automatically, or will we have to change all character field types in every Datamodule (dfm and source code) too?
- which strategy would you recommend to work on the migration in parallel with the normal development and maintenance of the existing application? The application runs in-house so development and database administration is done internally.
Update: From InterBase discussion forum thread: Unicode Databases in InterBase - Really? (it is not a thread by me but it shows that some problems still exist in InterBase XE).
Here are some reports I've submitted: QC#92867 - String fields are blank coming from Views only if the View includes a Union, and when using a ClientDataSet. This was found as missing data on a few of my reports, which no longer work.
QC#91494 - IB Character column data Character fields (eg: Char(1)) are padded with blanks when retrieved through a stored procedure. Tests fail - eg: If Active = "Y". I make heavy use of stored procedures with forms and these do not work.
QC#91355 - IBSqlMonitor fails. The output of IBSqlMonitor is somewhat garbled making this tool useless. (So, even my shovel is broken!)
Unreported - Persistent fields in TClientDataSet fail for TWideString.
Other related QC entries:
QC#94455 SQL Unicode Char Type Failure (InterBase XE)
Both Database Workbench and IBExpert can do the data migration for you.
I'll get back to you on the other questions when I'm at the Entwickler Tage.
--jeroen
Problem: UPDATE on a empty string field no longer finds a record. If a UTF8 character field is empty, the DataSetProvider generates a wrong SELECT for the update action.
Symptom: Message 'record not found or edited by another user'
Solution: upgrade to Delphi 2010 Update 4 or use the workaround described in QC
Problem: CHAR fields no longer work and have to be replaced with VARCHAR.
Symptom: SELECT queries on a column which now uses UTF8 and is imported from WIN1252 with ASCII values no longer returns any value. Maybe this is a bug which I should report in QC.
Solution: replace all occurences of CHAR(
in the database metadata DDL script with VARCHAR(
Problem: persistent string fields require a Size property which is the logical size of the field multiplied by four (see also: Is it possible to tweak TStringField to work like TWideStringField in Delphi?)
Symptom: Access violations
Solution: delete the persistent field and add it again to update the Size property. (side effect: the DisplayWidth will also increase size, leading to problems with UI)
Problem: UDF (user defined functions) with string parameters can break because of sizes limits.
Symptom:
Dynamic SQL Error.
SQL error code = -204.
Data type unknown.
Implementation limit exceeded.
COLUMN DSQL internal.
for this UDF:
DECLARE EXTERNAL FUNCTION STRLEN
CSTRING(32767)
RETURNS INTEGER BY VALUE
ENTRY_POINT 'IB_UDF_strlen' MODULE_NAME 'ib_udf';
Solution: fix UDF parameters in the declaration.
Problem: dbExpress uses WideString as data type internally, so all existing .AsString
calls for reading / setting field and parameter will no longer work
Symptom: special characters will not be stored / read correctly
Solution: replace all occurences of .AsString with .AsWideString but be careful to not change where the AsString method is not called on a field or parameter.
Problem: dbExpress needs TStringField objects for WIN1252 fields. For UTF8 database fields, dbExpress needs TWideStringField objects.
Symptom: error message 'expected: WideString found: string'
Solution: replace all occurences of TStringField with TWideStringField. This requires that all form files (dfm) are text, not binary. The modified forms and datamodules will not be backwards compatible.
Problem: exporting metadata and table data for a WIN1252 database will create a CP1252 encoded file, but for the import, a UTF8 file is required (tested with IBExpert)
Symptom: errors in the script import to InterBase
Solution: use iconv to convert the script file to UTF8
精彩评论