开发者

mysql select in java on a column with accented utf8 string

I have encountered a problem when trying to select data from a table in MySQL in Java by a text column that is in utf-8. The interesting thing is that with code in Python it works well, in Java it doesn't.

The table looks as follows:

CREATE TABLE `x` (`id` int(10) unsigned NOT NULL AUTO_INCREMENT, `text` varchar(255) COLLATE utf8_bin NOT NULL, PRIMARY KEY (`id`)) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

The query looks like this:

SELECT * FROM x WHERE text = 'ěščřž'"

The Java code that doesn't work as exptected is the following:

public class test {
    public static void main(String [] args) {
        java.sql.Connection conn = null;
        System.out.println("SQL Test");
        try {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            conn = java.sql.DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1/x?user=root&password=root&characterSet=utf8&useUnicode=true&characterEncoding=utf-8&characterSetResults=utf8");
        } catch (Exception e) {
            System.out.println(e);
            System.exit(0);
        }

        System.out.println("Connection established");

        try {
            java.sql.Statement s = conn.createStatement();
            java.sql.ResultSet r = s.executeQuery("SELECT * FROM x WHERE text = 'ěščřž'");
            while(r.next()) {
                System.out.println (
                        r.getString("id") +开发者_运维技巧 " " +
                        r.getString("text")
                );
            }
        } catch (Exception e) {
            System.out.println(e);
            System.exit(0);
        }
    }
}

The Python code is:

# encoding: utf8

import MySQLdb

conn = MySQLdb.connect (host = "127.0.0.1",
                        port = 3307,
                        user = "root",
                        passwd = "root",
                        db = "x")
cursor = conn.cursor ()
cursor.execute ("SELECT * FROM x where text = 'ěščřž'")
row = cursor.fetchone ()
print row
cursor.close ()
conn.close ()

Both are stored on the filesystem in utf8 encoding (checked with hexedit). I have tried different versions of mysql-connector (currently using 5.1.15). Mysqld is 5.1.54.

Mysqld log for the Java code and Python code respectively:

110427 12:45:07     1 Connect   root@localhost on x
110427 12:45:08     1 Query     /* mysql-connector-java-5.1.15 ( Revision: ${bzr.revision-id} ) */SHOW VARIABLES WHERE Variable_name ='language' OR Variable_name = 'net_write_timeout' OR Variable_name = 'interactive_timeout' OR Variable_name = 'wait_timeout' OR Variable_name = 'character_set_client' OR Variable_name = 'character_set_connection' OR Variable_name = 'character_set' OR Variable_name = 'character_set_server' OR Variable_name = 'tx_isolation' OR Variable_name = 'transaction_isolation' OR Variable_name = 'character_set_results' OR Variable_name = 'timezone' OR Variable_name = 'time_zone' OR Variable_name = 'system_time_zone' OR Variable_name = 'lower_case_table_names' OR Variable_name = 'max_allowed_packet' OR Variable_name = 'net_buffer_length' OR Variable_name = 'sql_mode' OR Variable_name = 'query_cache_type' OR Variable_name = 'query_cache_size' OR Variable_name = 'init_connect'
                    1 Query     /* mysql-connector-java-5.1.15 ( Revision: ${bzr.revision-id} ) */SELECT @@session.auto_increment_increment
                    1 Query     SHOW COLLATION
                    1 Query     SET autocommit=1
                    1 Query     SET sql_mode='STRICT_TRANS_TABLES'
                    1 Query     SELECT * FROM x WHERE text = 'ěščřž'
110427 12:45:22     2 Connect   root@localhost on x
                    2 Query     set autocommit=0
                    2 Query     SELECT * FROM x where text = 'ěščřž'
                    2 Quit      

Does anybody have any suggestions what might be the cause why the Python code works and why the Java code does not? (by not working I mean not finding the desired data -- the connection works fine)

Many thanks.


Okay, my bad. The database was wrongly built. It was built through the mysql client that by default is latin1 so in the database the data were encoded by utf8 twice.

The problem and the major difference between the two source codes is in that the Python code doesn't set the default charset (therefore it is latin1) whereas the Java code does (therefore it is utf8). So it was coincidence of many factors that made me think that something peculiar is actually going on.

Thanks for your responses anyway.


Use PreparedStatement and set your search string as a positional parameter into that statement.

Read this tutorial about PreparedStatements -> http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html

Also, never create a String literal in Java code that contains non-ASCII characters. If you want to pass non-ASCII characters do a unicode escaping on them. This should give you an idea what I am talking about -> http://en.wikibooks.org/wiki/Java_Programming/Syntax/Unicode_Escape_Sequences

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜