mysql select in java on a column with accented utf8 string
I have encountered a problem when trying to select data from a table in MySQL in Java by a text column that is in utf-8. The interesting thing is that with code in Python it works well, in Java it doesn't.
The table looks as follows:
CREATE TABLE `x` (`id` int(10) unsigned NOT NULL AUTO_INCREMENT, `text` varchar(255) COLLATE utf8_bin NOT NULL, PRIMARY KEY (`id`)) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
The query looks like this:
SELECT * FROM x WHERE text = 'ěščřž'"
The Java code that doesn't work as exptected is the following:
public class test {
public static void main(String [] args) {
java.sql.Connection conn = null;
System.out.println("SQL Test");
try {
Class.forName("com.mysql.jdbc.Driver").newInstance();
conn = java.sql.DriverManager.getConnection(
"jdbc:mysql://127.0.0.1/x?user=root&password=root&characterSet=utf8&useUnicode=true&characterEncoding=utf-8&characterSetResults=utf8");
} catch (Exception e) {
System.out.println(e);
System.exit(0);
}
System.out.println("Connection established");
try {
java.sql.Statement s = conn.createStatement();
java.sql.ResultSet r = s.executeQuery("SELECT * FROM x WHERE text = 'ěščřž'");
while(r.next()) {
System.out.println (
r.getString("id") +开发者_运维技巧 " " +
r.getString("text")
);
}
} catch (Exception e) {
System.out.println(e);
System.exit(0);
}
}
}
The Python code is:
# encoding: utf8
import MySQLdb
conn = MySQLdb.connect (host = "127.0.0.1",
port = 3307,
user = "root",
passwd = "root",
db = "x")
cursor = conn.cursor ()
cursor.execute ("SELECT * FROM x where text = 'ěščřž'")
row = cursor.fetchone ()
print row
cursor.close ()
conn.close ()
Both are stored on the filesystem in utf8 encoding (checked with hexedit). I have tried different versions of mysql-connector (currently using 5.1.15). Mysqld is 5.1.54.
Mysqld log for the Java code and Python code respectively:
110427 12:45:07 1 Connect root@localhost on x
110427 12:45:08 1 Query /* mysql-connector-java-5.1.15 ( Revision: ${bzr.revision-id} ) */SHOW VARIABLES WHERE Variable_name ='language' OR Variable_name = 'net_write_timeout' OR Variable_name = 'interactive_timeout' OR Variable_name = 'wait_timeout' OR Variable_name = 'character_set_client' OR Variable_name = 'character_set_connection' OR Variable_name = 'character_set' OR Variable_name = 'character_set_server' OR Variable_name = 'tx_isolation' OR Variable_name = 'transaction_isolation' OR Variable_name = 'character_set_results' OR Variable_name = 'timezone' OR Variable_name = 'time_zone' OR Variable_name = 'system_time_zone' OR Variable_name = 'lower_case_table_names' OR Variable_name = 'max_allowed_packet' OR Variable_name = 'net_buffer_length' OR Variable_name = 'sql_mode' OR Variable_name = 'query_cache_type' OR Variable_name = 'query_cache_size' OR Variable_name = 'init_connect'
1 Query /* mysql-connector-java-5.1.15 ( Revision: ${bzr.revision-id} ) */SELECT @@session.auto_increment_increment
1 Query SHOW COLLATION
1 Query SET autocommit=1
1 Query SET sql_mode='STRICT_TRANS_TABLES'
1 Query SELECT * FROM x WHERE text = 'ěščřž'
110427 12:45:22 2 Connect root@localhost on x
2 Query set autocommit=0
2 Query SELECT * FROM x where text = 'ěščřž'
2 Quit
Does anybody have any suggestions what might be the cause why the Python code works and why the Java code does not? (by not working I mean not finding the desired data -- the connection works fine)
Many thanks.
Okay, my bad. The database was wrongly built. It was built through the mysql client that by default is latin1 so in the database the data were encoded by utf8 twice.
The problem and the major difference between the two source codes is in that the Python code doesn't set the default charset (therefore it is latin1) whereas the Java code does (therefore it is utf8). So it was coincidence of many factors that made me think that something peculiar is actually going on.
Thanks for your responses anyway.
Use PreparedStatement and set your search string as a positional parameter into that statement.
Read this tutorial about PreparedStatements -> http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Also, never create a String literal in Java code that contains non-ASCII characters. If you want to pass non-ASCII characters do a unicode escaping on them. This should give you an idea what I am talking about -> http://en.wikibooks.org/wiki/Java_Programming/Syntax/Unicode_Escape_Sequences
精彩评论