Create a file in Java for loading into an nvarchar field in SQLServer 2005 using BCP and UTF-16
I want to use BCP to load into a SQL Server 2005 table with an nvarchar field using a loader control file. As I understand it, SQL Server 2005 only supports UTF-16 (and I believe it is UTF-16 LE). The file is being output by a Java program. The way I have it currently set up is as follows:
An XML format BCP loader file (created using the following command:
bcp test_table format nul -c -x -T -f test_table.xml -S server
)A Java program using the following code to write the output:
File f = new File("from_java.txt"); String encoding = "x-UTF-16LE-BOM"; OutputStream os = new FileOutputStream(f); OutputStreamWriter outputStreamWriter = new OutputStreamWriter(os, encoding); String theString = "áááááLittle Endian, BOM\r\n"; outputStreamWriter.append(theString); outputStreamWriter.flush(); outputStreamWriter.close();
Then using the following bcp command:
bcp test_table in from_jav开发者_JAVA技巧a.txt -T -f test_table.xml -S server -error error.txt
What I get in the table is ÿþá
. and not áááááLittle Endian, BOM
I've tried a few different permutations of changing parameters:
- changing the way I generate the loader control file (using -n for native data instead of -c for character data...I think this may have something to do with it, but I didn't see any improvement in my inserted data)
- tried several different forms of the UTF-16 encoding, including big endian and little endian with no BOM, to no avail
- tried to output the BOM manually in the file as I read somewhere that Microsoft really like to make use of BOM information
- looked into trying to output the file as UCS-2 (instead of UTF-16) as that is (apparently) what BCP is actually reading the file in as
- tried -w on the bcp import, this does work, but not in conjunction with a loader format file (is there a way to incorporate whatever magic tells BCP that the file is encoded in UTF-16 into the format file?)
- I can get it to work if I output the file in windows-1252 and specify that codepage as a
-c 1252
option to bcp when I load the file (but I don't want to do this as I will be losing information as UTF-16 is a superset of what can be represented compared to 1252)
Has anyone managed to get bcp to load into an nvarchar field using UTF-16 data in conjunction with a loader format configuration file?
Thanks in advance,
-James
I have been literally underwhelmed with responses, but I have cracked it.
The loader file needs to be generated with a -w
flag, so the command to generate the file is:
bcp <table> format nul -w -x T -f loader-control-w-format.xml -S <server> -t "||"
This leads to a loader control file that looks a little different, you get entries such as:
<FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="|\0|\0" MAX_LENGTH="1000" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
Note the delimiter is listed as |\0|\0
, the zeroes correspond to the extra byte in the file as UTF-16 (or just "unicode" as Microsoft (wrongly) call it) is a double byte character encoding.
Some notes for the sanity of anyone else dealing with BCP in this way:
- When SQLServer talks of "native" they mean native characters, i.e. accented characters
- When SQLServer talks of Unicode, what they actually mean is the UTF16 (Little Endian) manner of encoding the Unicode characterset. This is what the -w pertains to
- When writing a file for loading into BCP using UTF-16, the file has to be in UTF-16 Little Endian format, and cannot contain a UTF BOM (as BCP will interpret this a byte that should be loaded and your first record will contain the BOM, urgh!)
The Java code for writing out a file in UTF-16 that can be loaded in this way is as follows:
final File f = new File("C:\\temp\\bcp_prob\\from_java-UTF-16.txt");
//LE with no BOM is important here:
final String encoding = "UTF-16LE";
final OutputStream os = new FileOutputStream(f);
final OutputStreamWriter outputStreamWriter = new OutputStreamWriter(os, encoding);
final String theString = "UTF-16-LE, intermetálico básicos intermetálico película magnética dinámicos||another_col\r\n";
outputStreamWriter.append(theString);
outputStreamWriter.flush();
outputStreamWriter.close();
精彩评论