SQL Server BULK INSERT - Escaping reserved characters

There's very little documentation available about escaping characters in SQL Server BULK INSERT files.

The documentation for BULK INSERT says the statement only has two formatting options, FIELDTERMINATOR and ROWTERMINATOR; however, it doesn't say how you're meant to escape those characters if they appear in a row's field value.

For example, if I have this table:

CREATE TABLE People ( name varchar(MAX), notes varchar(MAX) )

and this single row of data:

"Foo, \Bar", "he has a\r\nvery strange name\r\nlol"

...what would its corresponding bulk insert file look like? This wouldn't work, for obvious reasons:

Foo,\Bar,he has a
very strange name
lol

SQL Server says it supports \r and \n but doesn't say whether backslashes escape themselves, nor does it mention field value delimiting (e.g. with double-quotes, or escaping double-quotes), so I'm a little perplexed in this area.


I worked around this issue by using \0 as the row separator and \t as the field separator, since neither character appeared in any field value and both are supported as separators by BULK INSERT.

I am surprised MSSQL doesn't offer more flexibility when it comes to import/export. It wouldn't take too much effort to build a first-class CSV/TSV parser.


For the next person to search:

I used "\0\t" as a field separator, and "\0\n" for the end-of-line separator on the last field. Use of "\0\r\n" would also be acceptable if you wish to pretend that the files have DOS EOL conventions.

For those unfamiliar with the \x notation, \0 is CHAR(0), \t is CHAR(9), \n is CHAR(10) and \r is CHAR(13). Replace the CHAR() function with whatever your language offers to convert a number to a nominated character.
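To make the scheme concrete, here's a minimal Python sketch (the row data and file name are just for illustration) that writes a data file using these separators:

```python
# Build a BULK INSERT data file using CHAR(0)+CHAR(9) as the field
# separator and CHAR(0)+CHAR(10) as the row separator, so literal tabs
# and newlines inside field values pass through untouched.

FIELD_SEP = "\x00\t"  # matches FIELDTERMINATOR = '\0\t'
ROW_SEP = "\x00\n"    # matches ROWTERMINATOR = '\0\n'

def make_bulk_file(rows):
    """Join each row's fields with FIELD_SEP and end each row with ROW_SEP."""
    return "".join(FIELD_SEP.join(fields) + ROW_SEP for fields in rows)

rows = [
    ("Foo, \\Bar", "he has a\r\nvery strange name\r\nlol"),
]
data = make_bulk_file(rows)

# newline="" stops Python from translating the embedded line endings.
with open("people.dat", "w", newline="") as f:
    f.write(data)
```

The embedded \r\n sequences in the notes field survive intact, because only the CHAR(0)-prefixed sequences act as delimiters.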

With this combination, all instances of \t and \n (and \r) become acceptable characters in the data file. After all, the weakness of the bulk upload system is that tabs and newlines are often legitimate characters in text strings, whereas other low-ASCII characters like CHAR(0), CHAR(1) and CHAR(2) are not legal text - not even appearing in UTF-8.

The only character you cannot have in your data is \0 - unless you can guarantee it will never be followed by \t or \n (or \r).

If your language has problems with \0 in strings (though depending on how you code, you may be able to avoid them), and you know that your data won't contain CHAR(1) or CHAR(2) (i.e. no binary), then use those characters instead. Those low characters will only be found when you are trying to store arbitrary binary data in strings.

Note also that you will find bytes 0, 1, 2 in UTF-16, UCS-2 and UTF-32 (aka UCS-4) - BUT - the 2 or 4 byte wide representation of CHAR(0, 1 or 2) is still acceptable and distinct from any legal unicode text. Just make sure you select the correct codepage setting in the format file to suit your choice of a UTF or UCS variant.
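A quick Python check makes the width point concrete:

```python
# CHAR(0) widens with the encoding, but its encoded form never collides
# with any legal text character in that encoding.
nul = "\x00"
assert nul.encode("utf-8") == b"\x00"                  # 1 byte
assert nul.encode("utf-16-le") == b"\x00\x00"          # 2 bytes
assert nul.encode("utf-32-le") == b"\x00\x00\x00\x00"  # 4 bytes
```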


A bulk insert needs to have corresponding fields and a matching field count for each row. Your example is a little rough, as it's not structured data. As for the characters, it will interpret them literally, not using escape characters (your string will be exactly as seen in the file).

As for the double quotes enclosing each field, you will just have to use them as part of the field and row terminators. So now you should have:

Fieldterminator = '","', Rowterminator = '"\n'

Does that make sense? Then after the bulk insert you'll need to take out the prefix double quote with something like:

Update yourtable set yourfirstcolumn = right(yourfirstcolumn, len(yourfirstcolumn) - 1)
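Putting that together, a small Python sketch (the helper name is hypothetical) that emits rows in this '","' / '"\n' layout might be:

```python
# Emit rows where every field is wrapped in double quotes, so that
# FIELDTERMINATOR = '","' and ROWTERMINATOR = '"\n' split them correctly.
# Caveat: a field containing a literal '","' sequence still breaks this.

def make_quoted_row(fields):
    """Wrap fields in double quotes and join them for the '","' scheme."""
    return '"' + '","'.join(fields) + '"\n'

rows = [("Foo, \\Bar", "he has a strange name")]
data = "".join(make_quoted_row(r) for r in rows)
```

The leading double quote on each row's first field isn't consumed by either terminator, which is why the UPDATE above is needed to strip it afterwards.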
