MS SQL: How to convert two-bytes nvarchar to string?
I have nvarchar(4000) field containing data like this:
D0B6D181D0B5D0B4D0BA35D0BC (cyrillic string)
E59EA0E开发者_C百科78999E79B98E99499 (chinese string)
...
Each character is presented by two bytes in data sequence. How to convert this data to string using T-SQL?
Your first example is UTF8-encoded Cyrillic text that's been converted to a hexadecimal string and stored in an nvarchar(4000) field in SQL Server. That's quite an odd combination. Strangely, SQL Server has no native support for converting UTF8 to nvarchar in TSQL. You can roll your own DecodeUTF8 function, or you can use mine below.
Your examples:
select
Cyrillic = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'D0B6D181D0B5D0B4D0BA35D0BC', 1))
, Chinese = dbo.DecodeUTF8(convert(varbinary(max), '0x'+ 'E59EA0E78999E79B98E99499', 1))
Output:
Cyrillic Chinese
жседк5м 垠牙盘错
My UTF8 decoder for TSQL:
create function [dbo].[DecodeUTF8](@utf8 varchar(max)) returns nvarchar(max)
as
begin
declare @xml xml;
with e2(n) as (select top(16) 0 from (values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n))
, e3(n) as (select top(256) 0 from e2, e2 e)
, e4(n) as (select top(65536) 0 from e3, e3 e)
, e5(n) as (select top(power(2.,31)-1) row_number() over (order by(select 0)) from e4, e4 e)
, numbers(i) as (select top(datalength(@utf8)) row_number() over (order by(select 0)) from e5)
, x as (
select *
from numbers
cross apply (select byte = convert(tinyint, convert(binary(1), substring(@utf8, i, 1)))) c
cross apply (select n = floor(log(~(byte) * 2 + 1, 2)) - 1) d
cross apply (select bytes = case when n in (5,4,3) then 7 - n else 1 end) e
cross apply (select data = byte % power(2, n)) f
)
select @xml =
(
select nchar(case x.bytes
when 1 then x.data
when 2 then power(2, 6) * x.data + x2.data
when 3 then power(2, 6*2) * x.data + power(2, 6) * x2.data + x3.data
when 4 then power(2, 6*3) * x.data + power(2, 6*2) * x2.data + power(2, 6) * x3.data + x4.data
end)
from x
left join x x2 on x2.i = x.i + 1 and x.bytes > 1
left join x x3 on x3.i = x.i + 2 and x.bytes > 2
left join x x4 on x4.i = x.i + 3 and x.bytes > 3
where x.n <> 6
order by x.i
for xml path('')
);
return @xml.value('.', 'nvarchar(max)');
end
Take a look at this
http://devio.wordpress.com/2009/07/11/convert-unicode-hex-codepoint-to-unicode-character-in-sql-server/
It looks like they are doing single byte codepoints, so you may need to modify it a bit
精彩评论