
Encoding issue when retrieving blob from SQL Server with Entity Framework

I've noticed a peculiarity when storing 2003 MS Word documents in a SQL Server blob (nvarchar(max)) column and then retrieving them with Entity Framework (EF1) to convert them back into a file (which is then attached to an email).

Basic Code:

1) I insert the document into the database using OPENROWSET:

INSERT INTO [dbo].[Attachment]([id],[Blob])     
   SELECT 1, (SELECT BulkColumn FROM OPENROWSET(Bulk 'path_to_attachment\abc123.doc', SINGLE_BLOB) AS BLOB) 

2) I then retrieve the file from the database using EF1 (simplified for brevity - bypassing repo etc):

var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
var stream = new MemoryStream(bytes);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

Problem:

This works, but after pushing a file through this process I noticed some discrepancies: minor formatting issues and the like.

After doing some deeper digging (using VBinDiff), it looks like some of the Unicode characters are being converted to FD FF:

00DC > FDFF

original:

00 00 00 00 00 00 00 00 00 DC 00 00 00 00 00 00

converted:

00 00 00 00 00 00 00 00 FD FF 00 00 00 00 00 00

Other examples are:

BED9 > FDFF
CFD9 > FDFF

which looks to be the last code point in the range from this chart: http://www.unicode.org/charts/PDF/UFB50.pdf
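The corruption can be reproduced in isolation. A minimal sketch of what appears to be happening: the raw .doc bytes are not valid UTF-16, so when the nvarchar value is materialised as a .NET string, any byte pair that forms an unpaired surrogate code unit (such as 00 DC = U+DC00, or BE D9 = U+D9BE from the dumps above) is replaced with U+FFFD, the Unicode replacement character, whose little-endian bytes are FD FF:

```csharp
using System;
using System.Text;

class SurrogateDemo
{
    static void Main()
    {
        // Two byte pairs taken from the question's hex dumps. Read as
        // little-endian UTF-16 they are the code units U+DC00 and U+D9BE,
        // both unpaired surrogates, which are not valid on their own.
        byte[] original = { 0x00, 0xDC, 0xBE, 0xD9 };

        // Decoding replaces each invalid code unit with U+FFFD
        // (the Unicode replacement character)...
        string text = Encoding.Unicode.GetString(original);

        // ...so encoding back yields FD FF in place of each original pair.
        byte[] roundTripped = Encoding.Unicode.GetBytes(text);

        Console.WriteLine(BitConverter.ToString(roundTripped)); // FD-FF-FD-FF
    }
}
```

The round trip through a string is therefore lossy for arbitrary binary content, which matches the observed FD FF substitutions.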

Questions

1) Am I doing something silly, or is EF doing something funky when returning the string object attachment.Blob, before I then try to convert it back to a byte array?

2) Is there a better way to retrieve the exact bytes out of the blob field while still using Entity Framework? (Or should I use a stored procedure or SqlDataReader instead - which I'd really rather not do, so as not to convolute my data access paths.)


Mitch's answer helped point out the flaw in my code. For some reason (force of habit, I think) I'd set the Blob field to nvarchar(max) when, as Mitch pointed out, SINGLE_BLOB imports the file as varbinary(max), which is actually what I wanted within the application anyway (see question point 2).

Solution:

  1. Convert the database field from nvarchar(max) to varbinary(max)
  2. Update the Entity Framework model
  3. Change the Blob field inside the EF model from string to binary

And finally, change this

var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
var stream = new MemoryStream(bytes);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

To this:

var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
var stream = new MemoryStream(attachment.Blob);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);


Importing using SINGLE_BLOB returns the contents of a file as a single-row, single-column rowset of type varbinary(max).

Rather than SINGLE_BLOB, I would suggest using SINGLE_NCLOB, which reads a file as nvarchar(max).

Unicode files must be read with the SINGLE_NCLOB option shown here:

SELECT BulkColumn 
FROM OPENROWSET (BULK 'path_to_attachment\abc123.doc', SINGLE_NCLOB) AS BLOB

Ref.: Using OPENROWSET to read large files into SQL Server

Update (in response to comment): If the files are not Unicode (as you tried), then when you retrieve them you should not use the Unicode encoding to get the bytes:

var bytes = Encoding.ASCII.GetBytes(attachment.Blob);
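One caveat worth adding here (not part of the original answer): Encoding.ASCII only round-trips 7-bit data. Under .NET's default fallback, any byte above 0x7F decodes to '?' (0x3F), so this approach is only safe for plain-ASCII content:

```csharp
using System;
using System.Text;

class AsciiCaveat
{
    static void Main()
    {
        // 0x41 is 'A'; 0x80 is not a valid ASCII byte.
        byte[] original = { 0x41, 0x80 };

        // The default fallback decodes the invalid byte to '?'...
        string text = Encoding.ASCII.GetString(original);

        // ...so the original byte value is lost on the way back.
        byte[] roundTripped = Encoding.ASCII.GetBytes(text);

        Console.WriteLine(BitConverter.ToString(roundTripped)); // 41-3F
    }
}
```

For arbitrary file content, the varbinary(max) column with a byte[] property (as in the accepted solution above) remains the safest option, since no string decoding is involved at all.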
