Encoding issue when retrieving blob from SQL Server with Entity Framework
I've noticed a peculiarity when storing MS Word 2003 documents in a SQL Server blob (nvarchar(max)) field and then retrieving them using Entity Framework (EF1) to convert them back into a file (which is then attached to an email).
Basic Code:
1) I insert the document into the database using OPENROWSET:
INSERT INTO [dbo].[Attachment]([id],[Blob])
SELECT 1, (SELECT BulkColumn FROM OPENROWSET(Bulk 'path_to_attachment\abc123.doc', SINGLE_BLOB) AS BLOB)
2) I then retrieve the file from the database using EF1 (simplified for brevity - bypassing repo etc):
var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
var stream = new MemoryStream(bytes);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);
Problem:
This works, but I noticed some discrepancies in the file after pushing it through this process: some minor formatting issues, etc.
After doing some deeper digging (using VBinDiff), it looks like some of the Unicode chars are being converted to FDFF:
00DC > FDFF
original:
00 00 00 00 00 00 00 00 00 DC 00 00 00 00 00 00
converted:
00 00 00 00 00 00 00 00 FD FF 00 00 00 00 00 00
Other examples are:
BED9 > FDFF
CFD9 > FDFF
which looks to be the last in the range from here: http://www.unicode.org/charts/PDF/UFB50.pdf
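For illustration, here is a minimal standalone repro of that byte pattern. It is not the EF/SQL path itself, just the same lossy round trip: a byte pair such as 00 DC decodes to an unpaired UTF-16 surrogate, which comes back as FD FF (the U+FFFD replacement character) once it has been through a .NET string:
using System;
using System.Text;

class RoundTripDemo
{
    static void Main()
    {
        // 00 DC read as little-endian UTF-16 is the lone surrogate U+DC00
        var original = new byte[] { 0x00, 0xDC };

        // Decode to a string and re-encode, as effectively happens when binary
        // data is stored in an nvarchar(max) column and materialised as a string
        var text = Encoding.Unicode.GetString(original);
        var roundTripped = Encoding.Unicode.GetBytes(text);

        Console.WriteLine(BitConverter.ToString(original));      // 00-DC
        Console.WriteLine(BitConverter.ToString(roundTripped));  // FD-FF
    }
}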
Questions
1) Am I doing something silly, or is EF doing something funky when returning the string object attachment.Blob, before I then try to convert it back to a byte array?
2) Is there a better way to retrieve the exact bytes out of the blob field while still using Entity Framework? (Or should I use a stored procedure or SqlDataReader instead, which I'd really rather not do, so as not to convolute my data access paths.)
Mitch's answer helped point out the failing in my code. For some reason (force of habit, I think) I'd set the Blob field to nvarchar(max) when, as pointed out by Mitch, SINGLE_BLOB reads the file as varbinary(max), which is actually what I wanted within the application anyway (see question point 2).
Solution:
- Convert the database field from nvarchar(max) to varbinary(max)
- Update the Entity Framework model
- Change the blob field inside the EF model from string to binary (roughly as sketched below)
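For reference, once the model is updated the entity exposes the column as a byte array instead of a string. A hand-written equivalent is sketched below (the real EF1 class is generated from the EDMX; names are taken from the code above):
public class Attachment
{
    public int id { get; set; }

    // Was: public string Blob { get; set; }   (nvarchar(max))
    public byte[] Blob { get; set; }           // varbinary(max)
}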
And finally, change this:
var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
var stream = new MemoryStream(bytes);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);
To this:
var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
var stream = new MemoryStream(attachment.Blob);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);
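From there the attachment is added to the outgoing message as usual. A minimal sketch using System.Net.Mail follows; the addresses, subject, body and SMTP host are placeholders, not part of the original code:
// Requires the System.Net.Mail namespace (same as the Attachment class above)
var message = new MailMessage("noreply@example.com", "recipient@example.com")
{
    Subject = "Document attached",
    Body = "Please find the document attached."
};
message.Attachments.Add(fileToAttach);

var smtp = new SmtpClient("smtp.example.com");
smtp.Send(message);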
Importing using SINGLE_BLOB returns the contents of a file as a single-row, single-column rowset of type varbinary(max).
Rather than SINGLE_BLOB, I would suggest using SINGLE_NCLOB, which reads a file as nvarchar(max).
Unicode files must be read with the SINGLE_NCLOB option, shown here:
SELECT BulkColumn
FROM OPENROWSET (BULK 'path_to_attachment\abc123.doc', SINGLE_NCLOB) AS BLOB
Ref.: Using OPENROWSET to read large files into SQL Server
Update (in response to comment): If the files are not Unicode (as you tried), then when you retrieve them you should not use Unicode encoding to get the bytes:
var bytes = Encoding.ASCII.GetBytes(attachment.Blob);