How can I form a Word document using a stream of bytes?
I have a stream of bytes that, assembled correctly, forms a valid Word file. I need to turn this stream into a Word document without writing it to disk. The original stream comes from a SQL Server database table:
ID  Name   FileData
----------------------------------------
1   Word1  292jf2jf2ofm29fj29fj29fj29f2jf29efj29fj2f9 (actual file data)
The FileData field carries the data.
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document doc = new Microsoft.Office.Interop.Word.Document();
doc = word.Documents.Open(@"C:\SampleText.doc");
doc.Activate();
The code above opens a Word file from the file system and fills the document from it. I don't want that. I want to define a new Microsoft.Office.Interop.Word.Document and fill its content manually from the byte stream.
After getting the in-memory Word document, I want to do some parsing of keywords.
Any ideas?
- Create an in-memory file system; there are drivers for that.
- Give Word a path on an FTP server (or something similar) that you then use to push the data.
One important thing to note: storing files in a database is generally not good design.
You could look at how SharePoint solves this. They have created a web interface for documents stored in their database.
It's not that hard to create or embed a web server in your application that can serve pages to Word. You don't even have to use the standard ports.
There probably isn't any straightforward way of doing this. I found a couple of possible solutions when searching for it:
- Use the OpenOffice SDK to manipulate the document instead of Word Interop
- Write the data to the clipboard, and then from the Clipboard to Word
I don't know if this does it for you, but apparently the API doesn't provide what you're after (unfortunately).
There are really only 2 ways to open a Word document programmatically - as a physical file or as a stream. There's a "package", but that's not really applicable.
The stream method is covered here: https://learn.microsoft.com/en-us/office/open-xml/how-to-open-a-word-processing-document-from-a-stream
But even the example there relies on a physical file to form the stream:
string strDoc = @"C:\Users\Public\Public Documents\Word13.docx";
Stream stream = File.Open(strDoc, FileMode.Open);
The best solution I can offer would be to write the file out to a temp location where the service account for the application has permission to write:
string newDocument = @"C:\temp\test.docx";
WriteFile(byteArray, newDocument);
If the application doesn't have permissions on the "temp" folder in my example, you would simply give its service account (the application pool identity, if it's a website) Full Control of the folder.
You'd use this WriteFile() function:
/// <summary>
/// Write a byte[] to a new file at the location you choose
/// </summary>
/// <param name="byteArray">byte[] that consists of file data</param>
/// <param name="newDocument">Path to where the new document will be written</param>
public static void WriteFile(byte[] byteArray, string newDocument)
{
    // File.WriteAllBytes creates (or overwrites) the file itself,
    // so no intermediate MemoryStream is needed
    File.WriteAllBytes(newDocument, byteArray);
}
From there, you can open it with OpenXML and edit the file. As far as I can tell, there's no way to open a Word document in byte[] form directly into an instance of Word (Interop, OpenXML, or otherwise), because you need a documentPath, or the stream method mentioned earlier, whose example relies on there being a physical file. You can edit the document by reading its main document part into a string and then either loading that string as XML or just editing the string directly:
// Requires: using System.IO; using System.Xml; using DocumentFormat.OpenXml.Packaging;
string docText = null;
byte[] byteArray = null;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(documentPath, true))
{
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd(); // <-- reads the main document part's stream into a string
    }
    // Play with the XML
    XmlDocument xml = new XmlDocument();
    xml.LoadXml(docText); // the string contains the XML of the Word document
    XmlNodeList nodes = xml.GetElementsByTagName("w:body");
    XmlNode chiefBodyNode = nodes[0];
    // add paragraphs with AppendChild...
    // remove a node by getting a ChildNode and removing it, like this...
    // (note: index 2 is the third child of the body, not necessarily the first paragraph)
    XmlNode firstParagraph = chiefBodyNode.ChildNodes[2];
    chiefBodyNode.RemoveChild(firstParagraph);
    // Or play with the string form
    docText = docText.Replace("John", "Joe");
    // If you manipulated the XML, write it back to the string
    //docText = xml.OuterXml; // comment out the Replace line above if XML edits are all you want to do, and uncomment this line
    // Save the file - yes, back to the file system - required
    using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
    {
        sw.Write(docText);
    }
}
// Read it back in as bytes
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving
Reference:
https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part
I know it's not ideal, but I have searched and not found a way to edit the byte[] directly without a conversion that involves writing out the file, opening it for the edits, then essentially re-uploading it to recover the new bytes. Doing byte[] byteArray = Encoding.UTF8.GetBytes(docText); instead of re-reading the file will give you corrupt bytes: docText holds only the XML of the main document part, not the whole .docx package. The same happened with every other Encoding I tried (UTF7, Default, Unicode, ASCII), as I found when I tried to write the bytes back out using my WriteFile() function, above. When the bytes were not encoded and were simply collected using File.ReadAllBytes(), as in that last line, writing them back out using WriteFile() worked fine.
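A small sketch of why text encodings are the wrong tool for file bytes in general: a .docx file is a ZIP package, so its bytes are binary data, and decoding binary data to a string and encoding it back does not reproduce the original bytes. The class and method names below are made up for illustration, and the sample bytes are invented (only the leading "PK" ZIP signature is real).

```csharp
using System;
using System.Linq;
using System.Text;

public static class EncodingRoundTripDemo
{
    // Decodes the bytes as UTF-8 text, then encodes that text back to bytes.
    // Bytes that are not valid UTF-8 are replaced with U+FFFD during decoding,
    // so the result can differ from the input.
    public static byte[] RoundTripThroughUtf8(byte[] original)
    {
        string asText = Encoding.UTF8.GetString(original);
        return Encoding.UTF8.GetBytes(asText);
    }

    // True when the round trip reproduced the input byte for byte.
    public static bool SurvivesRoundTrip(byte[] original)
    {
        return original.SequenceEqual(RoundTripThroughUtf8(original));
    }
}
```

With an input such as { 0x50, 0x4B, 0x03, 0x04, 0xFF, 0xFE }, SurvivesRoundTrip returns false, because 0xFF and 0xFE can never appear in valid UTF-8. Pure ASCII input survives, which is why the problem only shows up with real file data.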
Update:
It might be possible to manipulate the bytes like this:
// Requires: using System.IO; using DocumentFormat.OpenXml.Packaging;
//byte[] byteArray = File.ReadAllBytes("Test.docx"); // you might be able to assign your bytes here, instead of from a file?
byte[] byteArray = GetByteArrayFromDatabase(fileId); // function you have for getting the document from the database
// Declared outside the using block so it is still in scope for the final read.
// You would update "documentPath" to a new name first...
string documentPath = @"C:\temp\newDoc.docx";
using (MemoryStream mem = new MemoryStream())
{
    mem.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Open(mem, true))
    {
        // do your updates -- see string or XML edits, above
        // Once done, you may need to save the changes....
        //wordDoc.MainDocumentPart.Document.Save();
    }
    // But you will still need to save it to the file system here....
    using (FileStream fileStream = new FileStream(documentPath,
        System.IO.FileMode.CreateNew))
    {
        mem.WriteTo(fileStream);
    }
}
// And then read the bytes back in, to save it to the database
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving
Reference:
https://learn.microsoft.com/en-us/previous-versions/office/office-12//ee945362(v=office.12)
But note that even this method will require saving the document, then reading it back in, in order to save it to bytes for the database. It will also fail, on the line where the document is being opened, if the document is in .doc format instead of .docx.
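If the table can hold both formats, one option is to look at the first few bytes before handing them to OpenXML. This is only a sketch with made-up class and method names: a .docx file is a ZIP package and so normally starts with the bytes "PK" 03 04, while a legacy binary .doc is an OLE compound file that starts with D0 CF 11 E0 A1 B1 1A E1. A matching signature only tells you the container type (any ZIP file starts with "PK"), not that the content is a valid Word document.

```csharp
using System;

public static class WordFormatSniffer
{
    // ZIP local file header signature; a .docx is a ZIP package.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE compound file signature used by legacy binary .doc files.
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // True when data begins with every byte of signature, in order.
    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || data.Length < signature.Length)
        {
            return false;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }

    public static bool LooksLikeDocx(byte[] data)
    {
        return StartsWith(data, ZipSignature);
    }

    public static bool LooksLikeLegacyDoc(byte[] data)
    {
        return StartsWith(data, OleSignature);
    }
}
```

Checking LooksLikeDocx(byteArray) before calling WordprocessingDocument.Open() lets you report a clear error for .doc data instead of letting the open call throw.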
Instead of that last section for saving the file to the file system, you could just take the memory stream and save that back into bytes once you are outside of the WordprocessingDocument.Open() block, but still inside the using (MemoryStream mem = new MemoryStream()) { ... } statement:
// Convert
byteArray = mem.ToArray();
This gives you the byte[] of your Word document.
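Putting the pieces of the update together, here is a minimal sketch of a round trip that never touches the file system. It assumes the data is in .docx format and that the DocumentFormat.OpenXml NuGet package is referenced; the class and method names are made up for illustration. ReadBodyText covers the keyword-parsing case from the question, and ReplaceText reuses the string-replace approach from earlier in this answer.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging; // assumption: DocumentFormat.OpenXml package is referenced

public static class InMemoryDocx
{
    // Returns the plain text of the document body, e.g. to search it for keywords.
    public static string ReadBodyText(byte[] docxBytes)
    {
        // A fixed-size, read-only stream is enough when the package is opened read-only.
        using (MemoryStream mem = new MemoryStream(docxBytes, false))
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, false))
        {
            return wordDoc.MainDocumentPart.Document.Body.InnerText;
        }
    }

    // Replaces text in the main document part and returns the updated .docx bytes.
    public static byte[] ReplaceText(byte[] docxBytes, string oldText, string newText)
    {
        // Start from an empty MemoryStream and write into it so the stream can grow
        // when the edited package is larger than the original.
        using (MemoryStream mem = new MemoryStream())
        {
            mem.Write(docxBytes, 0, docxBytes.Length);
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
            {
                string docText;
                using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
                {
                    docText = sr.ReadToEnd();
                }
                docText = docText.Replace(oldText, newText);
                using (StreamWriter sw = new StreamWriter(
                    wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
                {
                    sw.Write(docText);
                }
            }
            // The package has been disposed, so its changes are flushed into mem.
            return mem.ToArray();
        }
    }
}
```

You would call something like InMemoryDocx.ReadBodyText(byteArray).Contains("keyword") for the parsing step, and write the result of InMemoryDocx.ReplaceText(byteArray, "John", "Joe") straight back to the FileData column. Keep in mind that a plain string replace only finds text that Word has stored in a single run; a word split across runs in the XML will not match.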