Replace text in Word Document via ASP.NET
How can I replace a string/word in a Word document via ASP.NET? I just need to replace a couple of words in the document, so I would like to stay AWAY from third-party plugins and interop. I would like to do this by opening the file and replacing the text.
The following attempts were made:
I created a StreamReader and a StreamWriter to read and write the file, but I think I am reading and writing in the wrong format. I think Word documents are stored in binary? If Word documents are binary, how would I read and write the file in binary?
Dim template As String = Request.MapPath("documentName.doc")
If File.Exists(template) Then
    Dim sr As New StreamReader(template)
    Dim content As String = sr.ReadToEnd()
    sr.Close()
    Dim sw As New StreamWriter(template)
    content = content.Replace("@ T O D A Y S D A T E", Date.Now.ToString("MM/dd/yyyy"))
    sw.Write(content)
    sw.Close()
Else
The Word binary format (.doc) is proprietary to Microsoft. The specification is complex, and it will take you ages to learn the document structure and its internal bit and byte layout. I really don't think you will save yourself any time going down this path, so consider the options below:
- Use Open XML
- Automate Word
- Use a third-party library like Aspose
- Use RTF rather than .doc. You can then look for the specific RTF text that contains your placeholder and replace it with another block of RTF text. This is probably the simplest option for what you want to do, if RTF is an acceptable format.
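The RTF option can be sketched roughly as follows. This is only a minimal illustration, not a tested production routine: the module and method names, the output path, and the "@TODAYSDATE" placeholder are all hypothetical, and it assumes the placeholder appears as one contiguous run of text in the RTF source (Word sometimes splits text across formatting groups, in which case a plain search will not find it). It also assumes the replacement text is plain ASCII.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module RtfReplaceSketch
    ' Reads an RTF template as text, swaps a placeholder token for the
    ' replacement, and writes the result to a separate output file.
    Sub ReplaceInRtf(templatePath As String, outputPath As String,
                     placeholder As String, replacement As String)
        ' RTF is a text format; ISO-8859-1 maps every byte to one character,
        ' so bytes we do not touch survive the read/write round trip.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim rtf As String = File.ReadAllText(templatePath, latin1)

        rtf = rtf.Replace(placeholder, EscapeRtf(replacement))

        File.WriteAllText(outputPath, rtf, latin1)
    End Sub

    ' Escapes the three characters that are control characters in RTF.
    ' Assumption: the text is ASCII; other characters would need \uN escapes.
    Function EscapeRtf(text As String) As String
        Return text.Replace("\", "\\").Replace("{", "\{").Replace("}", "\}")
    End Function
End Module
```

From a page this might be called as `ReplaceInRtf(Request.MapPath("documentName.rtf"), outputPath, "@TODAYSDATE", Date.Now.ToString("MM/dd/yyyy"))`, where `outputPath` is wherever the generated copy should go. Writing to a separate file keeps the template reusable.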
From personal experience, automating Word isn't as bad as it sounds. It is really not suitable for a high-volume server environment, but for smaller loads it works well, provided you write your code carefully to manage the application object and handle exceptions.
EDITED: Corrected my initial comment about an NDA. That was the case when I worked on this back in 2005/6, and I didn't realize Microsoft had decided to publish the specification in recent years.
Lots of choices:
- Some of them expensive (Aspose)
- Some of them hard (binary formats)
- Some of them require Interop (VSTO) or newer formats (Open XML)
- Some of them not mentioned yet, like:
  - running Word on the server and just writing to that (not recommended by MSFT, but probably your only real choice for (a) cheap and (b) simple)
  - OfficeWriter
If word documents are binary, how would I read and write the file in binary?
They are, and that's why you should use a third party library to program against them.
I would like to stay AWAY from 3rd party plugins & interop
This requirement makes the task extremely hard. If your documents are in the "old Word format" (.doc), I would almost say that you are out of luck. If you can use Word 2007 documents (.docx) instead, you should be able to solve the problem by unzipping the file (it's essentially a ZIP archive), doing a search/replace in the contained XML files, and zipping the document up again.
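A rough sketch of the .docx unzip/replace/rezip idea, using only the framework's own System.IO.Compression classes (available from .NET 4.5; on .NET Framework the project needs references to System.IO.Compression and System.IO.Compression.FileSystem). The module name, method name, and placeholder are hypothetical. It assumes the main body text lives in the word/document.xml part encoded as UTF-8, and that the placeholder is stored as one contiguous string there; Word can split typed text across several runs, and headers and footers live in other parts that this sketch does not touch.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Security
Imports System.Text

Module DocxReplaceSketch
    ' Copies a .docx template and replaces a placeholder in its main document part.
    Sub ReplaceInDocx(templatePath As String, outputPath As String,
                      placeholder As String, replacement As String)
        ' Work on a copy so the template itself is never modified.
        File.Copy(templatePath, outputPath, True)

        Const partName As String = "word/document.xml"

        Using archive As ZipArchive = ZipFile.Open(outputPath, ZipArchiveMode.Update)
            Dim entry As ZipArchiveEntry = archive.GetEntry(partName)
            If entry Is Nothing Then
                Throw New InvalidDataException(partName & " not found; is this a .docx file?")
            End If

            ' Read the whole XML part into memory.
            Dim xml As String
            Using reader As New StreamReader(entry.Open(), Encoding.UTF8)
                xml = reader.ReadToEnd()
            End Using

            ' XML-escape the replacement so characters such as & or < stay valid.
            xml = xml.Replace(placeholder, SecurityElement.Escape(replacement))

            ' Swap the old part for a new one with the same name.
            entry.Delete()
            Dim newEntry As ZipArchiveEntry = archive.CreateEntry(partName)
            Using writer As New StreamWriter(newEntry.Open(), New UTF8Encoding(False))
                writer.Write(xml)
            End Using
        End Using
    End Sub
End Module
```

A call might look like `ReplaceInDocx(Request.MapPath("documentName.docx"), outputPath, "@TODAYSDATE", Date.Now.ToString("MM/dd/yyyy"))`, with `outputPath` chosen by the caller. If the placeholder is not found, it is worth unzipping the file by hand and checking how Word actually stored it in the XML.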
See also: Generating a Word Document with C#
You could perform Word automation on the server to do it easily, but that route is fraught with danger. Automation is not designed to run server-side, and you will find that it regularly hangs when Word pops up a prompt or confirmation box and waits for input that nobody can see.
You have to make a trade-off: either use Word automation and accept that it may hang pretty regularly (anything from daily to weekly), or buy a third-party solution. I use Aspose and it has solved a lot of problems.