XML parser to read XML tags from a Word file (C#)

I have some Word template files (.dot/.dotx) that contain XML tags along with plain text.

At run time, I need to replace the XML tags with their respective mail merge fields.

So I need to parse the document for these XML tags and replace them with merge fields. I was using Regex to find and replace the tags, but it was suggested that I use an XML parser instead (see "Regex for string enclosed in <*>, C#").

Now that I have presented my case better,

could you please advise whether an XML parser is the right tool to achieve the above?

If yes, do I need to save the Word document as an XML file first and then parse it for the XML tags?

Please guide.


You need to use the Word APIs. This is more complicated than you think.

Word 2003 files (.doc, .dot) are stored in a proprietary binary format. Parsing that format from the specification alone is nearly impossible, so it is well worth investing in an SDK for this, or connecting directly to Word through COM to handle the processing.

Word 2007 files (.docx, .dotx) are indeed XML, but a .docx file is actually a zipped hierarchy of folders and files that make up the document in pieces. The OpenXML SDK can handle .docx, and I assume it can also handle the equivalent templates.
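
To make the OpenXML SDK route more concrete, here is a minimal sketch. It assumes the DocumentFormat.OpenXml NuGet package is referenced, and that the "tags" appear as literal text such as <phone> inside a single text run. The file name, the tag name and the replacement value are all made up for illustration. Word often splits what looks like one word across several runs, in which case this simple per-run search will miss it.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class OpenXmlSketch
{
    static void Main(string[] args)
    {
        // Hypothetical file name; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "template.docx";

        // Open the package for editing (second argument = isEditable).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Snapshot the text elements first, then edit them.
            foreach (Text text in body.Descendants<Text>().ToList())
            {
                // "<phone>" and "555-0100" are placeholders for illustration only.
                if (text.Text.Contains("<phone>"))
                {
                    text.Text = text.Text.Replace("<phone>", "555-0100");
                }
            }

            // Write the modified main document part back into the package.
            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

This only swaps text. Inserting a real merge field means building field elements in place of the run, which is more work than shown here.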

An alternative for the 2007 format is to create your template using Word, learn the hierarchy of files inside it, and handle them appropriately. Change the .docx or .dotx extension to .zip, unzip it, and find where your find-and-replace tags are located. You may be able to just replace the tags, rezip the hierarchy and rename the extension back.
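
The unzip/replace/rezip idea can also be done in code without renaming anything, since a .docx is already a zip archive. The following is a rough sketch, assuming the main body lives in the word/document.xml entry (the usual location) and that the tags are literal text inside w:t elements. The file name, the <phone> tag and the replacement value are invented for the example. On .NET Framework you would also need a reference to System.IO.Compression.FileSystem for ZipFile.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;

class ZipSketch
{
    static void Main(string[] args)
    {
        // Hypothetical file name; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "template.docx";

        using (ZipArchive zip = ZipFile.Open(path, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                Console.WriteLine("word/document.xml not found in the archive.");
                return;
            }

            using (Stream stream = entry.Open())
            {
                XmlDocument xml = new XmlDocument();
                xml.PreserveWhitespace = true; // keep Word's whitespace as-is
                xml.Load(stream);

                // The WordprocessingML main namespace, conventionally prefixed "w".
                XmlNamespaceManager nsm = new XmlNamespaceManager(xml.NameTable);
                nsm.AddNamespace("w",
                    "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

                foreach (XmlNode t in xml.SelectNodes("//w:t", nsm))
                {
                    // "<phone>" and "555-0100" are placeholders for illustration only.
                    t.InnerText = t.InnerText.Replace("<phone>", "555-0100");
                }

                // Truncate the entry and write the modified XML back into it.
                stream.SetLength(0);
                xml.Save(stream);
            }
        }
    }
}
```

Work on a copy of the template while experimenting, because the archive is modified in place.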


Why don't you use the Word APIs to do this? I can't imagine any way to do this safely without using the APIs that were designed for the purpose.


Yes, you can use the System.Xml.XmlDocument class to read your XML source. You'll also need to declare all the namespaces required to deal with that XML content.


First of all, I think Regex should be just fine.

But if you really want to use an XML parser, I love XmlDocument/XmlNode in .NET. The two methods SelectSingleNode and SelectNodes are infinitely useful. Unfortunately, I do not have a Word XML example in front of me, so let's assume this XML:

<Document>
  <MergeField name="phone"></MergeField>
  <MergeField name="email"></MergeField>
</Document>

You would then use code as follows:

// requires: using System.Xml;
XmlDocument wordDoc = new XmlDocument();
wordDoc.Load(fileName);

// Select every MergeField element anywhere in the document
XmlNodeList mergeNodes = wordDoc.SelectNodes("//MergeField");

foreach (XmlNode mergeNode in mergeNodes)
{
   string fieldName = mergeNode.Attributes["name"].Value;
   // Do something here based on field name
   // e.g.:

   mergeNode.InnerText = GetFieldValue(fieldName);
}

wordDoc.Save(fileName);

The tricky part is that Word XML uses XML namespaces all over the place, so you need to use the XmlNamespaceManager class in .NET to tell the XML document which namespace is which. It would be more like:

// requires: using System.Xml;
XmlDocument wordDoc = new XmlDocument();
wordDoc.Load(fileName);

// Map a prefix to the namespace URI used in the file (placeholder URI here)
XmlNamespaceManager nsm = new XmlNamespaceManager(wordDoc.NameTable);
nsm.AddNamespace("o", "http://somenamepaceurl.com");
XmlNodeList mergeNodes = wordDoc.SelectNodes("//o:MergeField", nsm);

foreach (XmlNode mergeNode in mergeNodes)
{
   string fieldName = mergeNode.Attributes["name"].Value;
   // Do something here based on field name
   // e.g.:

   mergeNode.InnerText = GetFieldValue(fieldName);
}

wordDoc.Save(fileName);