How to correctly parse an XML document with arbitrary namespaces

Posted 2023-01-21 04:04

I am trying to parse fairly standard XML documents from various sources that use a schema called MARCXML.

Here are the first few lines of an example XML file that needs to be handled...

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <marc:record>
    <marc:leader>00925njm  22002777a 4500</marc:leader>

and one without namespace prefixes...

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>01142cam  2200301 a 4500</leader>

Key point: to get the XPath expressions to resolve later in the program, I have to run a regex routine that adds the namespaces to the XmlNamespaceManager built on the document's NameTable (it isn't populated from the document's declarations automatically). This seems unnecessary to me.

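using System.Text.RegularExpressions;
using System.Xml;

// Matches prefixed declarations such as xmlns:marc="..." in the raw text.
// Note: it does not match a default declaration (xmlns="..."), which has no prefix.
Regex xmlNamespace = new Regex("xmlns:(?<PREFIX>[^=]+)=\"(?<URI>[^\"]+)\"", RegexOptions.Compiled);

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlRecord);
XmlNamespaceManager nsMgr = new XmlNamespaceManager(xmlDoc.NameTable);

// Register every prefix/URI pair the regex found with the namespace manager.
MatchCollection namespaces = xmlNamespace.Matches(xmlRecord);
foreach (Match n in namespaces)
{
    nsMgr.AddNamespace(n.Groups["PREFIX"].Value, n.Groups["URI"].Value);
}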
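One alternative to the regex, offered as a sketch rather than as the poster's code, is to read the declarations from the parsed document: the XPathNavigator for the root element reports the namespaces declared on it through GetNamespacesInScope. This assumes every prefix you need is declared on the root element, as in both examples above; declarations made on deeper elements are not collected. The class and method names are illustrative.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class NamespaceRegistration
{
    // Builds a namespace manager from the xmlns declarations on the root element,
    // reading them from the parsed DOM instead of matching the raw text with a regex.
    public static XmlNamespaceManager FromRootElement(XmlDocument doc)
    {
        var nsMgr = new XmlNamespaceManager(doc.NameTable);
        XPathNavigator nav = doc.DocumentElement.CreateNavigator();

        // Local = only the declarations made on the root element itself.
        IDictionary<string, string> declared = nav.GetNamespacesInScope(XmlNamespaceScope.Local);
        foreach (KeyValuePair<string, string> ns in declared)
        {
            // An empty key is the default namespace (xmlns="...").
            // XPath 1.0 never applies it to unprefixed names, so it is skipped here.
            if (ns.Key.Length > 0)
            {
                nsMgr.AddNamespace(ns.Key, ns.Value);
            }
        }
        return nsMgr;
    }
}
```

With this in place, XmlNamespaceManager nsMgr = NamespaceRegistration.FromRootElement(xmlDoc); replaces the regex loop, and the XPath call stays the same.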

The XPath call looks something like this...

XmlNode leaderNode = xmlDoc.SelectSingleNode(".//" + LeaderNode, nsMgr);

Here LeaderNode is a configurable value: it would be "marc:leader" for the first example and "leader" for the second.
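If the namespace itself is irrelevant to you, a different approach (not from the original post) is to match on the element's local name inside the XPath expression. It needs no namespace manager and works for both example documents, at the cost of also matching a leader element from any other namespace. A minimal sketch; the helper name is hypothetical, and it assumes the configured name is a plain element name, optionally with a prefix.

```csharp
using System.Xml;

public static class LocalNameLookup
{
    // Finds the first element whose local name matches the configured name,
    // ignoring both the prefix and the namespace URI.
    // Accepts "marc:leader" or "leader"; any prefix is stripped first.
    public static XmlNode FindIgnoringNamespace(XmlDocument doc, string configuredName)
    {
        int colon = configuredName.IndexOf(':');
        string localName = colon >= 0 ? configuredName.Substring(colon + 1) : configuredName;

        // Assumes localName is a plain XML name, so it contains no quote characters.
        return doc.SelectSingleNode(".//*[local-name()='" + localName + "']");
    }
}
```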

Is there a better, more efficient way to do this? Note: suggestions for solving this using LINQ are welcome, but I would mainly like to know how to solve this using XmlDocument.
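Since LINQ suggestions are welcome: both example documents put their elements in the same namespace, http://www.loc.gov/MARC21/slim, and differ only in how they refer to it (a marc prefix versus a default namespace). LINQ to XML compares expanded names (namespace URI plus local name), so one query handles both documents without registering any prefix. A minimal sketch; the class and method names are illustrative.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class MarcLinq
{
    // The MARCXML namespace URI used by both example documents.
    static readonly XNamespace Marc = "http://www.loc.gov/MARC21/slim";

    // Returns the text of the first leader element in the MARCXML namespace,
    // whatever prefix (or default namespace) the source uses; null if there is none.
    public static string GetLeader(string xmlRecord)
    {
        XDocument doc = XDocument.Parse(xmlRecord);
        XElement leader = doc.Descendants(Marc + "leader").FirstOrDefault();
        return leader?.Value;
    }
}
```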

EDIT: I took GrayWizardx's advice and now have the following code...

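if (LeaderNode.Contains(":"))
{
    string prefix = LeaderNode.Substring(0, LeaderNode.IndexOf(':'));
    // Use DocumentElement rather than FirstChild: when the document starts with an
    // <?xml ...?> declaration, FirstChild is that declaration, not the root element.
    XmlNode root = xmlDoc.DocumentElement;
    // Returns the URI bound to the prefix, or an empty string if it is not declared.
    string nameSpace = root.GetNamespaceOfPrefix(prefix);
    nsMgr.AddNamespace(prefix, nameSpace);
}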

Now there's no more dependency on Regex!

If you know a given element will be present in the document (for instance, the root element), you could try calling GetNamespaceOfPrefix on it.
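GetNamespaceOfPrefix also accepts an empty string, in which case it returns the default namespace in scope. That matters for the second example: in XPath 1.0 an unprefixed name such as leader only matches elements in no namespace, so .//leader finds nothing there even when a namespace manager is passed. A sketch of one workaround, assuming you are free to choose a prefix: bind the default namespace URI to it and write the query with that prefix. The prefix m and the method name are hypothetical.

```csharp
using System.Xml;

public static class DefaultNamespaceLookup
{
    // Registers the root element's default namespace (xmlns="...") under a
    // caller-chosen prefix so that XPath expressions can refer to it.
    public static XmlNamespaceManager BindDefaultNamespace(XmlDocument doc, string prefix)
    {
        var nsMgr = new XmlNamespaceManager(doc.NameTable);

        // An empty prefix asks for the default namespace in scope on the root element.
        string uri = doc.DocumentElement.GetNamespaceOfPrefix("");
        if (!string.IsNullOrEmpty(uri))
        {
            nsMgr.AddNamespace(prefix, uri);
        }
        return nsMgr;
    }
}
```

For the second example, BindDefaultNamespace(xmlDoc, "m") followed by xmlDoc.SelectSingleNode(".//m:leader", nsMgr) would return the leader element.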
