开发者

How to generate an *exact* copy of an XML document with resolved entities

Given an XML document like this:

 <!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>
 <author>john</author>
 <doc>
   <title>&title;</title>
 </doc>  

I wanted to parse the above XML document and generate a copy of it with all of its entities already resolved. So given the above XMl document, the parser should output:

 <!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>
 <author>john</author>
 <doc>
   <title>Stack Overflow Madness</title>
 </doc>  

I know that you could implement an org.xml.sax.EntityResolver to resolve entities, but what I don't know is how to properly generate a copy of the XML document with everything still intact (except its entities). By everything, I mean the whitespaces, the dtd at the top of the document, the comments, and 开发者_JAVA技巧any other things except the entities that should have been resolved previously. If this is not possible, please suggest a way that at least can preserve most of the things (e.g. all but no comments).

Note also that I am restricted to the pure Java API provided by Sun, so no third party libraries can be used here.

Thanks very much!

EDIT: The above XML document is a much simplified version of its original document. The original one involves a very complex entity resolution using EntityResolver whose significance I have greatly reduced in this question. What I am really interested is how to produce an exact copy of the XML document with an XML parser that uses EntityResolver to resolve the entities.


You almost certainly cannot do this using any XML parser I've heard of, and certainly the Sun XML parsers cannot do it. They will happily discard details that have no significance as far as the meaning of the XML is concerned. For example,

<title>Stack Overflow Madness</title>

and

<title >Stack Overflow Madness</title >

are indistinguishable from the perspective of the XML syntax, and the Sun parsers (rightly) treat them as identical.

I think your choices are to do the replacement treating the XML as text (as @Wololo suggests) or relax your requirements.

By the way, you can probably use an XmlEntityResolver independently of the XML parser. Or create a class that does the same thing. This may mean that String.replace... is not the answer, but you should be able to implement an ad-hoc expander that iterates over the characters in a character buffer, expanding them into a second one.


Is it possible for you to read in the xml template as a string? And with the string do something like

string s = "<title>&title;</title>";
s = s.replace("&title;", "Stack Overflow Madness");
SaveXml(s);
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜