How to get a flat XML file with external entities merged into the top level
I know it is borderline whether this belongs on Stack Overflow or Super User, but since there seem to be quite a few 'editing code' questions here, I am posting it on SO.
I have a pile of XML files that someone, in their infinite wisdom, has decided to explode into multiple files using <!ENTITY> tags, which makes debugging and editing them a huge P-i-t-A. Therefore I am looking for:
1. A way in VIM to open them in a single buffer (preferably so that the changes are saved in correct external entity files), OR;
2. A way to expand the files in VIM so that the external entities are read and replaced in the buffer, OR;
3. An easy bash/sed/python way of doing this on a command line (or in .vimrc)
The files included at the top level might themselves include further files, to who knows how many levels, so this needs to be recursive...
Here's a mock-up sample of what the top-level file looks like:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foobar PUBLIC "foobar:dtd" "foobar.dtd" [
<!ENTITY foo SYSTEM "foo.xml">
<!ENTITY bar SYSTEM "bar.xml">
]>
<foo>
<params>
&foo;
</params>
<bar>
&bar;
</bar>
</foo>
EDIT: The list is in order of preference - if no 1. or 2. solutions are available, the bounty goes for the best #3...
EDIT 2: Looks like @Gaby's answer works, but unfortunately only partially, unless I am doing something wrong - I'll write some sort of tool using his answer and post it here for improvements. Of course, a #1 or #2 solution would be appreciated... :)
EDIT 3: Ok, the best non-Emacs answer will get the bounty ;)
Conclusion: Thanks to @hcayless I now have a working #2 solution. I added
autocmd BufReadPost,FileReadPost *.xml silent %!xmllint --noent - 2> /dev/null
to my .vimrc, and everything is hunky-dory.
If you have libxml2 installed, then xmllint will probably do this for you. Depending on your setup, you might need more params, but for your example,
xmllint --noent foobar.xml
will print your file to stdout with all entities resolved. Should be easy enough to wrap some bash scripting around it to do what you need.
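As a rough sketch of such a wrapper (written in Python rather than bash, to match the other code in this thread), something like the following could call xmllint and save the result. Only the xmllint --noent invocation comes from the command above; the function names build_command and flatten are made up for illustration, and the sketch assumes xmllint is on the PATH.

```python
import subprocess
from pathlib import Path


def build_command(source):
    """Return the xmllint invocation that expands entities in `source`."""
    return ["xmllint", "--noent", str(source)]


def flatten(source, target):
    """Run xmllint on `source` and write the expanded document to `target`.

    Raises subprocess.CalledProcessError if xmllint exits with an error.
    """
    result = subprocess.run(build_command(source), capture_output=True, check=True)
    # xmllint prints the expanded document to stdout; store the bytes
    # unchanged so the encoding declared in the file is preserved.
    Path(target).write_bytes(result.stdout)
```

Calling flatten("foobar.xml", "foobar.flat.xml") would then write the expanded document next to the original (the output name is only an example).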
For option #3, you can take a look at pxdom and its documentation (pxdom 1.5, A Python DOM implementation):
DOMConfiguration parameters
The result of the parse operation depends on the parameters set on the LSParser.domConfig mapping. By default, in accordance with the DOM specification, all CDATA sections will be replaced with plain text nodes and all bound entity references will be replaced by the contents of the entity referred to. This includes external entity references and the external subset.
It also includes a serializer to save the document to a file.
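If installing a DOM library is not an option, the same idea can be roughly approximated with the standard library alone. To be clear, this is not pxdom and not a real XML parser: it is a plain text substitution sketch that assumes entities are declared as <!ENTITY name SYSTEM "file"> with double quotes in the top-level file, that the files are ISO-8859-1 as in the question's sample, and that all paths are relative to the top-level file's directory. The names expand and max_depth are made up for illustration.

```python
import re
from pathlib import Path

# Matches declarations such as: <!ENTITY foo SYSTEM "foo.xml">
ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+SYSTEM\s+"([^"]+)"\s*>')
# Matches references such as: &foo;
REFERENCE = re.compile(r'&([\w.-]+);')
# Matches a leading <?xml ...?> declaration in an included file
XML_DECL = re.compile(r'^\s*<\?xml[^>]*\?>\s*')


def expand(path, max_depth=20):
    """Return the text of `path` with external entity references inlined."""
    path = Path(path)
    text = path.read_text(encoding="iso-8859-1")
    entities = dict(ENTITY_DECL.findall(text))

    def substitute(match):
        name = match.group(1)
        if name not in entities:
            # Leave built-in references such as &amp; untouched.
            return match.group(0)
        child = (path.parent / entities[name]).read_text(encoding="iso-8859-1")
        # An included file may start with its own declaration; drop it.
        return XML_DECL.sub("", child)

    # Included files may contain further references, so repeat the
    # substitution until a pass changes nothing.
    for _ in range(max_depth):
        new_text = REFERENCE.sub(substitute, text)
        if new_text == text:
            return text
        text = new_text
    raise ValueError("entities nested deeper than max_depth (circular reference?)")
```

The entity declarations are left in the DOCTYPE, which is harmless once no references to them remain. A real parser handles many cases this sketch ignores (PUBLIC identifiers, parameter entities, references inside CDATA sections), so it is only a fallback.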
Are you looking for something like this?
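#!/usr/bin/env python3
# Concatenate the given XML files into final.xml, wrapping each file's
# content in an element named after the file. Entity references are not
# resolved; the files are only joined.
import os
import sys

if len(sys.argv) < 2:
    print("some files needed.")
    sys.exit(1)

# The XML declaration must be the very first thing in the output.
final = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<nodes>\n'

for a in sys.argv[1:]:
    # Use the file name without directory or extension as the element name.
    ca = os.path.splitext(os.path.basename(a))[0]
    final += "<" + ca + ">\n"
    with open(a, encoding="iso-8859-1") as infile:
        final += infile.read()
    final += "</" + ca + ">\n"

final += "</nodes>\n"

with open("final.xml", "w", encoding="iso-8859-1") as outfile:
    outfile.write(final)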