Extract some data from a lot of XML files
I have cricket player profiles saved in the form of <playerid>.xml
files in a folder. Each file has these tags in it:
<playerid>547</playerid>
<majorteam>England</majorteam>
<playername>Don</playername>
The playerid is the same as the one in the file name, <playerid>.xml
(the files vary in size, from 1 KB to 5 KB). There are about 500 files. What I need is to extract the playername, majorteam, and playerid from all of these files into a list. I will convert that list to XML later. If you know how I can write it directly to XML, I would be very thankful.
I would prefer a way to do it with C#, Windows batch files, or VBScript, but I can also use Java. I just need to get my data (id and name) in one place.
Why don't you just do cat *.xml > all.xml?
Use xsd.exe to generate a schema and class from your XML file.
Open a Visual Studio 2008 Command Prompt and run
c:\temp> xsd.exe player.xml
This generates an XML Schema based on your XML file.
Next, from the Visual Studio 2008 Command Prompt, run
c:\temp> xsd.exe player.xsd /classes /language:CS
This creates a new class based on your schema.
Now write code to deserialise the XML file using the class you generated; you can place this code in a loop to handle more than one file.
using System.IO;
using System.Xml.Serialization;

// Open the file in a using block so the stream is closed afterwards
using (FileStream fs = new FileStream("Player.XML", FileMode.Open))
{
    // Create an XmlSerializer object to perform the deserialization
    XmlSerializer xs = new XmlSerializer(typeof(Player));
    Player p = xs.Deserialize(fs) as Player;
    if (p != null)
    {
        // process player here
    }
}
If I had to do this task, I'd probably do it in Perl. The previous suggestion to concatenate (cat) all the files isn't really correct, since what you'll end up with will not be a valid XML file, but rather a bunch of valid XML files back to back.
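To make the point about concatenation concrete, here is a small check, written in Python only for illustration. The <player> root element and the helper name is_well_formed are assumptions; the question does not say what the real root element is called.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False


# Two tiny stand-ins for player files (the <player> root is assumed).
one = "<player><playerid>547</playerid></player>"
two = "<player><playerid>548</playerid></player>"

print(is_well_formed(one))        # a single document
print(is_well_formed(one + two))  # what cat produces: two roots back to back
print(is_well_formed("<players>" + one + two + "</players>"))  # one shared root
```

Wrapping the concatenated text in a single root only helps if the individual files carry no XML declaration of their own; a declaration in the middle of the text is also rejected by the parser.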
Perl has a module archive called CPAN, which contains all sorts of things for getting tasks done. If you install an XPath module from it (XML::XPath, for example), it should be pretty easy to search for the nodes you want and output them in a list.
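The same idea can be sketched in Python instead of Perl, using the limited XPath subset that the standard xml.etree.ElementTree module supports. The function name extract_players and the FIELDS tuple are made up for this example, and it assumes the three tags sit somewhere below the root element of each file.

```python
import glob
import os
import xml.etree.ElementTree as ET

# Tag names taken from the question.
FIELDS = ("playerid", "playername", "majorteam")


def extract_players(folder):
    """Return one dict per parseable *.xml file found in folder."""
    players = []
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            print("skipping unparseable file %s" % path)
            continue
        # ".//tag" selects the first matching descendant at any depth;
        # a missing tag becomes an empty string.
        players.append(
            {tag: root.findtext(".//" + tag, default="") for tag in FIELDS}
        )
    return players
```

Calling extract_players(".") in the folder that holds the profiles would give a list of dictionaries, one per player, which can then be printed or written out in whatever format is needed.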
If XPath is too burdensome, you might also want to look into regular expressions, colloquially known as regexes. Perl has amazing regex support.
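For what it is worth, a regex version might look roughly like this in Python (again not Perl). The names TAG_RE and extract_fields are invented for the example, and the pattern assumes flat tags with no attributes, CDATA sections, or nested elements, so it is more fragile than a real parser.

```python
import re

# Matches <playerid>...</playerid> and the other two tags; the backreference
# \1 makes sure the closing tag has the same name as the opening tag.
TAG_RE = re.compile(r"<(playerid|playername|majorteam)>(.*?)</\1>", re.DOTALL)


def extract_fields(text):
    """Return a dict of the three fields found in one file's text."""
    return {name: value.strip() for name, value in TAG_RE.findall(text)}


# Sample input built from the tags in the question (the <player> root is assumed).
sample = """<player>
<playerid>547</playerid>
<majorteam>England</majorteam>
<playername>Don</playername>
</player>"""

print(extract_fields(sample))
```

Values are returned exactly as they appear in the file, so entities such as &amp; are not decoded.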
If I had to use Java, I'd probably use its support for regular expressions. If I wanted to really get nitty-gritty with the XML nodes of the documents, I'd likely use Sun's Streaming API for XML (StAX).
Pick your scripting tongue of choice. Mine's Python.
In that language, this is about what you're looking for:
import glob
import xml.dom.minidom
from xml.parsers.expat import ExpatError

OUTPUT = "all_my_players.xml"

# Start with an empty <players/> document to collect every profile
base_doc = xml.dom.minidom.parseString('<players/>')
doc_element = base_doc.documentElement

for filename in glob.glob("*.xml"):
    if filename == OUTPUT:
        continue  # skip the merged file left over from an earlier run
    with open(filename) as f:
        x = f.read()
    try:
        player = xml.dom.minidom.parseString(x)
    except ExpatError:
        print("ERROR READING FILE %s" % filename)
        continue
    print("Read file %s" % filename)
    # Copy the profile's root element into the combined document
    doc_element.appendChild(base_doc.importNode(player.documentElement, True))

with open(OUTPUT, "w") as f:
    f.write(doc_element.toxml())
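The script above copies each profile whole. If only playerid, playername, and majorteam are wanted in the output, a variation along these lines might do. The helper names first_text and summarize and the <player> wrapper element are assumptions made for this sketch, and it expects each of the three tags to contain plain text.

```python
import glob
import os
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Tag names taken from the question.
FIELDS = ("playerid", "playername", "majorteam")


def first_text(doc, tag):
    """Text of the first <tag> element in doc, or '' if absent or empty."""
    nodes = doc.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data.strip()


def summarize(folder):
    """Build <players> XML holding only the three fields of each file."""
    out = xml.dom.minidom.parseString("<players/>")
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        try:
            doc = xml.dom.minidom.parse(path)
        except ExpatError:
            continue  # leave out files that are not well-formed
        player = out.createElement("player")
        for tag in FIELDS:
            child = out.createElement(tag)
            child.appendChild(out.createTextNode(first_text(doc, tag)))
            player.appendChild(child)
        out.documentElement.appendChild(player)
    return out.documentElement.toxml()
```

The returned string can be written to a file outside the profile folder, so that a later run does not pick up the summary as if it were a player.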