Extract some data from a lot of xml files

2022-12-31 14:36 Q&A author:

I have cricket player profiles saved in the form of <playerid>.xml files in a folder. Each file has these tags in it:

 <playerid>547</playerid>
 <majorteam>England</majorteam>
 <playername>Don</playername>

The playerid matches the <playerid> part of the file name. The files vary in size from 1 KB to 5 KB, and there are about 500 of them. I need to extract the playername, majorteam, and playerid from all of these files into a list. I will convert that list to XML later, but if you know how to produce the XML directly, I would be very thankful.

If there is a way to do it with C#, Windows batch files, or VBScript, that would work; I can also use Java. I just need to get my data (id and name) in one place.

Why don't you just do cat *.xml > all.xml?

Use xsd.exe to generate a schema and class from your XML file.

Open a Visual Studio 2008 Command Prompt.
From the Visual Studio 2008 Command Prompt, run

c:\temp> xsd.exe player.xml

This generates an XML Schema based on your XML file.

Next, from the Visual Studio 2008 Command Prompt, run

c:\temp> xsd.exe player.xsd /classes /language:CS

This creates a new class based on your schema.

Now write code to deserialise the XML file using the class you generated; you can place this code in a loop to handle more than one file.

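// Requires: using System.IO; using System.Xml.Serialization;
// Open the file in a using block so the stream is closed afterwards.
using (FileStream fs = new FileStream("Player.XML", FileMode.Open))
{
    // Create an XmlSerializer object to perform the deserialization
    XmlSerializer xs = new XmlSerializer(typeof(Player));

    Player p = xs.Deserialize(fs) as Player;
    if (p != null)
    {
        // process player here
    }
}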

If I had to do this task, I'd probably do it in Perl. The previous suggestion to concatenate (cat) all the files isn't really correct, since what you'll end up with will not be a valid XML file, but rather a bunch of valid XML files back to back.
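To see why plain concatenation fails, here is a small Python sketch. The two sample profiles and the `<player>` root element are made up for illustration, since the question does not show the files' root tag. Parsing two documents back to back raises a parse error, while the same content wrapped in a single root element parses fine.

```python
import xml.etree.ElementTree as ET

# Two made-up player profiles, standing in for the contents of two files.
first = "<player><playerid>547</playerid></player>"
second = "<player><playerid>548</playerid></player>"


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


concatenated = first + second                          # what cat *.xml produces
wrapped = "<players>" + first + second + "</players>"  # one root element

print(is_well_formed(concatenated))
print(is_well_formed(wrapped))
```

If the real files start with an XML declaration, simple wrapping is not enough either, because a declaration is only allowed at the very start of a document.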

Perl has a module archive called CPAN, which contains modules for all sorts of tasks. If you install an XPath module from it (XML::XPath, for example), it should be pretty easy to search for the nodes you want and output them in a list.
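The same XPath idea can be sketched in Python rather than Perl, using the limited XPath support in the standard library's `xml.etree.ElementTree`. The `<player>` root element and the helper name `extract_fields` are assumptions for illustration; only the three tag names come from the question.

```python
import xml.etree.ElementTree as ET

FIELDS = ("playerid", "majorteam", "playername")


def extract_fields(xml_text):
    """Return a dict with the three fields found below the root element."""
    root = ET.fromstring(xml_text)
    # ".//tag" is an XPath expression: any descendant element named tag.
    # findtext returns None when no such element exists.
    return {tag: root.findtext(".//" + tag) for tag in FIELDS}


# Hypothetical file content; the real root element may be named differently.
sample = """<player>
  <playerid>547</playerid>
  <majorteam>England</majorteam>
  <playername>Don</playername>
</player>"""

record = extract_fields(sample)
print(record)
```

Calling `extract_fields` once per file and appending each result to a list would give the list the question asks for.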

If XPath is too burdensome, you might also want to look into regular expressions, colloquially known as regexes. Perl has amazing regex support.
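A rough sketch of the regex route, written in Python rather than Perl. It assumes each of the three tags appears once per file, carries no attributes, and contains plain text only; the function name `extract_with_regex` is made up for this example. Unlike a real parser, it does not decode entities such as `&amp;`.

```python
import re

# The backreference \1 makes the closing tag match the opening tag.
PATTERN = re.compile(r"<(playerid|majorteam|playername)>(.*?)</\1>", re.DOTALL)


def extract_with_regex(text):
    """Return {tag: value} for every matching tag found in text."""
    return {tag: value.strip() for tag, value in PATTERN.findall(text)}


# Sample lines taken from the question.
sample = """
 <playerid>547</playerid>
 <majorteam>England</majorteam>
 <playername>Don</playername>
"""

fields = extract_with_regex(sample)
print(fields)
```

Because the pattern never parses the document, it also works on file fragments, which is both its convenience and its risk.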

If I had to use Java, I'd probably use its support for regular expressions. If I wanted to really get nitty-gritty with the XML nodes of the documents, I'd likely use Sun's Streaming API for XML (StAX).
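StAX is a Java API, so the following is only an analogue: a streaming parse in Python with `xml.etree.ElementTree.iterparse`, which hands over elements as the parser reaches them instead of requiring the caller to walk a finished tree. The `<player>` root and the function name `stream_fields` are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

WANTED = {"playerid", "majorteam", "playername"}


def stream_fields(source):
    """Collect the wanted fields while the parser walks the document."""
    found = {}
    # iterparse yields (event, element) pairs; "end" fires when a tag closes,
    # at which point the element's text is complete.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in WANTED:
            found[elem.tag] = elem.text
    return found


# A file name or an open binary file works as the source; BytesIO stands in here.
sample = io.BytesIO(
    b"<player><playerid>547</playerid>"
    b"<majorteam>England</majorteam>"
    b"<playername>Don</playername></player>"
)
fields = stream_fields(sample)
print(fields)
```

For files of 1 KB to 5 KB the memory savings of streaming are negligible, so this mainly matters if the profiles grow much larger.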

Pick your scripting tongue of choice. Mine's Python.

In that language, this is about what you're looking for:

import glob
import xml.dom.minidom
from xml.parsers.expat import ExpatError

OUTPUT = "all_my_players.xml"

# Start with an empty <players/> document to collect every profile.
base_doc = xml.dom.minidom.parseString("<players/>")
doc_element = base_doc.documentElement

for filename in sorted(glob.glob("*.xml")):
    if filename == OUTPUT:
        continue  # skip the output of a previous run
    try:
        player = xml.dom.minidom.parse(filename)
    except ExpatError:
        print("ERROR READING FILE %s" % filename)
        continue
    print("Read file %s" % filename)
    # importNode copies the root element into base_doc; appendChild attaches it.
    doc_element.appendChild(base_doc.importNode(player.documentElement, True))

with open(OUTPUT, "w", encoding="utf-8") as f:
    f.write(doc_element.toxml())
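The script above merges whole profiles into one file. If only the three fields from the question are wanted, a variant along these lines might do. It is a sketch under assumptions: the function names, the output file name `players_summary.xml`, and the `<players>`/`<player>` wrapper elements are all made up for illustration.

```python
import glob
import os
import xml.etree.ElementTree as ET

FIELDS = ("playerid", "majorteam", "playername")


def summarize(documents):
    """Build a <players> element from an iterable of XML documents."""
    players = ET.Element("players")
    for doc in documents:
        try:
            root = ET.fromstring(doc)
        except ET.ParseError:
            continue  # skip anything that is not well-formed XML
        player = ET.SubElement(players, "player")
        for tag in FIELDS:
            # findtext returns None when a tag is missing; store "" instead.
            ET.SubElement(player, tag).text = root.findtext(".//" + tag) or ""
    return players


def read_folder(folder, skip="players_summary.xml"):
    """Yield the raw bytes of every .xml file in folder except the output."""
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        if os.path.basename(path) != skip:
            with open(path, "rb") as f:
                yield f.read()


def write_summary(folder, out_name="players_summary.xml"):
    """Write the trimmed profiles of all files in folder to out_name."""
    tree = ET.ElementTree(summarize(read_folder(folder, skip=out_name)))
    tree.write(os.path.join(folder, out_name), encoding="utf-8",
               xml_declaration=True)
```

Running `write_summary(".")` in the folder with the profiles would produce the combined file; unreadable files are skipped silently here, so adding a print in the `except` branch may be worthwhile.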
