How to parse Wikipedia API Content Data
I'm finally successful in pulling data using the Wikipedia API, but there's something I really don't understand, and I can't seem to find the answer.
I'm using this to query data:
var title = "Fort_Capuzzo";
$.getJSON("http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=timestamp|user|comment|content&titles=" + title + "&format=json&callback=?", function(data) {
console.log(data);
});
This returns an object which I can of course drill into to pull what I need. However, nowhere in the documentation does it state which parameters to use to pull specific data from within the content. To be more specific, take this Wikipedia article: http://en.wikipedia.org/wiki/Battle_of_Madagascar
Say I wanted to pull only the date, location, and perhaps the result for that battle from the module on the right side of the page. How would I do this?
Thanks for any help!!
I used Firebug in Firefox to take a look at the returned object.
alert(data.query.pages[204126].revisions[0].user);
The above alerts "Magus732".
From there you can take a look at the returned structure and come up with code to grab the details.
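Since the pages object is keyed by page ID, hard-coding 204126 only works for that one article. Here is a minimal sketch of reading the first page whatever its ID is. The helper names firstPage and latestWikitext are made up for illustration, and the sample object only imitates the shape of the response described above:

```javascript
// Hypothetical helper: return the first page object from a query response,
// regardless of which numeric page ID it is stored under.
function firstPage(data) {
  var pages = (data && data.query && data.query.pages) || {};
  var ids = Object.keys(pages);
  return ids.length > 0 ? pages[ids[0]] : null;
}

// Hypothetical helper: return the raw content of the first revision, or null.
function latestWikitext(data) {
  var page = firstPage(data);
  if (!page || !page.revisions || page.revisions.length === 0) {
    return null;
  }
  return page.revisions[0]["*"];
}

// Sample shaped like the response above; the content string is a placeholder.
var sample = {
  query: {
    pages: {
      "204126": {
        revisions: [{ user: "Magus732", "*": "(raw wikitext here)" }]
      }
    }
  }
};

console.log(firstPage(sample).revisions[0].user);
console.log(latestWikitext(sample));
```

Inside the $.getJSON callback you would call latestWikitext(data) instead of indexing by the page ID.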
Edit: the revision content itself is stored under the "*" key:
alert(data.query.pages[204126].revisions[0]["*"]);
As for parsing that content, you will probably need regular expressions, or some clever CSS applied with jQuery, to format it correctly and hide what you don't need.
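As a rough illustration of the regular-expression route, the sketch below strips two common pieces of wiki markup from a string: [[Target|Label]] style links and the runs of apostrophes used for bold and italics. The function name stripWikiMarkup is hypothetical, and this is nowhere near a full wikitext parser (templates, references, and file links are not handled):

```javascript
// Hypothetical helper: reduce simple wiki markup to plain text.
// Handles [[Target]] and [[Target|Label]] links plus ''italic'' / '''bold''' quotes.
function stripWikiMarkup(text) {
  return text
    // Keep the label if there is one, otherwise keep the link target.
    .replace(/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/g, "$1")
    // Drop runs of two or more apostrophes (bold/italic markers).
    .replace(/'{2,}/g, "");
}

console.log(stripWikiMarkup("'''Example''' text with a [[Some page|link]] and [[Another page]]."));
```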
gjunkie: I can understand your frustration, but from my experience with Wikipedia edits, this is pretty much all you can expect from the wiki right now, because pages are not database entries with fields and values but just documents with some formatting. I hope this changes: translating even the simplest data between languages, such as an airplane's wingspan, could have been done smarter, yet it isn't; you just type the values in by hand.
But returning to your problem: I'd look at the 'edit' page, find the formatting patterns around the data you're interested in (infoboxes, etc.), and start from there.
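To make the pattern idea concrete: on the edit page, infobox fields usually sit on their own lines in the form "| name = value". Here is a minimal sketch that collects such lines into an object. The function name parseInfoboxFields is made up, the field names date, place, and result are an assumption about what this kind of infobox uses, and the sample wikitext is invented purely for illustration. It will not cope with values that span several lines, and it picks up "| name = value" lines from any template, not just the infobox:

```javascript
// Hypothetical helper: collect "| key = value" lines from raw wikitext.
// The first occurrence of a key wins, since the infobox normally comes first.
function parseInfoboxFields(wikitext) {
  var fields = {};
  var lines = wikitext.split("\n");
  for (var i = 0; i < lines.length; i++) {
    // A pipe, a key without "=" or "|", an equals sign, then the value.
    var m = /^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$/.exec(lines[i]);
    if (m && !Object.prototype.hasOwnProperty.call(fields, m[1])) {
      fields[m[1]] = m[2];
    }
  }
  return fields;
}

// Invented sample; real article wikitext will be longer and messier.
var sampleWikitext = [
  "{{Infobox military conflict",
  "| conflict = Example battle",
  "| date = 1 January 1900",
  "| place = [[Example Place]]",
  "| result = Example result",
  "}}",
  "Article text follows."
].join("\n");

var info = parseInfoboxFields(sampleWikitext);
console.log(info.date, info.place, info.result);
```

In practice you would pass in the string found under the "*" key of the revision, then clean up the values, which still contain link brackets and other markup.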