How do I parse a wiki page in Python without downloading its dump?
Is it possible to parse a wiki without taking its dump, since the dump itself is far too much data to handle? Say I have the URL of a certain wiki page; once I fetch it through urllib, how do I parse it and extract a certain type of data using Python?
Here, "type" means data corresponding to a semantic match for the search being performed.
You need an HTML parser to get the useful data from the HTML.
You can use BeautifulSoup to help parse the HTML. I recommend that you read the documentation and have a look at the examples there.
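As a minimal sketch, assuming BeautifulSoup 4 (`pip install beautifulsoup4`) and the standard-library urllib; the URL and the tags pulled out below are illustrative only:

```python
# Minimal sketch: fetch a wiki page with urllib and parse it with
# BeautifulSoup 4. The URL and the tags extracted are illustrative only.
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Hypothetical page; substitute the URL of the wiki you care about.
url = "https://en.wikipedia.org/wiki/Python_(programming_language)"

# Many wikis reject requests that carry no User-Agent header.
req = Request(url, headers={"User-Agent": "wiki-parser-demo/0.1"})
html = urlopen(req).read()

soup = BeautifulSoup(html, "html.parser")

# Example: the article title and the first paragraph of body text.
print(soup.find("h1").get_text())
print(soup.find("p").get_text())

# Example: all internal wiki links, useful for narrowing down a search.
links = [a["href"] for a in soup.find_all("a", href=True)
         if a["href"].startswith("/wiki/")]
print(links[:10])
```

If you have lxml installed you can pass `"lxml"` instead of `"html.parser"` for faster parsing; the rest of the code is unchanged.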
I'd suggest an option such as HarvestMan instead, since a semantic search is likely to span multiple pages, compared to a simpler single-page solution such as BeautifulSoup.
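I can't vouch for HarvestMan's exact API, so purely to illustrate the multi-page idea, here is a rough sketch of a breadth-first crawl over internal wiki links using the same stdlib-plus-BeautifulSoup tools; the keyword match is a stand-in for a real semantic filter, and the `/wiki/` path convention is an assumption borrowed from MediaWiki sites:

```python
# Rough multi-page sketch (NOT HarvestMan's API): breadth-first crawl of
# internal wiki links, keeping pages whose text matches a keyword. A plain
# substring match stands in for a real semantic filter.
from collections import deque
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup  # pip install beautifulsoup4

def crawl_wiki(start_url, keyword, max_pages=10):
    seen = {start_url}          # URLs already queued
    queue = deque([start_url])  # URLs still to visit
    hits = []                   # pages whose text matched the keyword
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        req = Request(url, headers={"User-Agent": "wiki-crawler-demo/0.1"})
        soup = BeautifulSoup(urlopen(req).read(), "html.parser")
        if keyword.lower() in soup.get_text().lower():
            hits.append(url)
        for a in soup.find_all("a", href=True):
            nxt = urljoin(url, a["href"])
            # Follow only article-style /wiki/ links; skip special pages
            # like File:... or Category:... (assumes MediaWiki conventions).
            if "/wiki/" in nxt and ":" not in nxt.rsplit("/", 1)[-1] \
                    and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return hits

# Usage example (hypothetical start page and keyword):
print(crawl_wiki("https://en.wikipedia.org/wiki/Web_crawler", "semantic"))
```

A breadth-first queue with a `seen` set is the usual way to avoid re-fetching pages; a real crawler would also respect robots.txt and rate-limit its requests.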