
How do I parse a wiki page without taking a dump of it in Python?

Is it possible to parse a wiki without taking its dump, as the dump itself is far too much data to handle? Let's say I have the URL of a certain wiki page and I fetch it through urllib; how do I then parse it and extract a certain type of data using Python?

Here, "type" means data corresponding to a semantic match against the search that was performed.


You need an HTML parser to get the useful data from the HTML.

You can use BeautifulSoup to help parse the HTML. I recommend that you read the documentation and have a look at the examples there.
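As an illustration, here is a minimal sketch that fetches a single wiki page with urllib and picks out the title and first paragraph with BeautifulSoup. It assumes Python 3 and BeautifulSoup 4 (`pip install beautifulsoup4`); the example URL and the selectors follow Wikipedia's markup and may need adjusting for other wikis.

```python
import urllib.request

from bs4 import BeautifulSoup

# Example URL only; substitute the wiki page you actually want.
URL = "https://en.wikipedia.org/wiki/Python_(programming_language)"

# Many wikis reject requests without a User-Agent, so set one explicitly.
req = urllib.request.Request(URL, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req) as resp:
    html = resp.read()

soup = BeautifulSoup(html, "html.parser")

# These selectors match Wikipedia's current markup (an assumption);
# other MediaWiki skins may use different ids/classes.
title = soup.find("h1", id="firstHeading")
first_para = soup.select_one("div.mw-parser-output > p")

if title:
    print(title.get_text(strip=True))
if first_para:
    print(first_para.get_text(strip=True))
```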


I'd suggest an option such as HarvestMan instead, since a semantic search is likely to throw up multiple pages, compared with a simpler solution such as BeautifulSoup.
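To show the multi-page idea without committing to HarvestMan's own API (not covered here), below is a hand-rolled breadth-first crawl sketch using only urllib and BeautifulSoup. The start URL, keyword, and page limit are illustrative placeholders.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
import urllib.request

from bs4 import BeautifulSoup

START_URL = "https://en.wikipedia.org/wiki/Web_crawler"  # placeholder
KEYWORD = "semantic"                                      # placeholder
MAX_PAGES = 10                                            # keep the crawl small

seen = set()
queue = deque([START_URL])
hits = []

while queue and len(seen) < MAX_PAGES:
    url = queue.popleft()
    if url in seen:
        continue
    seen.add(url)

    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req) as resp:
            soup = BeautifulSoup(resp.read(), "html.parser")
    except Exception:
        continue  # skip pages that fail to fetch or parse

    # Record pages whose visible text contains the keyword.
    if KEYWORD in soup.get_text().lower():
        hits.append(url)

    # Enqueue same-site links for the next round, dropping #fragments.
    for a in soup.find_all("a", href=True):
        nxt = urljoin(url, a["href"])
        if urlparse(nxt).netloc == urlparse(START_URL).netloc:
            queue.append(nxt.split("#")[0])

print(hits)
```

A dedicated crawler like HarvestMan adds politeness (robots.txt, rate limiting) and persistence that this sketch deliberately omits.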
