List of pages from wikipedia

I'm building an application that lets you select subjects you like; those subjects need to be stored in a DB.

There are millions of possible 'likes' (pizza, PHP, Manchester United, any movie... I don't know), so I decided to populate my DB with these 'likes' from Wikipedia.

Is there a way to get all of those 'likes'? With the API I think there is a limit of 500 results per query. Or is there another solution?

Thank you very much.


Take a look at the MediaWiki technical documentation. There is a section that explains query continuation, which lets you page through result sets larger than the per-request limit.
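As a rough sketch of how continuation works: the API returns a `continue` object with each partial result, and you feed those values back into the next request until the object stops appearing. The response shape below is my assumption about the `list=allpages` module; `fetch` is a placeholder for whatever HTTP wrapper you use (e.g. around `requests.get`).

```python
# Sketch of MediaWiki API query continuation (assumed response shape:
# a 'continue' object is echoed back until the result set is exhausted).
def fetch_all_pages(fetch, params=None):
    """Collect all page titles by following 'continue' tokens.

    `fetch` is any callable that takes a params dict and returns the
    decoded JSON response (in real use, an HTTP GET to the API endpoint).
    """
    params = dict(params or {"action": "query", "list": "allpages",
                             "aplimit": "max", "format": "json"})
    titles = []
    while True:
        data = fetch(params)
        titles.extend(p["title"] for p in data["query"]["allpages"])
        if "continue" not in data:
            break
        params.update(data["continue"])  # carry the continuation token forward
    return titles
```

Each loop iteration merges the server-supplied continuation parameters into the next request, so the loop terminates naturally once the server omits `continue`.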

Alternatively, you could download a Wikipedia dump, install your own copy of MediaWiki, and query it to your heart's content. The dumps are huge, but depending on how much you want to extract, this may finish the task faster, and with less impact on the Wikipedia service.


It's a bit unclear what information you are actually trying to retrieve from Wikipedia. Page titles?

Wikimedia provides XML files containing all page titles for all their projects at download.wikimedia.org. (Sadly the dumps seem to be currently unavailable due to hardware problems). You could parse the XML file and store all the titles in your own database.
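A minimal sketch of that parse-and-store step, assuming the dump follows the usual `<page><title>…</title></page>` XML layout (the table name `likes` is just for illustration). `iterparse` streams the file, so memory stays flat even for multi-gigabyte dumps:

```python
# Sketch: stream-parse a Wikipedia XML dump and store every page title
# in SQLite. Assumes the dump nests titles as <page><title>…</title></page>.
import sqlite3
import xml.etree.ElementTree as ET

def load_titles(dump_path, db_path):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS likes (title TEXT PRIMARY KEY)")
    for _, elem in ET.iterparse(dump_path):
        # Dump files namespace their tags, so match on the local name only.
        if elem.tag.rsplit("}", 1)[-1] == "title":
            conn.execute("INSERT OR IGNORE INTO likes VALUES (?)",
                         (elem.text,))
        elem.clear()  # free each element once it has been processed
    conn.commit()
    conn.close()
```

For a real dump you would point `dump_path` at the decompressed XML (or wrap it in `bz2.open`) and expect the load to take a while.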


Dumps are available from Wikipedia in various formats, with varying levels of detail.

Pick one that best suits your needs and parse it.
