
How might I write a program to extract my data from Google Code?

I'm about to start writing a program that will extract data from a Google Code site so that it can be imported into another project management site. Specifically, I need to extract the full issue detail from the site (description, comments, and so on).

Unfortunately Google don't provide an API for this, nor do they have an export feature, so to me the only option looks to be extracting the data from the actual HTML (yuck). Does anyone have any suggestions on "best practice" for parsing data out of HTML? I'm aware that this is less than ideal, but I don't think I have much choice. Can anyone else think of a better way, or has someone already done this?
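To make the HTML-parsing idea concrete, here is a minimal sketch that uses only Python's standard-library html.parser rather than regular expressions. It collects the text of every element that carries a given CSS class. The class name "issue-comment" and the helper names ClassTextExtractor and extract_text are hypothetical, chosen for illustration; the real Google Code markup would need to be inspected first. The sketch also assumes reasonably well-formed HTML, since an unclosed non-void tag would throw off the depth counter.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element carrying a given CSS class."""

    # Void elements never have a closing tag, so they must not change depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, css_class):
        super().__init__(convert_charrefs=True)
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.chunks = []    # text pieces of the element currently being read
        self.results = []   # one cleaned-up string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1           # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1            # entering a new match
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.results.append(" ".join("".join(self.chunks).split()))

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html, css_class):
    """Return the text of each element whose class list contains css_class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    # "issue-comment" is a made-up class name, not the real page markup.
    sample = '<div class="issue-comment"><p>First <b>comment</b></p></div>'
    print(extract_text(sample, "issue-comment"))
```

Keying on element structure and class names instead of raw string patterns tends to survive small changes in whitespace or attribute order, although any change to the page layout would still require updating the scraper.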

Also, I'm aware of the CSV export feature on the issue page, however this does not give complete data about issues (but could be a useful starting point).
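One way the CSV export could serve as that starting point is as an index of issue IDs, from which the detail page URLs to scrape can be built. The sketch below assumes the export has a header row with a column named "ID", which is a guess and should be checked against a real export. The function name detail_urls is hypothetical. The URL pattern is the issue detail URL of a Google Code project.

```python
import csv
import io

# Issue detail URL pattern for a Google Code project.
DETAIL_URL = "http://code.google.com/p/{project}/issues/detail?id={issue_id}"


def detail_urls(csv_text, project, id_column="ID"):
    """Build one detail-page URL per issue listed in the CSV export.

    id_column is an assumption about the export's header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        issue_id = (row.get(id_column) or "").strip()
        if issue_id.isdigit():  # skip blank or malformed rows
            urls.append(DETAIL_URL.format(project=project, issue_id=issue_id))
    return urls


if __name__ == "__main__":
    sample = "ID,Summary\n1,Crash on start\n2,Typo in docs\n"
    for url in detail_urls(sample, "synergy-plus"):
        print(url)
```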


I just finished a program called google-code-export (hosted on GitHub). It lets you export your Google Code project to an XML file, for example:

>main.py -p synergy-plus -s 1 -c 1
parse: http://code.google.com/p/synergy-plus/issues/detail?id=1
wrote: synergy-plus_google-code-export.xml

... will create a file named synergy-plus_google-code-export.xml.
