feedparser and Google News

2022-12-10 16:03

I'm trying to download a corpus of news from Google News (to do some natural language processing) using the Universal Feed Parser with Python. I know next to nothing about XML; I'm just working from an example of how to use feedparser. The problem is that the dict I get from the RSS feed doesn't seem to contain the content of the news, just the title.

The code I'm currently trying to use is this:

import feedparser

# Just some Google News feed; I'll use a specific search later.
url = 'http://news.google.com.br/news?pz=1&cf=all&ned=us&hl=en&output=rss'

feed = feedparser.parse(url)
for post in feed.entries:
    print(post.title)
    print(post.keys())

The keys I get for each post are just title, summary, date and so on; there's no content.

Is this an issue with Google News, or am I doing something wrong? Is there a way to get the content?

Have you examined the feed from Google News?

The object that feedparser.parse returns is dict-like: it holds a bunch of feed-level information plus the actual entries. Here's a quick and dirty way to see what's available:

import feedparser
d = feedparser.parse('http://news.google.com/news?pz=1&cf=all&ned=ca&hl=en&topic=w&output=rss')

# List the top-level fields of the parsed feed.
print([field for field in d])

From that output we can see an entries field, which most likely contains... news entries! If you run:

import pprint
# Pass the list itself; a bare generator expression would only print the generator object.
pprint.pprint(d['entries'])

We get some more information :) That shows all the fields of each entry, pretty-printed (that's what pprint is for).
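To check which fields actually turn up across all the entries (and confirm whether any of them carries a content key), one option is to count the keys. This is only a rough sketch: field_counts is a made-up helper name, and the sample entries are invented plain dicts standing in for d['entries']. feedparser entries are dict-like, so the same function should work on the real thing.

```python
from collections import Counter

def field_counts(entries):
    """Count how many entries carry each key."""
    counts = Counter()
    for entry in entries:
        counts.update(entry.keys())
    return counts

# Invented sample data standing in for d['entries'].
sample_entries = [
    {'title': 'First story', 'summary': '<b>First</b> snippet', 'link': 'http://example.com/1'},
    {'title': 'Second story', 'summary': 'Second snippet'},
]

counts = field_counts(sample_entries)
for name, count in sorted(counts.items()):
    print(name, count)
```

With a real feed you would call field_counts(d['entries']) instead. A key that never shows up in the counts simply isn't in the feed.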

So, to fetch all the titles of our news entries from this feed:

titles = [entry.title for entry in d['entries']]

So, play around with that. Hopefully it's a helpful start.
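Since the question is about getting text for NLP rather than titles, here is a hedged sketch of pulling the longest-form text each entry offers. The assumptions: entry_text is a hypothetical helper, the sample dicts are invented, and the regex tag stripping is deliberately crude. When a feed does supply full content, feedparser exposes it as a list of dicts with a 'value' key; when it doesn't (as the question reports for this feed), the summary is the fallback.

```python
import re

def entry_text(entry):
    """Return the longest-form text an entry offers, with HTML tags removed."""
    # Prefer full content when the feed supplies it; otherwise fall back
    # to the summary, and finally to the title.
    if entry.get('content'):
        raw = entry['content'][0]['value']
    else:
        raw = entry.get('summary') or entry.get('title', '')
    # Crude tag stripping, fine for a first look; a real HTML parser is more robust.
    text = re.sub(r'<[^>]+>', ' ', raw)
    # Collapse the whitespace left behind by the removed tags.
    return ' '.join(text.split())

# Invented sample entries standing in for items of d['entries'].
with_content = {
    'title': 'A',
    'summary': 'short',
    'content': [{'value': '<p>Full <b>article</b> text</p>'}],
}
summary_only = {
    'title': 'B',
    'summary': '<a href="http://example.com">Snippet</a> only',
}

print(entry_text(with_content))
print(entry_text(summary_only))
```

On a real feed, something like corpus = [entry_text(e) for e in d['entries']] would give one string per news item.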

First, you need to check out the RSS specification. And here is a feed parser. That should get you started.
