Scrapy Newbie Question - can't get tutorial file working

I am a complete newbie to Python and Scrapy so I started by trying to replicate the tutorial. I am trying to scrape the www.dmoz.org website as per the tutorial.

I wrote dmoz_spider.py as shown below:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from dmoz.items import DmozItem

class DmozSpider(BaseSpider):
   name = "dmoz.org"
   allowed_domains = ["dmoz.org"]
   start_urls = [
       "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
       "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
   ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       sites = hxs.select('//ul/li')
       items = []
       for site in sites:
           item = DmozItem()
           item['title'] = site.select('a/text()').extract()
           item['link'] = site.select('a/@href').extract()
           item['desc'] = site.select('text()').extract()
           items.append(item)
       return items

but the output I get is different from what the tutorial website says I should get.

Any idea what I am screwing up?


I had this problem. Make sure you made the change below, as the tutorial tells you to.

Open items.py and check whether you changed the class

class TutorialItem(Item):
    title=Field()
    link=Field()
    desc=Field()

into:

class DmozItem(Item):
    title=Field()
    link=Field()
    desc=Field()
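
For what it's worth, the likely symptom of skipping this rename is an ImportError, because the spider does "from dmoz.items import DmozItem". Below is a minimal, Scrapy-free sketch of that check. The module name demo_items, the dict-based placeholder class, and the helper can_import are hypothetical stand-ins for illustration, not part of Scrapy or the tutorial.

```python
import importlib
import sys
import types

# Hypothetical stand-in for items.py that still has the template's class name.
demo_items = types.ModuleType("demo_items")


class TutorialItem(dict):
    """Placeholder for the un-renamed item class (not a real Scrapy Item)."""


demo_items.TutorialItem = TutorialItem
sys.modules["demo_items"] = demo_items  # make the stand-in importable by name


def can_import(module_name, class_name):
    """Return True if `from module_name import class_name` would find the name."""
    module = importlib.import_module(module_name)
    return hasattr(module, class_name)


print(can_import("demo_items", "TutorialItem"))  # the old name the template created
print(can_import("demo_items", "DmozItem"))      # the name the spider asks for
```

If the second check fails against your real items module, renaming the class as shown above should fix the import.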


There is nothing wrong with the code you pasted, so the problem must be elsewhere. Can you paste the whole output you get? (Your comment stops where the interesting part starts...)


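You need to go to the directory containing the settings.py file and run

scrapy crawl dmoz from there (the argument has to match the spider's name attribute, which is "dmoz.org" in the spider posted above).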

Compare the structure of your project against https://github.com/scrapy/dirbot for clarity.
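
If you are not sure which directory that is, here is a small stdlib-only sketch that walks up from a starting directory looking for settings.py. The helper name find_dir_with is made up for illustration and is not part of Scrapy; it only assumes that the file you are looking for is called settings.py.

```python
from pathlib import Path


def find_dir_with(marker, start="."):
    """Return the nearest directory at or above `start` that contains `marker`.

    Returns None if no such directory exists up to the filesystem root.
    """
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / marker).is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_dir_with("settings.py")
    if found is None:
        print("No settings.py found here or in any parent directory.")
    else:
        print(f"Run the crawl command from: {found}")
```

Run it from wherever you have been typing the crawl command; if it reports a different directory, cd there first.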
