Scrapy Newbie Question - can't get tutorial file working
I am a complete newbie to Python and Scrapy so I started by trying to replicate the tutorial. I am trying to scrape the www.dmoz.org website as per the tutorial.
I wrote dmoz_spider.py as shown below:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from dmoz.items import DmozItem

class DmozSpider(BaseSpider):
    name = "dmoz.org"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []
        for site in sites:
            item = DmozItem()
            item['title'] = site.select('a/text()').extract()
            item['link'] = site.select('a/@href').extract()
            item['desc'] = site.select('text()').extract()
            items.append(item)
        return items
What I get is different from what the website says I am supposed to get.
Any idea what I am screwing up?

I had this problem. Make sure you made the change below, as the tutorial says to do.
Open items.py and check whether you changed the class
class TutorialItem(Item):
    title = Field()
    link = Field()
    desc = Field()
into:
class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()
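One way to confirm the rename took effect, sketched with only the standard library: parse the text of items.py and list the classes it defines. The helper name `defined_classes` and the inline sample source are illustrative assumptions, not part of Scrapy or the tutorial.

```python
import ast


def defined_classes(source):
    """Return the names of all top-level classes defined in Python source text."""
    tree = ast.parse(source)
    return {node.name for node in tree.body if isinstance(node, ast.ClassDef)}


# Sample text standing in for items.py (an assumption based on the answer above).
# ast.parse only reads the text, so Scrapy does not need to be installed here.
ITEMS_SOURCE = '''
from scrapy.item import Item, Field

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()
'''

classes = defined_classes(ITEMS_SOURCE)
print("DmozItem" in classes)
```

To check a real project, pass in the text of your own file, for example `defined_classes(open("items.py").read())`. If the result contains TutorialItem rather than DmozItem, the spider's `from dmoz.items import DmozItem` line cannot succeed.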

There is nothing wrong with the code you pasted, so the problem must be elsewhere. Can you paste the whole output you get? (Your comment stops where the interesting part starts...)

You need to go to the directory containing the settings.py file and run
scrapy crawl dmoz
from there.
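If it is unclear which directory holds settings.py, a short standard-library sketch can search for it. The function name `find_settings_dirs` is hypothetical and the search root is whatever path you pass in; nothing here calls Scrapy itself.

```python
import os


def find_settings_dirs(root):
    """Return every directory under root that contains a settings.py file."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "settings.py" in filenames:
            matches.append(dirpath)
    return sorted(matches)
```

For example, `find_settings_dirs(os.getcwd())` lists candidate directories below the current one; change into the one that belongs to your project before running the crawl command.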
Compare the structure of your project against https://github.com/scrapy/dirbot for clarity.