
How to scrape the same URL in a loop with Scrapy

The content I need is located on the same page, at a static URL.

I created a spider that scrapes this page and stores the items in a CSV file, but it does so only once and then finishes the crawling process. I need to repeat the operation continuously. How can I do this?

Scrapy 0.12

Python 2.5


Well, giving you a specific example is tough because I don't know which spider you're using or how it works internally, but something like this could work:

from scrapy.http import Request
from scrapy.spider import BaseSpider

class YourSpider(BaseSpider):
    # ...spider init details...

    def parse(self, response):
        # ...process the page and build item...
        yield item
        # re-request the same URL so parse() runs again
        yield Request(response.url, callback=self.parse)


You are missing dont_filter=True. Here is an example:

import scrapy

class MySpider(scrapy.Spider):
    start_urls = ('http://www.test.com',)

    def parse(self, response):
        # do your processing here
        # dont_filter=True bypasses the duplicate-request filter,
        # so the same URL is fetched again
        yield scrapy.Request(response.url, callback=self.parse, dont_filter=True)


I do it this way:

def start_requests(self):
    while True:
        yield scrapy.Request(url, callback=self.parse, dont_filter=True)

I have tried the approach below, but there is a problem: when the Internet connection is unstable, a request fails and the loop breaks.

from scrapy.http import Request
from scrapy.spider import BaseSpider

class YourSpider(BaseSpider):
    # ...spider init details...

    def parse(self, response):
        # ...process item...
        yield item
        # if this request fails, no new request is scheduled and the loop stops
        yield Request(response.url, callback=self.parse)
