How can I have something inside my spider that fetches a URL and extracts data from that page via HtmlXPathSelector? The URL is something I want to supply as a string inside the code, not a
I'm new to web crawling. I'm going to build a search engine whose crawler saves Rapidshare links, including the URL of the page where each Rapidshare link was found...
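The crawling itself is out of scope here, but the extraction step can be sketched with the standard library alone: parse one page, keep only links whose host is rapidshare.com or one of its subdomains (an assumption about what counts as a Rapidshare link), and pair each link with the URL of the page it was found on. In a Scrapy callback that page URL would be `response.url`. `RapidshareLinkCollector` and `collect_rapidshare_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class RapidshareLinkCollector(HTMLParser):
    """Collect Rapidshare links from one page, remembering where they were found."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.found = []  # list of (rapidshare_link, page_url) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they appear on.
        absolute = urljoin(self.page_url, href)
        host = urlparse(absolute).hostname or ""
        # Assumption: a Rapidshare link is any link to rapidshare.com or a subdomain.
        if host == "rapidshare.com" or host.endswith(".rapidshare.com"):
            self.found.append((absolute, self.page_url))


def collect_rapidshare_links(page_url, html):
    """Return (link, source_page_url) pairs for every Rapidshare link in html."""
    parser = RapidshareLinkCollector(page_url)
    parser.feed(html)
    parser.close()
    return parser.found
```

Storing the pair rather than the bare link is what lets the search engine later show where each link came from.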
I have an XPath with which I'm trying to match meta tags that have a name attribute whose value contains the word 'keyword', irrespective of case. Basically, I'm trying to match:
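XPath 1.0, which lxml and Scrapy's selectors implement, has no `lower-case()` function, so a common workaround is to fold the attribute to lower case with `translate()` before calling `contains()`. A sketch using lxml directly, assuming lxml is installed (Scrapy depends on it); the same expression string should also work in `response.xpath(...)`. The helper name `keyword_meta_contents` is hypothetical.

```python
from lxml import html

# XPath 1.0 has no lower-case(), so translate() maps A-Z to a-z first.
KEYWORD_META_XPATH = (
    "//meta[contains("
    "translate(@name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),"
    " 'keyword')]"
)


def keyword_meta_contents(page):
    """Return the content of every meta tag whose name contains 'keyword', any case."""
    tree = html.fromstring(page)
    return [meta.get("content") for meta in tree.xpath(KEYWORD_META_XPATH)]
```

Because this uses `contains()`, it also matches names such as `Keywords`; use `=` instead of `contains()` on the translated value for an exact, case-insensitive match.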
I am a complete newbie to Python and Scrapy, so I started by trying to replicate the tutorial. I am trying to scrape the www.dmoz.org website as per the tutorial.
Is it possible to access my Django models inside a Scrapy pipeline, so that I can save my scraped data straight to my model?
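It should be possible, as long as Django is configured before the model is imported: typically `DJANGO_SETTINGS_MODULE` is set and `django.setup()` is called, for example near the top of the Scrapy project's `settings.py`. The pipeline below is a sketch under that assumption. `myapp.models.ScrapedPage` is a hypothetical app and model, and the item's field names are assumed to match the model's fields exactly, since they are passed straight to Django's `objects.create()`.

```python
class DjangoModelPipeline:
    """Save each scraped item through a Django model's default manager."""

    def __init__(self, model=None):
        # Scrapy builds pipelines with no arguments, so model defaults to None
        # and is imported lazily in open_spider; passing it in eases testing.
        self.model = model

    def open_spider(self, spider):
        if self.model is None:
            # Hypothetical app/model; Django must already be set up
            # (DJANGO_SETTINGS_MODULE set and django.setup() called).
            from myapp.models import ScrapedPage
            self.model = ScrapedPage

    def process_item(self, item, spider):
        # Assumes item keys match the model's field names one-to-one.
        self.model.objects.create(**dict(item))
        return item
```

Importing the model inside `open_spider` rather than at module level avoids touching Django before it has been configured.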
I just got Scrapy set up and running and it works great, but I have two (noob) questions. I should say first that I am totally new to Scrapy and spidering sites.
I'm still a newcomer to Python, so I hope this question isn't inane. The more I google for web scraping solutions, the more confused I become (unable to see a forest, despite investigating many tr
I am having some trouble with a Scrapy pipeline. My information is being scraped from sites OK and the process_item method is being called correctly. However, the spider_opened and spider_closed method
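The rest of this question is cut off, so this is only a guess at the cause: Scrapy calls pipeline methods named `open_spider` and `close_spider` automatically, whereas methods named `spider_opened` and `spider_closed` run only if they are explicitly connected to the corresponding signals. A sketch of a pipeline using the automatic hooks; `ExportPipeline` and its in-memory list are illustrative only.

```python
class ExportPipeline:
    """Pipeline whose setup and teardown hooks Scrapy calls on its own."""

    def open_spider(self, spider):
        # Called once when the spider starts: open files, connections, etc.
        self.items = []
        self.closed = False

    def process_item(self, item, spider):
        self.items.append(item)
        # Return the item so any later pipelines still receive it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and release resources.
        self.closed = True
```

Renaming `spider_opened`/`spider_closed` to these hook names is usually simpler than wiring up signals by hand.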
I have a spider that I have written using the Scrapy framework. I am having some trouble getting any pipelines to work. I have the following code in my pipelines.py:
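The pipelines.py code itself is not shown here, but a frequent reason pipelines never run is that they are not enabled: Scrapy only calls pipelines listed in the `ITEM_PIPELINES` setting. A sketch of the settings.py entry; `myproject.pipelines.MyPipeline` is a hypothetical path to replace with the real module and class name.

```python
# settings.py (sketch): a pipeline runs only if it is listed here.
# "myproject.pipelines.MyPipeline" is a hypothetical dotted path.
ITEM_PIPELINES = {
    # Values are integers in the 0-1000 range; lower values run first.
    "myproject.pipelines.MyPipeline": 300,
}
```

If the setting is present and the pipeline still seems inactive, it is also worth checking that `process_item` returns the item, since a pipeline that returns nothing passes `None` to every pipeline after it.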
I construct the following FormRequest according to the content captured by HttpFox (a Firefox add-on). However, the web server always returns "500 Internal Server Error".
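Without the FormRequest code this is only a guess, but a 500 response often means the POST body differs from what the browser sent, for example a missing hidden field. One way to check, sketched with the standard library: encode the fields you pass as `formdata` in the same application/x-www-form-urlencoded format and diff them against the body HttpFox captured. The function names and sample fields are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def form_body(fields):
    """Encode form fields as application/x-www-form-urlencoded."""
    return urlencode(fields)


def diff_form_bodies(sent_body, captured_body):
    """Map each field whose value differs to (sent_value, captured_value).

    keep_blank_values=True keeps empty fields, which are easy to forget.
    A field missing from one side shows up as None on that side.
    """
    sent = dict(parse_qsl(sent_body, keep_blank_values=True))
    captured = dict(parse_qsl(captured_body, keep_blank_values=True))
    keys = sorted(set(sent) | set(captured))
    return {k: (sent.get(k), captured.get(k))
            for k in keys if sent.get(k) != captured.get(k)}
```

If the diff reveals fields you never set, such as hidden inputs or tokens, `FormRequest.from_response()` can pre-fill them from the form on the page instead of copying them by hand.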