
Trying to Use an ItemExporter in Scrapy

I'm trying to use an Item Exporter in my Scrapy project. As an example, my spider scrapes si.com for batting averages. The results are written to the .csv file as one long row, and I'd like them laid out in columns instead. The spider is below; the item exporter I'm using is just the basic one found here. What I really want is for each item's results to be stored in columns next to each other, rather than in one long row with all three sets of results one after another.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import XmlItemExporter

from mlb1.items import MlbItem

class MLBSpider(BaseSpider):
   name = "si.com"
   allowed_domains = ["si.com"]
   start_urls = [
       "http://sportsillustrated.cnn.com/baseball/mlb/stats/2011/batting/ml_0_byBATTING_AVG.html"
       ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       sites = hxs.select('//div[@class="cnnSASD_sport-mlb"]/div[@class="cnnSASD_page-leadersPlayersExpandedStats"]/div[@class="cnnStatsContent"]')
       items = []
       for site in sites:
           item = MlbItem()
           item['name'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol1"]//text()').extract()
           item['team'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol2"]//text()').extract()
           item['batave'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnColHighlight"]//text()').extract()
           items.append(item)
       return items

I'm still very new to Python, so the Scrapy documentation hasn't been much help. When I run the code, I get this error: "ImportError: Error loading object 'mlb1.pipelines.XmlExportPipeline': cannot import name signals". Any help would be greatly appreciated.


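See this example for extracting player names:

from scrapy.contrib.loader import XPathItemLoader

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # Select one <a> element per player so that each item holds a single name
    player_names = hxs.select('//table[@class="cnnSASD_first"]//td[@class="cnnCol1"]/a')
    for p_name in player_names:
        # The loader's XPaths are evaluated relative to this player's element
        l = XPathItemLoader(item=MlbItem(), selector=p_name)
        l.add_xpath('name', 'text()')
        yield l.load_item()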

On the scrapy command line, pass --set FEED_URI=items.csv --set FEED_FORMAT=csv. This will dump the names to items.csv, so there is no need to write your own feed exporter. You can model the XPath for team names along similar lines.
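
To see why one item per player gives columns rather than one long row, here is a rough sketch that uses only the Python standard library as a stand-in for Scrapy. The table markup, player names, teams, and averages are made-up placeholders, and the helper names (cell_text, rows_to_items, items_to_csv) are hypothetical. Each row is queried with paths relative to that row; an XPath starting with // (as in the question's spider) searches the whole document, so every field ends up holding every value on the page.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the stats table; the real page
# markup and values will differ.
SAMPLE_TABLE = """
<table class="cnnSASD_first">
  <tr>
    <td class="cnnCol1"><a>Player A</a></td>
    <td class="cnnCol2">Team X</td>
    <td class="cnnColHighlight">.300</td>
  </tr>
  <tr>
    <td class="cnnCol1"><a>Player B</a></td>
    <td class="cnnCol2">Team Y</td>
    <td class="cnnColHighlight">.290</td>
  </tr>
</table>
"""

FIELDS = ["name", "team", "batave"]
CELL_CLASSES = {"name": "cnnCol1", "team": "cnnCol2", "batave": "cnnColHighlight"}


def cell_text(row, css_class):
    """Return the stripped text of the cell with the given class in this row."""
    cell = row.find("td[@class='%s']" % css_class)
    return "".join(cell.itertext()).strip()


def rows_to_items(table_markup):
    """Build one dict per table row, using paths relative to that row."""
    table = ET.fromstring(table_markup.strip())
    return [
        {field: cell_text(row, CELL_CLASSES[field]) for field in FIELDS}
        for row in table.findall("tr")
    ]


def items_to_csv(items):
    """Write the items as CSV text: one line per item, one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


items = rows_to_items(SAMPLE_TABLE)
csv_text = items_to_csv(items)
print(csv_text)
```

In a spider, the equivalent would be to loop over the table rows and give each row's item its own name, team, and batave values, so the CSV feed writes one line per player.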
