Trying to Use an ItemExporter in Scrapy
I'm trying to implement an Item Exporter in my code. As an example, my spider currently scrapes si.com for batting averages. The results come out as one long row, and I'd like to change how they are stored in the .csv file so that they go into columns instead. The spider is included below; the item exporter I'm using is just the basic one found here. What I really want is for each item's results to be stored in columns next to each other, instead of one long row with all three results one after another.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import XmlItemExporter
from mlb1.items import MlbItem

class MLBSpider(BaseSpider):
    name = "si.com"
    allowed_domains = ["si.com"]
    start_urls = [
        "http://sportsillustrated.cnn.com/baseball/mlb/stats/2011/batting/ml_0_byBATTING_AVG.html"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="cnnSASD_sport-mlb"]/div[@class="cnnSASD_page-leadersPlayersExpandedStats"]/div[@class="cnnStatsContent"]')
        items = []
        for site in sites:
            item = MlbItem()
            item['name'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol1"]//text()').extract()
            item['team'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol2"]//text()').extract()
            item['batave'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnColHighlight"]//text()').extract()
            items.append(item)
        return items
I'm still very new to Python, so the Scrapy documentation hasn't been much help. When I try running the code, I get this error: "ImportError: Error loading object 'mlb1.pipelines.XmlExportPipeline': cannot import name signals". Any help would be greatly appreciated.
See this example for extracting player names:
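# Import path as used by the old Scrapy releases that match the question's code
from scrapy.contrib.loader import XPathItemLoader

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # One selector per player link, so each loop pass yields one item
    player_names = hxs.select('//table[@class="cnnSASD_first"]//td[@class="cnnCol1"]/a')
    for p_name in player_names:
        l = XPathItemLoader(item=MlbItem(), selector=p_name)
        l.add_xpath('name', 'text()')
        yield l.load_item()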
On the scrapy command line, use --set FEED_URI=items.csv --set FEED_FORMAT=csv. This will dump your names to the items.csv file. There is no need to write your own feed exporter. You can model your XPath for team names along similar lines.
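If you keep the question's approach of extracting three whole lists (names, teams, averages), the lists have to be paired up before they can land in side-by-side columns. Below is a minimal sketch of that pairing step using only Python's standard csv module, not Scrapy's exporters. The function names and the sample values are hypothetical, and it assumes the three lists line up position by position.

```python
import csv
import io


def rows_from_columns(names, teams, averages):
    """Pair three parallel lists into one (name, team, average) tuple per player."""
    # zip stops at the shortest list, so a missing cell shortens the output
    return list(zip(names, teams, averages))


def write_csv(rows, out):
    """Write a header line followed by one CSV line per player."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "team", "batave"])
    writer.writerows(rows)


# Made-up sample values standing in for the lists the spider extracts
names = ["Player A", "Player B"]
teams = ["AAA", "BBB"]
averages = [".344", ".338"]

buffer = io.StringIO()
write_csv(rows_from_columns(names, teams, averages), buffer)
print(buffer.getvalue())
```

Yielding one item per player, as in the loader example above, gets the same one-row-per-player layout from the built-in CSV feed without any manual pairing.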