Trying to Use an ItemExporter in Scrapy

2023-03-15 06:19 Q&A author:

I'm trying to implement some sort of Item Exporter in my code. My basic code right now scrapes si.com for batting averages, just as an example. The results are written as one long row, and I'd like to change how they are stored in the .csv file so that they end up in columns instead. Below is the spider; the item exporter I'm using is just the basic one found here. What I really want is for each item's results to be stored in columns next to each other, instead of one long row with all three results one after another.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import XmlItemExporter

from mlb1.items import MlbItem

class MLBSpider(BaseSpider):
   name = "si.com"
   allowed_domains = ["si.com"]
   start_urls = [
       "http://sportsillustrated.cnn.com/baseball/mlb/stats/2011/batting/ml_0_byBATTING_AVG.html"
       ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       sites = hxs.select('//div[@class="cnnSASD_sport-mlb"]/div[@class="cnnSASD_page-leadersPlayersExpandedStats"]/div[@class="cnnStatsContent"]')
       items = []
       for site in sites:
           item = MlbItem()
           item['name'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol1"]//text()').extract()
           item['team'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnCol2"]//text()').extract()
           item['batave'] = site.select('//table[@class="cnnSASD_first"]/*/td[@class="cnnColHighlight"]//text()').extract()
           items.append(item)
       return items

I'm still very new to Python, so the Scrapy documentation isn't much help to me yet. When I try running the code, I get this error: "ImportError: Error loading object 'mlb1.pipelines.XmlExportPipeline': cannot import name signals". Any help would be greatly appreciated.

See this example for extracting player names:

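# Requires: from scrapy.contrib.loader import XPathItemLoader
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # Select one <a> node per player in the first stats table
    player_names = hxs.select('//table[@class="cnnSASD_first"]//td[@class="cnnCol1"]/a')
    for p_name in player_names:
        # Build and yield one item per player instead of one item holding every name
        l = XPathItemLoader(item=MlbItem(), selector=p_name)
        l.add_xpath('name', 'text()')
        yield l.load_item()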

On the scrapy command line, use --set FEED_URI=items.csv --set FEED_FORMAT=csv. This will dump the names to items.csv, so there is no need to write your own feed exporter. You can model the XPath for team names along similar lines.
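
The "one long row" in the question most likely comes from each field holding the complete list of values that extract() returns for the whole table. As a rough illustration of the reshaping that is wanted, here is a minimal sketch using only the standard library and no Scrapy at all. The helper names columns_to_rows and write_csv and the sample values are made up for this example; they are not part of Scrapy or of the original spider.

```python
import csv
import io


def columns_to_rows(names, teams, averages):
    """Pair up three parallel lists so that each player becomes one row."""
    # zip() stops at the shortest list, so mismatched lengths lose trailing values.
    return list(zip(names, teams, averages))


def write_csv(rows, out):
    """Write a header line followed by one CSV line per player."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "team", "batave"])
    writer.writerows(rows)


# Made-up sample values standing in for the three extract() results.
names = ["Player A", "Player B", "Player C"]
teams = ["AAA", "BBB", "CCC"]
averages = [".344", ".338", ".330"]

buffer = io.StringIO()
write_csv(columns_to_rows(names, teams, averages), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Yielding one item per player, as in the loader example above, has the same effect inside Scrapy: the CSV feed writes one line per item, so name, team and batting average end up as adjacent columns.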
