开发者

Python Scrapy: How to get CSVItemExporter to write columns in a specific order

In Scrapy, I have my items specified in a certain order in items.py, & my spider has those items again in the same order. However, when I run the spider & save the results as a csv, the开发者_如何转开发 column order from the items.py or the spider is not maintained. How can I get the CSV to show columns in a specific order. Example code would be very appreciated.

Thanks.


This is related to Modifiying CSV export in scrapy

The problem is that the exporter is instantiated without any keyword parameters, so the keywords like EXPORT_FIELDS are ignored. The solution is the same: you need to subclass the CSV item exporter to pass the keyword parameters.

Following the above recipe, I created a new file xyzzy/feedexport.py (change "xyzzy" to whatever your scrapy class is named):

"""
The standard CSVItemExporter class does not pass the kwargs through to the
CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored
(EXPORT_EMPTY is not used by CSV).
"""

from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter

class CSVkwItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')

        super(CSVkwItemExporter, self).__init__(*args, **kwargs)

and then added it into xyzzy/settings.py:

FEED_EXPORTERS = {
    'csv': 'xyzzy.feedexport.CSVkwItemExporter'
}

Now the CSV exporter will honor the EXPORT_FIELD setting - also add to xyzzy/settings.py:

# By specifying the fields to export, the CSV export honors the order
# rather than using a random order.
EXPORT_FIELDS = [
    'field1',
    'field2',
    'field3',
]


I wouldn't know about the time you asked your question but Scrapy now provides a fields_to_export attribute to the BaseItemExporter class, from which CsvItemExporter inherits. As per version 0.22:

fields_to_export

A list with the name of the fields that will be exported, or None if you want to export all fields. Defaults to None.

Some exporters (like CsvItemExporter) respect the order of the fields defined in this attribute.

See also the documentation for BaseItemExporter and CsvItemExporter on the Scrapy website.

In order to use this feature, though, you will have to create your own ItemPipeline, as detailed in this answer

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜