How to create a dictionary from a line of text?

2023-01-28 17:23 Q&A author:

I have a generated file with thousands of lines like the following:

CODE,XXX,DATE,20101201,TIME,070400,CONDITION_CODES,LTXT,PRICE,999.0000,QUANTITY,100,TSN,1510000001

Some lines have more fields and others have fewer, but all follow the same pattern of key-value pairs and each line has a TSN field.

When doing some analysis on the file, I wrote a loop like the following to read the file into a dictionary:

#!/usr/bin/env python

from sys import argv

records = {}
for line in open(argv[1]):
    fields = line.strip().split(',')
    record = dict(zip(fields[::2], fields[1::2]))
    records[record['TSN']] = record

print 'Found %d records in the file.' % len(records)

...which is fine and does exactly what I want it to (the print is just a trivial example).

However, it doesn't feel particularly "pythonic" to me, especially this line:

dict(zip(fields[::2], fields[1::2]))

which just feels "clunky" (how many times does it iterate over the fields?).
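To make the question about passes concrete, here is a small Python 3 sketch that prints each intermediate value. It uses a shortened copy of the sample line (an assumption for brevity). The two extended slices each build a new list, then `zip()` walks those two lists together and `dict()` consumes the resulting pairs.

```python
# Shortened version of the sample line from the question (assumed input).
line = "CODE,XXX,DATE,20101201,TIME,070400,TSN,1510000001\n"

fields = line.strip().split(',')
keys = fields[::2]     # new list: every other item, starting at index 0
values = fields[1::2]  # new list: every other item, starting at index 1
record = dict(zip(keys, values))  # zip() pairs the lists; dict() consumes the pairs

print(keys)    # ['CODE', 'DATE', 'TIME', 'TSN']
print(values)  # ['XXX', '20101201', '070400', '1510000001']
print(record['TSN'])  # 1510000001
```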

Is there a better way of doing this in Python 2.6 with just the standard modules to hand?

In Python 2 you could use izip() from the itertools module, together with the fact that an iterator is consumed as it is read, to write your own function that simplifies creating the pairs of values for the dict records. I got the idea for pairwise() from a similarly named (although functionally different) recipe in the Python 2 itertools docs.

In Python 3 you can simply use the built-in zip(), since it now does what izip() did in Python 2, which is why izip() was removed from itertools. The example below accounts for this and should work in both versions.

try:
    from itertools import izip
except ImportError:  # Python 3
    izip = zip

def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4, s5), ..."
    a = iter(iterable)
    return izip(a, a)
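The trick is that `izip(a, a)` receives the same iterator twice, so each tuple it builds takes two consecutive items from that one iterator. A minimal Python 3 sketch (where `zip()` plays the role of `izip()`), using a shortened sample line as an assumed input:

```python
def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4,s5), ..."
    a = iter(iterable)
    # zip() pulls one item from `a` for the first slot, then the next item
    # from the same `a` for the second slot, so items are paired off in order.
    return zip(a, a)

fields = "CODE,XXX,DATE,20101201,TSN,1510000001".split(',')
pairs = list(pairwise(fields))
print(pairs)  # [('CODE', 'XXX'), ('DATE', '20101201'), ('TSN', '1510000001')]

# With an odd number of items, zip() stops early and the
# unpaired last item is silently dropped.
print(list(pairwise(['a', '1', 'b'])))  # [('a', '1')]
```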

It can be used like this in your file-reading for loop:

from sys import argv

records = {}
with open(argv[1]) as f:  # closes the file when the loop finishes
    for line in f:
        fields = (field.strip() for field in line.split(','))  # generator expr
        record = dict(pairwise(fields))
        records[record['TSN']] = record

print('Found %d records in the file.' % len(records))
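To try the loop without a real file, here is a self-contained Python 3 sketch that substitutes an in-memory `io.StringIO` for `open(argv[1])`. The two sample lines (one with fewer fields) are made up for illustration.

```python
import io

def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4,s5), ..."
    a = iter(iterable)
    return zip(a, a)  # Python 3 zip() is lazy, like Python 2's izip()

# Hypothetical sample data standing in for the generated file.
sample = io.StringIO(
    "CODE,XXX,DATE,20101201,TIME,070400,PRICE,999.0000,TSN,1510000001\n"
    "CODE,YYY,DATE,20101201,TSN,1510000002\n"
)

records = {}
for line in sample:
    # strip() on each field also removes the trailing newline from the last one
    fields = (field.strip() for field in line.split(','))
    record = dict(pairwise(fields))
    records[record['TSN']] = record

print('Found %d records in the file.' % len(records))  # Found 2 records in the file.
```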

But wait, there's more!

It's possible to create a generalized version, which I'll call grouper(), that again corresponds to a similarly named itertools recipe (listed right below pairwise() in the docs):

def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)
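`[iter(iterable)]*n` is a list holding n references to one iterator, so unpacking it into `izip()` is equivalent to `izip(a, a, ..., a)`. A Python 3 sketch (again with `zip()` in place of `izip()`) showing what that means for n = 3, including the incomplete final group being dropped:

```python
def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ..."
    # The list contains n references to the *same* iterator object.
    return zip(*[iter(iterable)] * n)

print(list(grouper(3, 'ABCDEFG')))  # [('A', 'B', 'C'), ('D', 'E', 'F')]
print(dict(grouper(2, ['PRICE', '999.0000', 'QUANTITY', '100'])))
# {'PRICE': '999.0000', 'QUANTITY': '100'}
```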

It could be used like this in your for loop:

    record = dict(grouper(2, fields))

Of course, for specific cases like this, it's easy to use functools.partial() to create a similar pairwise() function from grouper() (this works in both Python 2 and 3):

import functools
pairwise = functools.partial(grouper, 2)
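`functools.partial(grouper, 2)` returns a callable that supplies `2` as grouper()'s first positional argument, so `pairwise(fields)` becomes `grouper(2, fields)`. A short Python 3 check of that equivalence:

```python
import functools

def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), ..."
    return zip(*[iter(iterable)] * n)

# Freeze the first positional argument (n) at 2.
pairwise = functools.partial(grouper, 2)

fields = ['TSN', '1510000001', 'QUANTITY', '100']
print(dict(pairwise(fields)))  # {'TSN': '1510000001', 'QUANTITY': '100'}
print(pairwise.func is grouper, pairwise.args)  # True (2,)
```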

Postscript

Unless there's a really huge number of fields, you could instead create an actual sequence out of the line items (rather than a generator expression, which has no len()):

fields = tuple(field.strip() for field in line.split(','))

The advantage is that the grouping can then be done with simple slicing:

import functools

try:
    xrange
except NameError:  # Python 3
    xrange = range

def grouper(n, sequence):
    # Yield successive n-item slices; the last slice may be shorter.
    for i in xrange(0, len(sequence), n):
        yield sequence[i:i+n]

pairwise = functools.partial(grouper, 2)
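Because a tuple supports len() and slicing, each group here is just `sequence[i:i+n]`. One behavioural difference from the izip()-based version is worth knowing: with an odd number of fields the last slice has length 1, so dict() raises ValueError instead of silently dropping the item. A Python 3 sketch (using range directly), with a made-up sample line:

```python
import functools

def grouper(n, sequence):
    # Step through the sequence n items at a time; each slice is one group.
    for i in range(0, len(sequence), n):
        yield sequence[i:i+n]

pairwise = functools.partial(grouper, 2)

fields = tuple(field.strip() for field in "CODE,XXX,TSN,1510000001\n".split(','))
print(dict(pairwise(fields)))  # {'CODE': 'XXX', 'TSN': '1510000001'}

# Odd number of fields: the final 1-item slice cannot become a dict entry.
try:
    dict(pairwise(('CODE', 'XXX', 'TSN')))
except ValueError as exc:
    print('ValueError:', exc)
```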

Not so much better as just more efficient...


import itertools

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

record = dict(grouper(2, line.strip().split(",")))  # inside the loop over the file's lines
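This recipe treats an odd number of fields differently from the izip()-based ones: izip_longest() pads the last group with fillvalue, so a trailing key with no value maps to None instead of being dropped. In Python 3 the function is named itertools.zip_longest(); a minimal sketch using it, with a made-up line that has a dangling key:

```python
import itertools

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    # zip_longest() keeps going until the iterator is exhausted,
    # padding the final group with fillvalue.
    return itertools.zip_longest(*args, fillvalue=fillvalue)

print([''.join(g) for g in grouper(3, 'ABCDEFG', 'x')])  # ['ABC', 'DEF', 'Gxx']

line = "CODE,XXX,TSN,1510000001,COMMENT\n"  # hypothetical line with a dangling key
record = dict(grouper(2, line.strip().split(",")))
print(record)  # {'CODE': 'XXX', 'TSN': '1510000001', 'COMMENT': None}
```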

(Source: the grouper() recipe in the Python 2.6 itertools documentation.)

If we're going to abstract it into a function anyway, it's not too hard to write "from scratch":

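def pairs(iterable):
    iterator = iter(iterable)
    while True:
        try:
            # next() works in Python 2.6+ and 3; iterator.next() is Python 2 only
            pair = (next(iterator), next(iterator))
        except StopIteration:
            return  # input exhausted; an unpaired trailing item is dropped
        yield pair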

robert's recipe version definitely wins points for flexibility, though.
