Is there a more "Pythonic" way to combine CSV elements?

2023-03-16 10:14

Basically, I am using a Python cron job to read data from the web and store it in a CSV-style list in the form of:

.....
###1309482902.37
entry1,36,257.21,16.15,16.168
entry2,4,103.97,16.36,16.499
entry3,2,114.83,16.1,16.3
entry4,130.69,15.6737,16.7498
entry5,5.20,14.4,17
$$$
###1309482902.37
entry1,36,257.21,16.15,16.168
entry2,4,103.97,16.36,16.499
entry3,2,114.83,16.1,16.3
entry4,130.69,15.6737,16.7498
entry5,5.20,14.4,17
$$$

.....

My code basically does a regex search and iterates through all the matches between ### and $$$, then goes through each match line by line, splitting each line on commas. As you can see, some entries have 4 fields and some have 5. That is because I was dumb and didn't realize the web source puts thousands-separator commas in its larger numbers, i.e.

entry1,36,257.21,16.15,16.168

is really supposed to be

entry1,36257.21,16.15,16.168

I have already collected a lot of data and do not want to rewrite it, so I thought of a cumbersome workaround. Is there a more Pythonic way to do this?

===

contents = ifp.read()

#Pull all entries from the market data
for entry in re.finditer("###(.*\n)*?\$\$\$",contents):

    dataSet = contents[entry.start():entry.end()]
    dataSet = dataSet.split('\n');

    timeStamp = dataSet[0][3:]
    print timeStamp

    for i in xrange(1,8):
        splits = dataSet[i].split(',')
        if(len(splits) == 5):
            remove = splits[1]
            splits[2] = splits[1] + splits[2]
            splits.remove(splits[1])
        print splits
        ## DO SOME USEFUL WORK WITH THE DATA ##

===

I'd use Python's csv module to read in the CSV file, fix the broken rows as I encountered them, then use csv.writer to write the CSV back out. Like so (assuming your original file, with commas in the wrong place, is ugly.csv, and the new, cleaned up output file will be pretty.csv):

import csv

# Binary modes are what Python 2's csv module expects; on Python 3,
# open both files in text mode with newline="" instead.
with open("ugly.csv", "rb") as src, open("pretty.csv", "wb") as dst:
    outputCsv = csv.writer(dst)
    for row in csv.reader(src):
        if len(row) >= 5:
            # csv fields are strings, so this is concatenation, not addition
            row[1] = row[1] + row[2]
            del row[2]
        outputCsv.writerow(row)

Clean and simple. Since you're using a proper CSV parser and writer, you shouldn't have to worry about introducing new corner cases. Had you used csv.writer in your first script, when parsing the web results, the commas in your input data would have been quoted rather than mistaken for field separators.
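To illustrate the quoting point, here is a minimal Python 3 sketch (the row values are taken from the question; the variable names are only for illustration). With its default settings, csv.writer wraps any field that contains a comma in double quotes, and csv.reader undoes that when reading the line back:

```python
import csv
import io

# Illustrative row: the second field contains a thousands separator.
row = ["entry1", "36,257.21", "16.15", "16.168"]

# Write the row to an in-memory buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
line = buffer.getvalue()
print(line, end="")  # entry1,"36,257.21",16.15,16.168

# Reading the line back recovers the original four fields.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == row)  # True
```

This only helps for data written from now on; the rows already collected still need one of the repairs discussed in this thread.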

Normally the csv module is used to handle CSV files of all formats.

However, here you have this ugly situation with the commas, so an ugly hack is appropriate. I don't see a clean solution to this, so I think it's OK to go with whatever works.

Incidentally, this line seems to be redundant:

remove = splits[1]

Others have suggested that you use csv to parse the file, and that's good advice. But it does not directly address the other issue -- namely, that you're dealing with a file that consists of sections of data. By slurping the file into a single string and then using a regex to parse that big string, you are throwing away a key point of leverage: the file is already laid out line by line, with explicit start and end markers for each section. A different strategy is to write a generator function that reads the file and yields one section at a time.

import sys

def read_next_section(f):
    for line in f:
        line = line.strip()
        if line.startswith('#'):
            # Start of a new section.
            ts = line[3:]
            data = []
        elif line.startswith('$'):
            # End of a section.
            yield ts, data
        else:
            # Probably a good idea to use csv, as others recommend.
            # Also, write a method to deal with extra-comma problem.
            fields = line.split(',')
            data.append(fields)

with open(sys.argv[1]) as input_file:
    for time_stamp, section in read_next_section(input_file):
        # Do stuff.
        pass
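For what it's worth, here is a rough Python 3 sketch of how the two suggestions in those comments could be combined. The names fix_row and read_sections are hypothetical, and the sketch assumes, as in the question, that a clean row has 4 fields and a broken one has 5, with the split number in the second and third positions:

```python
import csv
import io


def fix_row(fields):
    """Re-join a number that was split at its thousands separator.

    Assumption: 4 fields is a clean row, 5 fields is a broken one.
    """
    if len(fields) == 5:
        return [fields[0], fields[1] + fields[2]] + fields[3:]
    return fields


def read_sections(lines):
    """Yield (timestamp, rows) for each ### ... $$$ section."""
    timestamp, raw = None, []
    for line in lines:
        line = line.strip()
        if line.startswith("###"):
            # Start of a new section.
            timestamp, raw = line[3:], []
        elif line.startswith("$$$"):
            # End of a section: parse its lines with csv, then repair them.
            if timestamp is not None:
                yield timestamp, [fix_row(r) for r in csv.reader(raw)]
            timestamp = None
        elif line and timestamp is not None:
            # Data line inside a section; anything outside one is ignored.
            raw.append(line)


sample = io.StringIO(
    "###1309482902.37\n"
    "entry1,36,257.21,16.15,16.168\n"
    "entry4,130.69,15.6737,16.7498\n"
    "$$$\n"
)
sections = list(read_sections(sample))
for time_stamp, rows in sections:
    print(time_stamp, rows)
```

Because read_sections accepts any iterable of lines, the same function should work on an open file object as well as on the in-memory sample used here.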

A more pythonic way to write this block of code

for i in xrange(1,8):
    splits = dataSet[i].split(',')
    if(len(splits) == 5):
        remove = splits[1]
        splits[2] = splits[1] + splits[2]
        splits.remove(splits[1])
    print splits

would be

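for row in dataSet[1:-1]:  # skip the ### header and $$$ footer lines
    name, data = row.split(',', 1)
    fields = data.rsplit(',', 2)
    fields[0] = fields[0].replace(',', '')  # drop the thousands separator
    print [name] + fields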
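As a self-contained Python 3 sketch of the same split-from-both-ends idea (split_entry is a hypothetical helper name, and it assumes only the second field can ever contain a stray comma): peel the name off the left, peel the last two fields off the right, and whatever remains in the middle is the number, with or without its separator.

```python
def split_entry(line):
    """Split 'name,value,a,b' where value may contain a stray comma."""
    name, rest = line.split(",", 1)    # peel the name off the left
    value, a, b = rest.rsplit(",", 2)  # peel two fields off the right
    # Whatever is left in the middle is the number; drop its separator.
    return [name, value.replace(",", ""), a, b]


print(split_entry("entry1,36,257.21,16.15,16.168"))
# ['entry1', '36257.21', '16.15', '16.168']
print(split_entry("entry5,5.20,14.4,17"))
# ['entry5', '5.20', '14.4', '17']
```

Unlike counting fields, this needs no special case for the broken rows, but it would raise a ValueError on a line with fewer than four fields, such as the ### and $$$ marker lines.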
