Clean up a nested list

I have a huge mess of a nested list that looks something like this, just longer:

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]

Ultimately I want something that looks like this:

neat_fruit = [['watermelon',0,1.0], ['apple',0,1.0], ['pineapple',0,1.0], ['strawberry, banana',0,1.0], ['peach plum pear',0,1.0], ['orange, grape',0,1.0]]

but I'm not sure how to deal with the double quotes in the quotes and how to split the fruits from the numbers, especially with the commas separating some of the fruits. I've tried a bunch of things, but everything just seems to make it even more of a mess. Any suggestions would be greatly appreciated.


Use the csv module (in the standard library) to handle the double-quoted fruits with commas in their names:

import csv
import io

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]

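# flatten the list of lists into a single string, one record per line:
data = '\n'.join(item[0].strip() for item in fruit_mess)
# csv.reader needs text, so wrap the string in io.StringIO
# (io.BytesIO only worked on Python 2, where str was a byte string)
reader = csv.reader(io.StringIO(data))
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in reader]

print(neat_fruit)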
# [['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
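
As a possible simplification of the approach above: csv.reader accepts any iterable that yields one string per record, so joining everything into a single string may not be needed. A minimal sketch, assuming Python 3 and that every inner list holds exactly one string:

```python
import csv

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'],
              ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'],
              ['"orange, grape",0,1.0\n']]

# Feed the reader one line at a time; it handles the trailing newline
# and the double-quoted fields itself.
reader = csv.reader(item[0] for item in fruit_mess)
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in reader]

print(neat_fruit)
```

This avoids the intermediate string and the io import, at the cost of relying on each element already being a complete line.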


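Another simple solution: split each line from the right, so that commas inside the fruit name are left alone, then strip the surrounding double quotes. Note that this rewrites fruit_mess in place:

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
for i, x in enumerate(fruit_mess):
    # the last two commas separate the numbers; anything before them is the name
    data = x[0].rstrip('\n').rsplit(',', 2)
    # strip('"') removes the double quotes around quoted names
    fruit_mess[i] = [data[0].strip('"'), int(data[1]), float(data[2])]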


A regex-based solution:

>>> import re
>>> regex = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')
>>> neat_fruit = []
>>> for item in fruit_mess:
...     match = regex.match(item[0])
...     result = [match.group(1).strip('"'), int(match.group(2)), float(match.group(3))]
...     neat_fruit.append(result)
...
>>> neat_fruit
[['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
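
One caveat with the regex approach: regex.match returns None for a line that does not fit the pattern, and the loop then fails with an AttributeError. A hedged sketch of a more defensive variant follows; parse_line is a hypothetical helper name, not part of the original answer, and raising ValueError is just one possible way to handle bad input:

```python
import re

# Same pattern as above: a quoted or unquoted name, an integer, a float.
LINE_RE = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')

def parse_line(line):
    """Hypothetical helper: parse one 'name,int,float' line into a list."""
    match = LINE_RE.match(line)
    if match is None:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError(f"unrecognized line: {line!r}")
    name, num1, num2 = match.groups()
    return [name.strip('"'), int(num1), float(num2)]

fruit_mess = [['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n']]
neat_fruit = [parse_line(item[0]) for item in fruit_mess]
print(neat_fruit)
```

Whether to raise, skip, or log a malformed line depends on how clean the real data is.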