Clean up a nested list
I have a huge mess of a nested list that looks something like this, just longer:
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
Ultimately I want something that looks like this:
neat_fruit = [['watermelon',0,1.0], ['apple',0,1.0], ['pineapple',0,1.0], ['strawberry, banana',0,1.0], ['peach plum pear',0,1.0], ['orange, grape',0,1.0]]
but I'm not sure how to deal with the double quotes inside the strings, or how to split the fruits from the numbers when some of the fruit names contain commas themselves. I've tried a bunch of things, but everything just seems to make it even more of a mess. Any suggestions would be greatly appreciated.
Use the csv module (in the standard library) to handle the double-quoted fruits with commas in their names:
import csv
import io
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
# flatten the list of lists into a string:
data='\n'.join(item[0].strip() for item in fruit_mess)
# csv.reader needs text, not bytes, on Python 3, so wrap the string in StringIO
reader=csv.reader(io.StringIO(data))
neat_fruit=[[fruit,int(num1),float(num2)] for fruit,num1,num2 in reader]
print(neat_fruit)
# [['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
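Since csv.reader accepts any iterable that yields strings, the join and the io wrapper can probably be skipped altogether. A minimal sketch, assuming every inner list holds exactly one string with three fields (the name rows is only for illustration):

```python
import csv

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'],
              ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'],
              ['"orange, grape",0,1.0\n']]

# csv.reader takes any iterable of strings, so feed it the inner strings directly
rows = csv.reader(item[0] for item in fruit_mess)

# each parsed row is [name, int-as-text, float-as-text]; convert the numbers
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in rows]
print(neat_fruit)
```

This should give the same list as above while never building the intermediate string.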
One more simple solution:
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
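for i,x in enumerate(fruit_mess):
    # the two numbers always come last, so split from the right at most twice
    data = x[0].rstrip('\n').rsplit(',', 2)
    # strip the surrounding double quotes from the fruit name
    fruit_mess[i] = [data[0].strip('"'), int(data[1]), float(data[2])]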
A regex-based solution:
>>> import re
>>> regex = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')
>>> neat_fruit = []
>>> for item in fruit_mess:
...     match = regex.match(item[0])
...     result = [match.group(1).strip('"'), int(match.group(2)), float(match.group(3))]
...     neat_fruit.append(result)
...
>>> neat_fruit
[['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
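One caveat with the regex approach: regex.match returns None when a line doesn't fit the pattern, so the loop above would fail with an AttributeError on malformed input. A sketch with an explicit check, assuming bad lines should be reported rather than silently skipped (the helper name parse_line and the constant LINE_RE are made up for this example):

```python
import re

# a quoted name (which may contain commas) or an unquoted name,
# followed by an integer field and a float field
LINE_RE = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')

def parse_line(line):
    """Parse one 'name,int,float' string into [str, int, float]."""
    match = LINE_RE.match(line)
    if match is None:
        # assumption: a line that doesn't fit the pattern is an error
        raise ValueError('unrecognised line: %r' % line)
    name, count, value = match.groups()
    return [name.strip('"'), int(count), float(value)]

fruit_mess = [['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n']]
neat_fruit = [parse_line(item[0]) for item in fruit_mess]
print(neat_fruit)
# [['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0]]
```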