What is the best way in Python to get a denormalized array from this ordered array?
I have this array:
>>> print raw_data
['LEVEL 1',
'SUBJECT A',
'GROUP X',
'COMMENT i',
'COMMENT ii',
'COMMENT iii',
'GROUP Y',
'COMMENT iv',
'COMMENT v',
'COMMENT vi',
'LEVEL 2',
'SUBJECT B',
'GROUP Z',
'COMMENT vii',
'COMMENT viii',
'COMMENT ix',
'SUBJECT C',
'GROUP X2',
'COMMENT x',
'COMMENT xi',
'COMMENT xii',
'COMMENT xiii',
'GROUP Y2',
'COMMENT xiv',
'COMMENT xv',
'COMMENT xvi']
Where the obvious hierarchy is:
- Level
  - Subject
    - Group
      - Comments
    - Group
  - Subject
My objective is to turn it into a denormalized array that can be stored in a database:
>>> print result
[
['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT iii'],
['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT iv'],
['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT v'],
['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT vi'],
['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii'],
['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT viii'],
['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT ix'],
['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT x'],
['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xi'],
['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xii'],
['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xiii'],
['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xiv'],
['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xv'],
['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi']
]
I have been trying to solve this, but I am quite lost. The problem seems common enough that someone may know an efficient approach. It looks a bit like nested sets, but I don't know much about those in Python. Getting the level of each item is easy, but going any further is giving me headaches:
>>> def addlevel(a):
        if a.startswith('LEVEL'):
            return [1, a]
        elif a.startswith('SUBJECT'):
            return [2, a]
        elif a.startswith('GROUP'):
            return [3, a]
        elif a.startswith('COMMENT'):
            return [4, a]
>>> map(addlevel, raw_data)
[[1, 'LEVEL 1'],
[2, 'SUBJECT A'],
[3, 'GROUP X'],
[4, 'COMMENT i'],
[4, 'COMMENT ii'],
[4, 'COMMENT iii'],
[3, 'GROUP Y'],
[4, 'COMMENT iv'],
[4, 'COMMENT v'],
[4, 'COMMENT vi'],
[1, 'LEVEL 2'],
[2, 'SUBJECT B'],
[3, 'GROUP Z'],
[4, 'COMMENT vii'],
[4, 'COMMENT viii'],
[4, 'COMMENT ix'],
[2, 'SUBJECT C'],
[3, 'GROUP X2'],
[4, 'COMMENT x'],
[4, 'COMMENT xi'],
[4, 'COMMENT xii'],
[4, 'COMMENT xiii'],
[3, 'GROUP Y2'],
[4, 'COMMENT xiv'],
[4, 'COMMENT xv'],
[4, 'COMMENT xvi']]
I would appreciate any clues!
Pseudocode, since I don't have a Python interpreter handy right now:
Set LEVEL, SUBJECT, GROUP to None, results to []
Loop over the list
    if it's a 'LEVEL', set LEVEL to it
    if it's a 'SUBJECT', set SUBJECT to it
    if it's a 'GROUP', set GROUP to it
    if it's a 'COMMENT', append [LEVEL, SUBJECT, GROUP, COMMENT] to results
Ta-da.
It just relies on the ordering...
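For reference, the pseudocode above could be written in Python roughly as follows. This is only a sketch: the function name `denormalize` and the shortened `sample` list are made up for illustration, and it assumes every item starts with one of the four prefixes and that parents always appear before their children.

```python
def denormalize(items):
    """Flatten an ordered LEVEL/SUBJECT/GROUP/COMMENT list into 4-column rows."""
    level = subject = group = None
    results = []
    for item in items:
        if item.startswith('LEVEL'):
            level = item
        elif item.startswith('SUBJECT'):
            subject = item
        elif item.startswith('GROUP'):
            group = item
        elif item.startswith('COMMENT'):
            # A comment produces one row from the most recently seen parents.
            results.append([level, subject, group, item])
    return results


# Shortened, hypothetical input in the same shape as raw_data.
sample = ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
          'GROUP Y', 'COMMENT iii',
          'LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT iv']
rows = denormalize(sample)
```

Because nothing is reset when a parent changes, a comment that appears before its new subject or group would silently inherit the previous one, which is what "relies on the ordering" means in practice.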
You could try something like this:
raw_data = [ 'LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
'COMMENT iii', 'GROUP Y', 'COMMENT iv', 'COMMENT v', 'COMMENT vi', 'LEVEL 2',
'SUBJECT B', 'GROUP Z', 'COMMENT vii', 'COMMENT viii', 'COMMENT ix',
'SUBJECT C', 'GROUP X2', 'COMMENT x', 'COMMENT xi', 'COMMENT xii',
'COMMENT xiii', 'GROUP Y2', 'COMMENT xiv', 'COMMENT xv', 'COMMENT xvi' ]
level, subject, group, comment = '', '', '', ''
result = []
for item in raw_data:
    if item.startswith('COMMENT'):
        comment = item
    elif item.startswith('GROUP'):
        group = item
        comment = ''  # a new group has no comment yet
    elif item.startswith('SUBJECT'):
        subject = item
        group = ''  # require a new group before emitting rows
    elif item.startswith('LEVEL'):
        level = item
        subject = ''  # require a new subject before emitting rows
    # All four parts are set only right after a comment has been read.
    if level and subject and group and comment:
        result.append([level, subject, group, comment])

import pprint
pprint.pprint(result)
Which would yield:
[['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT iii'],
['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT iv'],
['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT v'],
['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT vi'],
['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii'],
['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT viii'],
['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT ix'],
['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT x'],
['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xi'],
['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xii'],
['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xiii'],
['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xiv'],
['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xv'],
['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi']]
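If the hierarchy might grow beyond four levels, the same idea can be expressed with the prefixes held in a list instead of four separate variables. The following is a sketch under assumptions, not a drop-in replacement: `PREFIXES`, `flatten`, and `sample` are hypothetical names, items with an unknown prefix are simply skipped, and a row is emitted only when every level of the path is filled.

```python
PREFIXES = ['LEVEL', 'SUBJECT', 'GROUP', 'COMMENT']  # outermost to innermost


def flatten(items, prefixes=PREFIXES):
    """Return one row per innermost item, each row carrying its full parent path."""
    path = [None] * len(prefixes)
    rows = []
    for item in items:
        # Depth is the index of the first prefix the item starts with.
        depth = next((i for i, p in enumerate(prefixes) if item.startswith(p)), None)
        if depth is None:
            continue  # unknown prefix: ignore the item
        path[depth] = item
        # Anything deeper than the item just read is now stale.
        for i in range(depth + 1, len(path)):
            path[i] = None
        # Emit a row only for innermost items that have a complete path.
        if depth == len(prefixes) - 1 and all(p is not None for p in path):
            rows.append(list(path))
    return rows


# Hypothetical input: 'COMMENT ii' has no group under 'SUBJECT B', so it is dropped.
sample = ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i',
          'SUBJECT B', 'COMMENT ii', 'GROUP Y', 'COMMENT iii']
rows = flatten(sample)
```

Clearing every deeper slot whenever a parent changes keeps a comment from being attached to a group or subject that belongs to an earlier branch.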