How can I split a 2D array into an array with unique values and a dictionary?

Posted: 2023-03-23 05:01

I'm trying to split a 2D array into a specific format and can't figure out the last step. A sample of my data is structured as follows:

# Original Data
fileListCode = [['Seq3.xls', 'B08524_057'], 
                ['Seq3.xls', 'B08524_053'], 
                ['Seq3.xls', 'B08524_054'],
                ['Seq98.xls', 'B25034_001'], 
                ['Seq98.xls', 'D25034_002'], 
                ['Seq98.xls', 'B25034_003']]

I am trying to split it up so that it looks like this:

# split into [['Seq3.xls', {'B08524_057':1, 'B08524_053':2, 'B08524_054':3}],
#             ['Seq98.xls', {'B25034_001':1, 'D25034_002':2, 'B25034_003':3}]]

The dictionary values 1, 2, 3 are based on the original position of the entry, counting from the first time that filename appears. To do this, I've first made an array of all the unique file names (anything ending in .xls is a filename):

tmpFileList = []
tmpCodeList = []
arrayListDict = []

# store unique file list in a temporary array:
for i in range(len(fileListCode)):
    if fileListCode[i][0] not in tmpFileList:
        tmpFileList.append(fileListCode[i][0])

However, I'm struggling with the next step. I can't figure out a good way of pulling out the code names (B08524_057, for example) and converting them into a dictionary with an index based on their position.

# make array to store filelist, and codes with dictionary values
for i in range( len(tmpFileList)):
    arrayListDict.append([tmpFileList[i], {}])

This code just produces [['Seq3.xls', {}], ['Seq98.xls', {}]]. I'm not sure whether I should first produce the structure and then try to add the codes and dictionary values, or whether there is a better way.

-- EDIT: I just made the sample a little clearer by changing the values in fileListCode.

With itertools.groupby this process is much simpler:

>>> import itertools, operator
>>> key = operator.itemgetter(0)
>>> grouped = itertools.groupby(sorted(fileListCode, key=key), key=key)
>>> [(i, {k[1]: n for n, k in enumerate(j, 1)}) for i, j in grouped]
[('Seq3.xls', {'B08524_052': 1, 'B08524_053': 2, 'B08524_054': 3}),
 ('Seq98.xls', {'B25034_001': 1, 'B25034_002': 2, 'B25034_003': 3})]

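For Python versions before 2.7, which lack dict comprehensions (rebuild grouped first, since iterating over it once exhausts it):

>>> grouped = itertools.groupby(sorted(fileListCode, key=key), key=key)
>>> [(i, dict((k[1], n) for n, k in enumerate(j, 1))) for i, j in grouped]
[('Seq3.xls', {'B08524_052': 1, 'B08524_053': 2, 'B08524_054': 3}),
 ('Seq98.xls', {'B25034_001': 1, 'B25034_002': 2, 'B25034_003': 3})]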

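But I think using a dict for the outer structure would be better (grouped is a one-shot iterator, so it is rebuilt here):

>>> grouped = itertools.groupby(sorted(fileListCode, key=key), key=key)
>>> {i: {k[1]: n for n, k in enumerate(j, 1)} for i, j in grouped}
{'Seq3.xls': {'B08524_052': 1, 'B08524_053': 2, 'B08524_054': 3},
 'Seq98.xls': {'B25034_001': 1, 'B25034_002': 2, 'B25034_003': 3}}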
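
For anyone who wants to run this outside the interactive prompt, here is a minimal standalone sketch of the same groupby approach. It assumes the sample data from the question as it stands after the asker's edit, and it builds [name, dict] lists rather than tuples to match the format asked for; the names rows and result are just illustrative.

```python
import itertools
import operator

# Sample data from the question (after the asker's edit).
fileListCode = [['Seq3.xls', 'B08524_057'],
                ['Seq3.xls', 'B08524_053'],
                ['Seq3.xls', 'B08524_054'],
                ['Seq98.xls', 'B25034_001'],
                ['Seq98.xls', 'D25034_002'],
                ['Seq98.xls', 'B25034_003']]

key = operator.itemgetter(0)

# groupby only merges adjacent rows with equal keys, hence the sort.
# sorted() is stable, so the codes of each file keep their original order.
rows = sorted(fileListCode, key=key)

# Number the codes of each group starting from 1.
result = [[name, {code: pos for pos, (_, code) in enumerate(group, 1)}]
          for name, group in itertools.groupby(rows, key=key)]

for entry in result:
    print(entry)
```

One thing to keep in mind: because of the sort, the files come out in alphabetical order of their names, not in order of first appearance. With this sample the two orders happen to coincide.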

You've confused lists and dictionaries.

It would make far more sense to do something more like this:

file_list_code = [['Seq3.xls', 'B08524_052'],
                  ['Seq3.xls', 'B08524_053'],
                  ['Seq3.xls', 'B08524_054'],
                  ['Seq98.xls', 'B25034_001'],
                  ['Seq98.xls', 'B25034_002'],
                  ['Seq98.xls', 'B25034_003']]

file_codes = {}
for name, code in file_list_code:
    if name not in file_codes:
        file_codes[name] = []
    file_codes[name].append(code)

This yields:

{'Seq3.xls': ['B08524_052', 'B08524_053', 'B08524_054'], 
'Seq98.xls': ['B25034_001', 'B25034_002', 'B25034_003']}

This could be further simplified by using a defaultdict. It's arguably overkill for something this simple, but it's good to know about. Here's an example:

import collections

file_list_code = [['Seq3.xls', 'B08524_052'],
                  ['Seq3.xls', 'B08524_053'],
                  ['Seq3.xls', 'B08524_054'],
                  ['Seq98.xls', 'B25034_001'],
                  ['Seq98.xls', 'B25034_002'],
                  ['Seq98.xls', 'B25034_003']]

file_codes = collections.defaultdict(list)
for name, code in file_list_code:
    file_codes[name].append(code)
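
If the numbered dictionaries from the question are still needed, the lists collected this way can be converted in one more step. The following is only a sketch under that assumption: it repeats the defaultdict loop so that it runs on its own, and the names positions and result are made up for illustration.

```python
import collections

file_list_code = [['Seq3.xls', 'B08524_052'],
                  ['Seq3.xls', 'B08524_053'],
                  ['Seq3.xls', 'B08524_054'],
                  ['Seq98.xls', 'B25034_001'],
                  ['Seq98.xls', 'B25034_002'],
                  ['Seq98.xls', 'B25034_003']]

# Same grouping step as above: filename -> list of codes in original order.
file_codes = collections.defaultdict(list)
for name, code in file_list_code:
    file_codes[name].append(code)

# Turn each list into {code: position}, counting from 1.
positions = {name: {code: pos for pos, code in enumerate(codes, 1)}
             for name, codes in file_codes.items()}

# The list-of-pairs layout from the question, if that shape is required.
# Relies on dicts keeping insertion order (guaranteed from Python 3.7).
result = [[name, codes] for name, codes in positions.items()]
```

The positions dict is usually the more convenient of the two, since positions['Seq3.xls']['B08524_053'] looks up a position directly.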

fileListCode = [['Seq3.xls', 'B08524_052'],
                ['Seq3.xls', 'B08524_053'],
                ['Seq3.xls', 'B08524_054'],
                ['Seq98.xls', 'B25034_001'],
                ['Seq98.xls', 'B25034_002'],
                ['Seq98.xls', 'B25034_003']]

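dico = {}   # filename -> index of its entry in li
li = []     # result: [filename, {code: position}] entries
for a, b in fileListCode:
    if a in dico:
        # known file: the next position is the current dict size + 1
        li[dico[a]][1][b] = len(li[dico[a]][1]) + 1
    else:
        # first occurrence: remember where this file's entry lives in li
        dico[a] = len(li)
        li.append([a, {b: 1}])

print('\n'.join(map(str, li)))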
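
A variant of the same single-pass idea, sketched as a function. It assumes Python 3.7 or later, where plain dicts keep insertion order, so the separate index lookup (dico) is not needed. The function name split_by_file is hypothetical, and unlike the loop above it keeps the first position if a code is repeated for the same file.

```python
def split_by_file(pairs):
    """Group (filename, code) pairs into [filename, {code: position}] lists."""
    groups = {}  # filename -> {code: position}, in order of first appearance
    for name, code in pairs:
        codes = groups.setdefault(name, {})
        # A repeated code keeps the position it was first given.
        codes.setdefault(code, len(codes) + 1)
    return [[name, codes] for name, codes in groups.items()]


# Sample data from the question (after the asker's edit).
fileListCode = [['Seq3.xls', 'B08524_057'],
                ['Seq3.xls', 'B08524_053'],
                ['Seq3.xls', 'B08524_054'],
                ['Seq98.xls', 'B25034_001'],
                ['Seq98.xls', 'D25034_002'],
                ['Seq98.xls', 'B25034_003']]

result = split_by_file(fileListCode)
for entry in result:
    print(entry)
```

No sorting is involved here, so the files appear in the order they are first seen in the input.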
