Sorting a list of lists to get unique IDs in the last column

I have this data saved in a file:

['5',60680,60854,'gene_id "ENS1"']
['5',59106,89211,'gene_id "ENS1"']
['5',58686,58765,'gene_id "ENS1"']
['5',80835,93381,'gene_id "ENS2"']
['5',55555,92223,'gene_id "ENS2"']
['5',73902,74276,'gene_id "ENS2"']

I need help with Python to produce output in which each item in the 4th column appears only once, paired with the minimum value of the second column and the maximum value of the third column among the rows that share that item. So I want my output to look like this:

['5',58686,89211,'gene_id "ENS1"']
['5',55555,93381,'gene_id "ENS2"']

Each item in the 4th column should appear only once. Also, how can I get rid of the [] around the data? Thank you.


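>>> from itertools import groupby
>>> # lst holds the rows from the question; groupby only merges adjacent rows,
>>> # so sort lst by x[3] first if equal ids are not already next to each other.
>>> for i, j in groupby(lst, key=lambda x: x[3]):
...     t = list(zip(*j))
...     print(t[0][0], min(t[1]), max(t[2]), t[3][0])
...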

5 58686 89211 gene_id "ENS1"
5 55555 93381 gene_id "ENS2"

It's not clear what you mean by getting rid of the []; they are just Python's syntax for lists.
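
If the brackets come from printing each list directly, one possible reading of the question is that the rows should be printed as plain fields. Here is a minimal sketch under that assumption. It also assumes every line in the file is a valid Python list literal, so that ast.literal_eval can parse it. The names SAMPLE and collapse are made up for illustration, and the sample text stands in for the contents of the file.

```python
import ast
from itertools import groupby
from operator import itemgetter

# Stand-in for the file contents (rows copied from the question).
SAMPLE = '''
['5',60680,60854,'gene_id "ENS1"']
['5',59106,89211,'gene_id "ENS1"']
['5',58686,58765,'gene_id "ENS1"']
['5',80835,93381,'gene_id "ENS2"']
['5',55555,92223,'gene_id "ENS2"']
['5',73902,74276,'gene_id "ENS2"']
'''


def collapse(lines):
    """Return one (first_col, min_start, max_end, gene_id) tuple per gene_id."""
    # Parse each non-empty line as a Python list literal.
    rows = [ast.literal_eval(line) for line in lines if line.strip()]
    # groupby only merges adjacent rows, so sort by the 4th column first.
    rows.sort(key=itemgetter(3))
    result = []
    for gene_id, group in groupby(rows, key=itemgetter(3)):
        firsts, starts, ends, _ = zip(*group)
        result.append((firsts[0], min(starts), max(ends), gene_id))
    return result


for record in collapse(SAMPLE.splitlines()):
    # Joining the fields as strings prints the row without brackets or commas.
    print("\t".join(str(field) for field in record))
```

To read from a real file, pass the open file object to collapse in place of SAMPLE.splitlines(), since iterating over a file yields its lines.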


import re
# Capture the 2nd and 3rd columns and the gene id from each bracketed line.
pat = re.compile(r"\['[^']+',([^,]+),([^,]+),'([^']+)']")

ch = '''
['5',60680,60854,'gene_id "ENS1"']
['5',59106,89211,'gene_id "ENS1"']
['5',58686,58765,'gene_id "ENS1"']
['5',80835,93381,'gene_id "ENS2"']
['5',55555,92223,'gene_id "ENS2"']
['5',73902,74276,'gene_id "ENS2"']'''

li = pat.findall(ch)
print(li)

deekmin = {}
deekmax = {}
for a, b, c in li:
    # Compare as integers; the regex returns the columns as strings.
    a, b = int(a), int(b)
    if c in deekmin:
        if a < deekmin[c]:
            deekmin[c] = a
        if b > deekmax[c]:
            deekmax[c] = b
    else:
        deekmin[c] = a
        deekmax[c] = b

res = [(deekmin[c], deekmax[c], c) for c in deekmin]
print(res)