Sorting a list of lists to get unique IDs in the last column
I have this data saved in a file:
['5',60680,60854,'gene_id "ENS1"']
['5',59106,89211,'gene_id "ENS1"']
['5',58686,58765,'gene_id "ENS1"']
['5',80835,93381,'gene_id "ENS2"']
['5',55555,92223,'gene_id "ENS2"']
['5',73902,74276,'gene_id "ENS2"']
I need help with Python to produce output in which each item in the 4th column appears only once, with the second column holding the minimum value and the third column holding the maximum value found among the rows that share that 4th-column item. So I want my output to look like this:
['5',58686,89211,'gene_id "ENS1"']
['5',55555,93381,'gene_id "ENS2"']
Each item in the 4th column should appear only once. How can I also get rid of the [] around the data? Thank you.
>>> from itertools import groupby
>>> for i, j in groupby(lst, key=lambda x: x[3]):
...     t = list(zip(*j))  # transpose the group's rows into columns
...     print(t[0][0], min(t[1]), max(t[2]), t[3][0])
...
5 58686 89211 gene_id "ENS1"
5 55555 93381 gene_id "ENS2"
It's not clear what you mean by getting rid of the []; they are just the syntax for Python lists.
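To connect this to the file mentioned in the question, here is a minimal sketch. It assumes every line of the file is a valid Python list literal, exactly as shown in the question, so `ast.literal_eval` can parse it. The function name `merge_rows` and the `sample` string are hypothetical; with a real file you would pass the open file object instead of `lines`. The rows are sorted by the 4th column first because `groupby` only merges adjacent rows, and each result is joined with tabs so no brackets are printed.

```python
import ast
from itertools import groupby

def merge_rows(lines):
    """Collapse rows sharing a 4th-column ID into one (chrom, min, max, ID) tuple."""
    # Parse each non-empty line as a Python list literal (assumed file format)
    rows = [ast.literal_eval(line.strip()) for line in lines if line.strip()]
    rows.sort(key=lambda r: r[3])  # groupby only merges adjacent rows
    merged = []
    for gene_id, group in groupby(rows, key=lambda r: r[3]):
        chrom, starts, ends, _ = zip(*group)  # transpose rows into columns
        merged.append((chrom[0], min(starts), max(ends), gene_id))
    return merged

# Stand-in for the file contents from the question
sample = '''\
['5',60680,60854,'gene_id "ENS1"']
['5',59106,89211,'gene_id "ENS1"']
['5',58686,58765,'gene_id "ENS1"']
['5',80835,93381,'gene_id "ENS2"']
['5',55555,92223,'gene_id "ENS2"']
['5',73902,74276,'gene_id "ENS2"']
'''
lines = sample.splitlines()

for row in merge_rows(lines):
    # Join the fields with tabs so the output has no brackets or quotes
    print("\t".join(str(field) for field in row))
```

The brackets only show up when a whole list is printed; printing the joined fields avoids them.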
import re

# Capture start, end and gene id from each row; the first column is skipped
pat = re.compile(r"\['[^']+',([^,]+),([^,]+),'([^']+)'\]")

ch = '''
['5',60680,60854,'gene_id "ENS1"']
['5',59106,89211,'gene_id "ENS1"']
['5',58686,58765,'gene_id "ENS1"']
['5',80835,93381,'gene_id "ENS2"']
['5',55555,92223,'gene_id "ENS2"']
['5',73902,74276,'gene_id "ENS2"']'''

li = pat.findall(ch)
print(li)

deekmin = {}
deekmax = {}
for a, b, c in li:  # visit every row, including the first
    a, b = int(a), int(b)  # compare as numbers, not as strings
    if c in deekmin:
        if a < deekmin[c]:
            deekmin[c] = a
        if b > deekmax[c]:
            deekmax[c] = b
    else:
        deekmin[c] = a
        deekmax[c] = b

res = [(deekmin[c], deekmax[c], c) for c in deekmin]
print(res)
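A variant of the same dictionary idea, as a sketch only: it assumes the rows have already been parsed into four-item lists with integer coordinates, keeps the first column, and needs no sorting because it tracks a running minimum and maximum per ID. The function name `collapse_by_id` is hypothetical. The result order follows the first appearance of each ID, which relies on dictionaries preserving insertion order (Python 3.7+).

```python
def collapse_by_id(rows):
    """Return one [chrom, min_start, max_end, gene_id] row per gene_id."""
    best = {}  # gene_id -> [chrom, min_start, max_end, gene_id]
    for chrom, start, end, gene_id in rows:
        if gene_id not in best:
            # First time this ID is seen: start from this row's values
            best[gene_id] = [chrom, start, end, gene_id]
        else:
            entry = best[gene_id]
            entry[1] = min(entry[1], start)  # smallest 2nd-column value so far
            entry[2] = max(entry[2], end)    # largest 3rd-column value so far
    return list(best.values())

rows = [
    ['5', 60680, 60854, 'gene_id "ENS1"'],
    ['5', 59106, 89211, 'gene_id "ENS1"'],
    ['5', 58686, 58765, 'gene_id "ENS1"'],
    ['5', 80835, 93381, 'gene_id "ENS2"'],
    ['5', 55555, 92223, 'gene_id "ENS2"'],
    ['5', 73902, 74276, 'gene_id "ENS2"'],
]

for row in collapse_by_id(rows):
    print(*row)  # unpacking prints the fields without the surrounding []
```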