
Calculating difference within lists

I have two files and the content is as follows:

alt text http://img144.imageshack.us/img144/4423/screencapture2b.png

alt text http://img229.imageshack.us/img229/9153/screencapture1c.png

Please only consider the bolded column and the red column. The remaining text is junk and unnecessary. As evident from the two files they are similar in many ways. I am trying to compare the bolded text in file_1 and file_2 (it is not bolded but hope you can make out it is the same column) and if they are different, I want to print out the red text from file_1. I achieved this by the following script:

import codecs
import itertools
import os

chain_id=[]
for file in os.listdir("."):
    basename = os.path.basename(file)
    if basename.startswith("d.complex"):
        chain_id.append(basename)

for i in chain_id:
    print i
    g=codecs.open(i,  encoding='utf-8')

    f=codecs.open("ac_chain_dssp.dssp",  encoding='utf-8')
    for (x, y) in itertools.izip(g,  f): 
            if y[11]=="C":
                if y[35:38]!= "EN":
                    if y[35:38] != "OTE":
                        if x[11]=="C":
                            if x[12] != "C":
                                if y[35:38] !=x[35:38]:
                                    print x [7:10]


    g.close()
    f.close()

But the results I got were not what I expected. Now I want to modify the above code in such a way that when I compare the bolded column, if the difference between the values is more than 2, then it has to print out the results. For example, row-1 of bolded column in file_1 is 83 and in file_2 it is 84 since the difference between the two is less than two, I want it to be rejected.

Can someone help me in adding the remaining code? Cheers, Chavanak

PS: This is not homework :)


The direct answer to your question is to alter the last condition,
if y[35:38] !=x[35:38]: so that the "field" at [35:38] gets converted to int (or float...) and the difference between the two values can be tested. That gives something like:

   try:
     iy = int(y[35:38])
     ix = int(x[35:38])
   except ValueError:
     # Take whatever action is appropriate here, including silently ignoring the record.
     print("Unexpected value for record # %s" % x[7:10])
   else:
     # Only compare when both fields were parsed successfully.
     if abs(ix - iy) > 2:
       print(x[7:10])
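To try the numeric comparison outside the file loop, here is a minimal sketch in Python 3. The helper name field_difference_exceeds, its parameters, and the sample lines are hypothetical; only the [35:38] slice and the threshold of 2 come from the question.

```python
def field_difference_exceeds(x_line, y_line, start=35, end=38, threshold=2):
    """Return True if the integer fields differ by more than threshold.

    Returns None when either field is not a valid integer.
    """
    try:
        ix = int(x_line[start:end])
        iy = int(y_line[start:end])
    except ValueError:
        return None
    return abs(ix - iy) > threshold


# Made-up fixed-width records: 35 filler characters, then a 3-character value.
line_a = "." * 35 + " 83"
line_b = "." * 35 + " 84"
line_c = "." * 35 + "120"

print(field_difference_exceeds(line_a, line_b))  # False
print(field_difference_exceeds(line_a, line_c))  # True
```

Returning None for unparsable fields leaves it to the caller to decide whether to skip or report such records.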

More indirectly, the snippet in the question prompts the following remarks, which may in turn suggest different approaches to the problem.

  • first off, if the files are strictly "fixed format", if they are very big, and/or if nothing else is done with any of the other "field" values found in the file, the current approach is valid and probably very efficient.
  • alternatively, the logic may be made more resilient to possible variations in the file structure by parsing the "fields" of the file, rather than addressing them as slices of a long string. Look into the standard library's csv module for possible parser support.
  • some tests seem goofy or always true (like comparing a 3-character slice to a 2-character string literal). Aside from being logically wrong, this too points to a more "parsed" solution, where such logical errors are more readily avoided or more obvious.
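As a rough sketch of the "parsed" approach, each line can be turned into named fields once, so that later tests read as comparisons on values rather than on raw slices. The column positions come from the slices used in the question; the names Record, parse_record, label, chain and value are hypothetical, and the sample line is made up. Written for Python 3.

```python
from collections import namedtuple

# Field names are illustrative only; the real meaning of each column
# is not stated in the question.
Record = namedtuple("Record", ["label", "chain", "value"])


def parse_record(line):
    """Split one fixed-width line into named fields.

    value is None when the [35:38] field is not an integer.
    """
    try:
        value = int(line[35:38])
    except ValueError:
        value = None
    return Record(label=line[7:10].strip(), chain=line[11:12], value=value)


# Made-up line: label at [7:10], chain at [11], value at [35:38].
sample = " " * 7 + " 42" + " " + "C" + " " * 23 + " 83"

record = parse_record(sample)
print(record.label, record.chain, record.value)  # 42 C 83
```

With records parsed this way, a test such as record.value is None makes the non-numeric case explicit instead of hiding it in a string comparison.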


Nothing to do with your problem, but this:

        if y[11]=="C":
            if y[35:38]!= "EN":
# I don't see any "EN" or "OTE" anywhere in your sample input.
# In any case the above condition will always be true, because
# y[35:38] appears to be a 3-byte string but "EN" is a 2-byte string.
                if y[35:38] != "OTE":
                    if x[11]=="C":
                        if x[12] != "C":
                            if y[35:38] !=x[35:38]:
                                print x [7:10]

is ummmmm ...

You may wish to consider an alternative way of expression e.g.

if (x[11] == "C" == y[11]
        and x[12] != "C"
        and y[35:38] not in ("EN?", "OTE")
        and y[35:38] != x[35:38]):
    print x[7:10]
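A small Python 3 illustration of why the original "EN" test can never fail: a 3-character slice is never equal to a 2-character literal. The sample line is made up, and the assumption that the field is padded with a trailing space is mine, not something visible in the question's data.

```python
# Hypothetical record with "EN" starting at column 35, padded to three characters.
y = " " * 35 + "EN "

field = y[35:38]
print(len(field))              # 3
print(field != "EN")           # True: the lengths differ, so the strings never match
print(field.strip() != "EN")   # False: stripping the padding makes the test meaningful
```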


I haven't fully understood your problem, but suppose you have:

File 1

100 C 20.2
300 B 33.3

File 2

110 C 20.23
320 B 33.34

and you want to compare the 3rd column of the two files.

lines1 = file1.readlines()
list1 = [float(line.split()[2]) for line in lines1] # list of 3rd column values

lines2 = file2.readlines()
list2 = [float(line.split()[2]) for line in lines2]

result = map(lambda x, y: abs(x - y) > 2, list1, list2)  # True where the values differ by more than 2

OR

result = [x - y for x, y in zip(list1, list2) if abs(x - y) > 2]  # only the differences larger than 2

Is this what you want??
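Putting the pieces together, a minimal Python 3 sketch that walks two inputs in parallel and keeps the rows of the first one whose chosen column differs by more than 2. The function name rows_exceeding and its parameters are hypothetical, and it assumes whitespace-separated columns as in the sample rows above, which may not hold for the real files.

```python
def rows_exceeding(lines1, lines2, column, threshold=2):
    """Return the lines of the first input whose chosen column differs
    from the same column in the second input by more than threshold."""
    selected = []
    for line1, line2 in zip(lines1, lines2):
        v1 = float(line1.split()[column])
        v2 = float(line2.split()[column])
        if abs(v1 - v2) > threshold:
            selected.append(line1)
    return selected


# Sample rows from the answer above, held in lists instead of files.
file1_lines = ["100 C 20.2", "300 B 33.3"]
file2_lines = ["110 C 20.23", "320 B 33.34"]

print(rows_exceeding(file1_lines, file2_lines, column=2))  # []
print(rows_exceeding(file1_lines, file2_lines, column=0))  # ['100 C 20.2', '300 B 33.3']
```

Open file objects can be passed in place of the lists, since zip iterates over them line by line.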
