Count how many reads in a data file fall within intervals from a reference file (Python)

2023-03-31 20:18

I am trying to count how many times a value in one file (one column) falls within an interval from another file (two columns).

I am completely stuck on how to map it.

I tried something like this:

for line in file1:
    if line[0] == line2[0] and line2[1] < line[1] < line2[2]:
        print(line)

I'm not sure if this is correct.

file 1:
elem1     39887
elem1     72111

file 2:
elem1     1     57898
elem1     57899 69887
elem2     69888 82111

In file1, elem1 is an element in my project. The value 39887 is the start coordinate.

In file2 elem1 is still an element in my project, but the values are start and end coordinates. File2 is only a reference file.

For every line in file2, I want to check whether its elem# matches the elem# on a line in file1. If the elem# in file1 equals the elem# in file2, I want to continue in this loop and check whether the corresponding value in file1 lies between the start and end positions in file2.

For instance, the first line of file1 has elem1, which matches elem1 on the first line of file2. Since they match, is 39887 between 1 and 57898? Yes, so count it. I need to do this for every line in file2.

In the end, I want to see how many file1 values fall within each interval (start/end pair) in file2.
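A minimal sketch of the per-interval count described above, assuming whitespace-separated columns, integer coordinates, and inclusive bounds (start <= value <= end). `count_hits` is a hypothetical helper name, and the sample files are read from strings via `io.StringIO` so the example runs without real files.

```python
import io

# Sample data from the question; with real files, pass open() handles instead
file1 = io.StringIO("elem1     39887\nelem1     72111\n")
file2 = io.StringIO("elem1     1     57898\nelem1     57899 69887\nelem2     69888 82111\n")


def count_hits(value_lines, interval_lines):
    """Return {(elem, start, end): hits}, one entry per interval line.

    Bounds are treated as inclusive (start <= value <= end), which is an assumption.
    """
    # Read every (elem, value) pair once so it can be reused for each interval
    values = []
    for line in value_lines:
        if line.strip():  # skip blank lines
            elem, value = line.split()
            values.append((elem, int(value)))

    counts = {}
    for line in interval_lines:
        if not line.strip():
            continue
        elem, start, end = line.split()
        start, end = int(start), int(end)
        # Count values whose element name matches and whose coordinate is in range
        counts[(elem, start, end)] = sum(
            1 for v_elem, v in values if v_elem == elem and start <= v <= end
        )
    return counts


for (elem, start, end), hits in count_hits(file1, file2).items():
    print(elem, start, end, hits)
```

With the sample data this prints `elem1 1 57898 1`, `elem1 57899 69887 0` and `elem2 69888 82111 0`: 72111 is outside both elem1 intervals, and the elem2 interval never matches an elem1 value.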

Assuming your lines match up one-to-one (so you want to test whether the value on the first line of one file lies between the values on the first line of the other, second line to second line, etc), you can zip the two files to iterate over them in step:

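with open(...) as value_file, open(...) as interval_file:
    # zip pairs line 1 with line 1, line 2 with line 2, and so on
    for value_line, interval_line in zip(value_file, interval_file):
        elem, value = value_line.split()
        interval_elem, left, right = interval_line.split()
        if elem == interval_elem and int(left) <= int(value) <= int(right):
            pass  # do stuff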
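To see what the positional (zip) approach does with the question's sample data, here is a small sketch with the lines held in lists instead of files. `positional_hits` is a hypothetical name, and it assumes the same inclusive bounds as the answer. Note that `zip` stops at the shorter input, so the third line of file2 is never examined.

```python
def positional_hits(value_lines, interval_lines):
    """Compare line 1 with line 1, line 2 with line 2, ... (zip stops at the shorter input)."""
    results = []
    for value_line, interval_line in zip(value_lines, interval_lines):
        elem, value = value_line.split()
        interval_elem, left, right = interval_line.split()
        # A hit requires the same element name and left <= value <= right
        results.append(elem == interval_elem and int(left) <= int(value) <= int(right))
    return results


file1_lines = ["elem1     39887", "elem1     72111"]
file2_lines = ["elem1     1     57898", "elem1     57899 69887", "elem2     69888 82111"]
print(positional_hits(file1_lines, file2_lines))  # [True, False]
```

Only two comparisons happen here, which is why this approach fits only if the two files really do line up one-to-one.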

Drop the concept of 'files' for a second and think about the data.

You have two groups of textual data, one with one column and one with two columns, correct? Assume for a second that you can work out how to split the text into two columns. Then what you really have is three lists (after converting the strings to ints, let's say):

import random

c1 = [random.randint(0, 100) for i in range(100)]
c2 = [random.randint(0, 100) for i in range(100)]
c3 = [random.randint(0, 100) for i in range(100)]

If I understand correctly, you want to count the interval hits of the values in c1 against c2 and c3, correct? Now focus on what a 'hit' is. If you have 3 in c1, and you have [1,3,5,5,3,10] in c2, how many hits is that? Only the 3s? The interval between 1,3,5? Or the interval of 1,3,5,5,3? Or all of the above?

As a simple example, with the random int lists above, this prints every int in c1 that occurs both in c2 and in c3:

for i in c1:
    if i in c2 and i in c3:
        print(i)

Once you further define what a 'hit' is, this basic structure will work. Once you have the basic data and the 'hit' structure working, then go back and deal with the files. Should be easy then.
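For example, if a 'hit' is defined as a c1 value lying in the closed interval [c2[k], c3[k]] (one interval per position k, which is an assumption about what you mean), the same loop structure counts hits per interval. Small fixed lists replace the random ones here so the result is predictable.

```python
# Hypothetical data: c1 holds values, (c2[k], c3[k]) is the k-th interval
c1 = [5, 7, 12, 40, 41]
c2 = [1, 10, 35]
c3 = [9, 20, 40]

hits_per_interval = []
for start, end in zip(c2, c3):
    # Count every c1 value that falls inside this interval (inclusive)
    hits = sum(1 for v in c1 if start <= v <= end)
    hits_per_interval.append(hits)

print(hits_per_interval)  # [2, 1, 1]
```

Here 5 and 7 fall in [1, 9], 12 in [10, 20], and 40 in [35, 40], while 41 misses every interval.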

Edit: If I understand what you are trying to do (and that is a massive if), this is a framework:

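# Outer loop: each interval line in file2 (elem, start, end)
with open("file2.txt") as val_file:
    for val_line in val_file:
        val_elems = val_line.split()
        # Inner loop: re-read every (elem, value) line of file1
        with open("file1.txt") as int_file:
            for int_line in int_file:
                int_elems = int_line.split()
                # Convert to int so coordinates compare numerically, not as strings
                if (int_elems[0] == val_elems[0] and
                        int(val_elems[1]) < int(int_elems[1]) < int(val_elems[2])):
                    print(val_line, end="")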

Running this against your sample data, the output is: elem1 1 57898

It is not clear to me whether you are trying to 1) compare the two files positionally, line by line, or 2) read each line of file 2 and compare it to every line of file 1. The example here does the latter.
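If option 2 is what you want, reopening file1 for every line of file2 reads the whole file once per interval. One alternative (a different technique from the answers above, not something from the question) is to group file1's values by element, sort each group once, and use the standard-library `bisect` module to count the values inside each interval. A sketch, assuming whitespace-separated columns, integer coordinates, start <= end, and inclusive bounds; `count_with_bisect` is a hypothetical name.

```python
import bisect
from collections import defaultdict


def count_with_bisect(value_lines, interval_lines):
    """Return a list of (elem, start, end, hits), one entry per interval line."""
    # Group file1 coordinates by element name, then sort each group once
    by_elem = defaultdict(list)
    for line in value_lines:
        if line.strip():
            elem, value = line.split()
            by_elem[elem].append(int(value))
    for coords in by_elem.values():
        coords.sort()

    results = []
    for line in interval_lines:
        if not line.strip():
            continue
        elem, start, end = line.split()
        start, end = int(start), int(end)
        coords = by_elem.get(elem, [])  # .get does not add missing keys
        # bisect_left: first index with coord >= start; bisect_right: first with coord > end
        hits = max(0, bisect.bisect_right(coords, end) - bisect.bisect_left(coords, start))
        results.append((elem, start, end, hits))
    return results


file1_lines = ["elem1     39887", "elem1     72111"]
file2_lines = ["elem1     1     57898", "elem1     57899 69887", "elem2     69888 82111"]
for row in count_with_bisect(file1_lines, file2_lines):
    print(*row)
```

Sorting costs O(n log n) once, and each interval then costs O(log n), instead of a full pass over file1 per interval. With real files, the two lists can be replaced by open file handles.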
