How can I count the number of lines between two markers in a file with Python?
Hi, I'm new to Python and I'm using Python 3.2. I have a file with a format like this:
Number of segment pairs = 108570; number of pairwise comparisons = 54234
'+' means given segment; '-' means reverse complement
Overlaps Containments No. of Constraints Supporting Overlap
******************* Contig 1 ********************
E_180+
E_97-
******************* Contig 2 ********************
E_254+
E_264+ is in E_254+
E_276+
******************* Contig 3 ********************
E_256-
E_179-
I want to count the number of non-empty lines between consecutive ***** Contig # ***** headers and get a result like this:
contig1=2
contig2=3
contig3=2
It's probably easiest to use regular expressions here. You can try the following:
import re

with open('input.txt') as f:  # 'input.txt' stands for your file name
    text = f.read()
pairs = re.findall(r'\*+ (Contig \d+) \*+\n([^*]*)', text)
pairs is a list of tuples of the form ('Contig x', '...'). The second component of each tuple contains the text that follows the header, up to the next row of asterisks.
Afterwards, you can count the '\n' characters in each of those texts, most easily with a list comprehension:
[(contig, txt.count('\n')) for (contig,txt) in pairs]
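Edit: to skip empty lines, count only the lines that contain non-whitespace characters. (Subtracting txt.count('\n\n') undercounts when several empty lines follow each other, and counting '\n' misses the last line if the file doesn't end with a newline.)
[(contig, sum(1 for line in txt.splitlines() if line.strip())) for (contig, txt) in pairs]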
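Putting the pieces of this answer together, here is a self-contained sketch that runs the same regular expression on the sample from the question (pasted into a string instead of read from a file) and counts only non-empty lines with str.splitlines() and strip(). The helper name count_contig_lines and the lowercase "contig1=2" formatting are illustrative choices, not part of the original answer.

```python
import re

# Sample copied from the question; in practice, read it from your file.
SAMPLE = """Number of segment pairs = 108570; number of pairwise comparisons = 54234
'+' means given segment; '-' means reverse complement
Overlaps Containments No. of Constraints Supporting Overlap
******************* Contig 1 ********************
E_180+
E_97-
******************* Contig 2 ********************
E_254+
E_264+ is in E_254+
E_276+
******************* Contig 3 ********************
E_256-
E_179-
"""


def count_contig_lines(text):
    """Return a list of (contig name, number of non-empty lines) pairs.

    Hypothetical helper: each block is the text between a Contig header
    and the next row of asterisks (or the end of the text).
    """
    pairs = re.findall(r'\*+ (Contig \d+) \*+\n([^*]*)', text)
    return [(name, sum(1 for line in block.splitlines() if line.strip()))
            for name, block in pairs]


for name, count in count_contig_lines(SAMPLE):
    # Turn "Contig 1" into "contig1" to match the format asked for.
    print('{0}={1}'.format(name.replace(' ', '').lower(), count))
```

Counting with splitlines() instead of counting '\n' characters also handles a last block that is not followed by a trailing newline.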
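def give(filename):
    with open(filename) as f:
        # Skip everything before the first Contig header.
        for line in f:
            if 'Contig' in line:
                category = line.strip('* \r\n')
                break
        cnt = 0
        aim = []
        for line in f:
            if 'Contig' in line:
                # New header: emit the result for the previous contig.
                yield (category + '=' + str(cnt), aim)
                category = line.strip('* \r\n')
                cnt = 0
                aim = []
            elif line.strip():
                # Non-empty data line.
                cnt += 1
                if 'is in' in line:
                    aim.append(line.strip())
        # Emit the result for the last contig.
        yield (category + '=' + str(cnt), aim)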
for a, b in give('input.txt'):
    print(a)
    if b:
        print(b)
Result:
Contig 1=2
Contig 2=3
['E_264+ is in E_254+']
Contig 3=2
The function give() isn't an ordinary function; it is a generator function, because it uses yield. See the Python documentation on generators, and if you have questions, I will answer them.
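To illustrate what "generator function" means, here is a minimal sketch unrelated to the contig file (count_up_to is a hypothetical name). Like give(), it contains yield, so calling it does not run the body immediately; it returns an iterator that produces one value each time the caller asks for the next one.

```python
def count_up_to(n):
    """A tiny generator: each yield hands one value back to the caller."""
    i = 1
    while i <= n:
        yield i  # execution pauses here until the next value is requested
        i += 1


gen = count_up_to(3)
print(next(gen))   # 1
print(list(gen))   # [2, 3]  (the generator resumes where it paused)
```

This is why give() can be used directly in a for loop: the loop keeps requesting values until the generator finishes.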
strip() is a string method that removes characters from the beginning and the end of a string.
When called without an argument, strip() removes whitespace characters (such as \f, \n, \r, \t, \v and the space character). When a string is passed as an argument, strip() removes leading and trailing characters that appear in that argument, stopping at the first character (from each end) that is not in it; characters in the middle of the string are left alone. The order of the characters in the argument doesn't matter: the argument doesn't denote a substring but a set of characters to be removed.
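A few calls make this behaviour concrete. The first one uses the same argument as give(), applied to a header line from the question; the other strings are made-up examples.

```python
header = '******** Contig 2 *********\n'

# '*', ' ', '\r' and '\n' are removed from both ends until a character
# outside that set is reached.
print(repr(header.strip('* \r\n')))   # 'Contig 2'

# Characters in the middle are untouched...
print(repr('*a*b*'.strip('*')))       # 'a*b'

# ...and the order of characters in the argument does not matter.
print(repr('xyhixy'.strip('yx')))     # 'hi'

# Without an argument, leading and trailing whitespace is removed.
print(repr('  E_180+\t\n'.strip()))   # 'E_180+'
```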
line.strip() is a way to check whether a line contains any non-whitespace characters: for a blank line it returns an empty string, which is false in an if or elif test.
It is important that elif line.strip(): comes after if 'Contig' in line: and that it is written as elif, not if. Otherwise line.strip() would also be true for a header line, for example
******** Contig 2 *********\n
and that header would be counted as a data line.
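The same if/elif logic, pulled out into a hypothetical classify() helper for illustration, shows why the order matters: a header line is non-empty after strip(), so the 'Contig' test has to come first and the non-empty test has to be an elif.

```python
def classify(line):
    """Hypothetical helper mirroring the tests inside give()."""
    if 'Contig' in line:
        return 'header'
    elif line.strip():  # only reached when the line is not a header
        return 'data'
    return 'blank'


header = '******** Contig 2 *********\n'
print(bool(header.strip()))     # True: on its own, the header passes the non-empty test
print(classify(header))         # header
print(classify('E_254+\n'))     # data
print(classify('   \n'))        # blank
```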
I suppose you will also want to see the content of lines like this one:
E_264+ is in E_254+
because it is this kind of line that makes a difference in the counts. So I edited my code so that give() also yields these lines (in the list aim).