Parsing line with delimiter in Python
I have lines of data which I want to parse. The data looks like this:
a score=216 expect=1.05e-06
a score=180 expect=0.0394
What I want is a subroutine that parses them and returns two values (score and expect) for each line.
However this function of mine doesn't seem to work:
def scoreEvalFromMaf(mafLines):
    for word in mafLines[0]:
        if word.startswith("score="):
            theScore = word.split('=')[1]
            theEval = word.split('=')[2]
            return [theScore, theEval]
    raise Exception("encountered an alignment without a score")
Please advise: what's the right way to do it?
It looks like you want to split each line on whitespace and parse each chunk separately. If mafLine is a single string (i.e. one line from .readlines()):
def scoreEvalFromMafLine(mafLine):
    theScore, theEval = None, None
    # split() with no arguments splits on any run of whitespace
    for word in mafLine.split():
        if word.startswith("score="):
            theScore = word.split('=')[1]
        if word.startswith("expect="):
            theEval = word.split('=')[1]
    if theScore is None or theEval is None:
        raise Exception("Invalid line: '%s'" % mafLine)
    return (theScore, theEval)
The way you were doing it iterates over each character of the first line (mafLines is a list of strings, so mafLines[0] is a single string) rather than over each whitespace-separated word.
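To see the difference concretely, here is a small sketch using one of the sample lines from the question. The variable names are only for illustration and are not part of the original code.

```python
# One sample line from the question, wrapped in a list as mafLines would be.
maf_lines = ["a score=216 expect=1.05e-06"]

# Iterating over a string yields single characters...
chars = [c for c in maf_lines[0]][:3]

# ...whereas str.split() with no arguments yields whitespace-separated words.
words = maf_lines[0].split()

print(chars)  # ['a', ' ', 's']
print(words)  # ['a', 'score=216', 'expect=1.05e-06']
```

Since no single character can start with "score=", the original loop never finds a match and always falls through to the raise.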
If mafLines is a list of lines and you want to look at just the first one, call .split() on that line to obtain the words. For example:
def scoreEvalFromMaf(mafLines):
    theScore = None
    theEval = None
    for word in mafLines[0].split():
        if word.startswith('score='):
            # partition returns (head, separator, tail); keep only the tail
            _, _, theScore = word.partition('=')
        elif word.startswith('expect='):
            _, _, theEval = word.partition('=')
    if theScore is None:
        raise Exception("encountered an alignment without a score")
    if theEval is None:
        raise Exception("encountered an alignment without an eval")
    return theScore, theEval
Note that this will return a tuple with two string items; if you want an int and a float, for example, you need to change the last line to
return int(theScore), float(theEval)
and then you'll get a ValueError exception if either string is invalid for the type it's supposed to represent, and the returned tuple with two numbers if both strings are valid.
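As a rough sketch of that numeric conversion, the hypothetical helper below (parse_score_and_expect is an illustrative name, not from the question) collects the key=value chunks of one line and converts them. It assumes every line has the same shape as the sample data.

```python
def parse_score_and_expect(line):
    """Hypothetical helper: return (int score, float expect) from one line."""
    fields = {}
    for word in line.split():
        key, sep, value = word.partition("=")
        if sep:  # keep only chunks that actually contain '='
            fields[key] = value
    # KeyError if a field is missing; ValueError if a value is not numeric
    return int(fields["score"]), float(fields["expect"])


print(parse_score_and_expect("a score=216 expect=1.05e-06"))  # (216, 1.05e-06)

try:
    parse_score_and_expect("a score=abc expect=0.0394")
except ValueError as err:
    print("bad value:", err)
```

Note that float() accepts scientific notation such as 1.05e-06 directly, so no extra handling is needed for the expect field.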
Obligatory and possibly inappropriate regexp solution:
import re
def scoreEvalFromMaf(mafLines):
    return [re.search(r'score=(.+) expect=(.+)', line).groups()
            for line in mafLines]
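One caveat with a regex one-liner like this: re.search returns None when a line does not match, so calling .groups() on the result fails with an AttributeError. A more defensive variant might look like the sketch below. The function name, the pattern, and the choice of ValueError are assumptions for illustration, and the pattern assumes score always precedes expect, as in the sample data.

```python
import re

# Assumed layout: "score=<value>" followed by "expect=<value>", as in the samples.
PATTERN = re.compile(r"score=(\S+)\s+expect=(\S+)")


def score_eval_pairs(maf_lines):
    """Hypothetical variant: one (score, expect) pair of strings per line."""
    pairs = []
    for line in maf_lines:
        match = PATTERN.search(line)
        if match is None:
            raise ValueError("no score/expect found in line: %r" % line)
        pairs.append(match.groups())
    return pairs


print(score_eval_pairs(["a score=216 expect=1.05e-06",
                        "a score=180 expect=0.0394"]))
# [('216', '1.05e-06'), ('180', '0.0394')]
```

Using \S+ instead of .+ keeps each captured value from swallowing trailing whitespace or later fields on the line.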