
Parsing line with delimiter in Python

I have lines of data which I want to parse. The data looks like this:

a score=216 expect=1.05e-06
a score=180 expect=0.0394

What I want to do is write a subroutine that parses them and returns two values (score and expect) for each line.

However, this function of mine doesn't seem to work:

def scoreEvalFromMaf(mafLines):
    for word in mafLines[0]:
        if word.startswith("score="):
            theScore = word.split('=')[1]
            theEval  = word.split('=')[2]
            return [theScore, theEval]
    raise Exception("encountered an alignment without a score")

Please advise: what's the right way to do it?


It looks like you want to split each line on whitespace and parse each chunk separately. If you pass in a single line as a string (i.e. one element of .readlines()):

def scoreEvalFromMafLine(mafLine):
    theScore, theEval = None, None
    for word in mafLine.split():
        if word.startswith("score="):
            theScore = word.split('=')[1]
        if word.startswith("expect="):
            theEval  = word.split('=')[1]

    if theScore is None or theEval is None:
        raise Exception("Invalid line: '%s'" % mafLine)

    return (theScore, theEval)

Your version iterates over each character of the first line (mafLines is a list of strings, so mafLines[0] is a single string) rather than over the space-separated words.
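A quick interactive check of the difference, using the first sample line from the question. It also shows that "score=216".split('=') has only two parts, so the question's [2] index would raise IndexError even if the loop did match:

```python
line = "a score=216 expect=1.05e-06"

# Iterating over a string yields single characters, not words.
chars = list(line)
print(chars[:3])  # ['a', ' ', 's']

# No single character can start with "score=", so the check never matches.
print(any(c.startswith("score=") for c in line))  # False

# str.split() with no argument splits on runs of whitespace, giving words.
words = line.split()
print(words)  # ['a', 'score=216', 'expect=1.05e-06']

# Each word holds only one key=value pair, so index [2] does not exist.
parts = words[1].split("=")
print(parts)  # ['score', '216']
```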


If mafLines is a list of lines and you only want to look at the first one, call .split() on that line to get the words. For example:

def scoreEvalFromMaf(mafLines):
    theScore = None
    theEval = None
    for word in mafLines[0].split():
        if word.startswith('score='):
            _, _, theScore = word.partition('=')
        elif word.startswith('expect='):
            _, _, theEval = word.partition('=')
    if theScore is None:
        raise Exception("encountered an alignment without a score")
    if theEval is None:
        raise Exception("encountered an alignment without an eval")
    return theScore, theEval
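A note on str.partition: it splits at the first separator and always returns a three-item tuple (head, separator, tail), so the value after '=' is the third element. A small sketch showing this on one of the sample words:

```python
word = "expect=1.05e-06"

# partition() always returns exactly three items.
head, sep, tail = word.partition("=")
print((head, sep, tail))  # ('expect', '=', '1.05e-06')

# Unpacking that result into only two names raises ValueError.
try:
    _, value = word.partition("=")
except ValueError as err:
    print("unpacking failed:", err)

# Indexing the third element gives the text after the first '='.
print(word.partition("=")[2])  # 1.05e-06
```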

Note that this will return a tuple with two string items; if you want an int and a float, for example, you need to change the last line to

    return int(theScore), float(theEval)

and then you'll get a ValueError if either string isn't a valid representation of the type it's supposed to hold, or a tuple of two numbers if both strings are valid.
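A minimal sketch of that conversion step on the sample values; to_numbers is a hypothetical helper name, not something from the original answer:

```python
def to_numbers(score_str, expect_str):
    """Convert parsed strings to (int, float); raises ValueError on bad input."""
    return int(score_str), float(expect_str)

print(to_numbers("216", "1.05e-06"))  # (216, 1.05e-06)
print(to_numbers("180", "0.0394"))    # (180, 0.0394)

# A malformed value raises ValueError, which the caller can catch.
try:
    to_numbers("21x6", "0.0394")
except ValueError as err:
    print("could not convert:", err)
```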


Obligatory and possibly inappropriate regexp solution:

import re
def scoreEvalFromMaf(mafLines):
    return [re.search(r'score=(.+) expect=(.+)', line).groups()
            for line in mafLines]
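One caveat with the one-liner: re.search returns None for a line that doesn't match, so calling .groups() on it raises AttributeError, and the greedy (.+) will also swallow any text after the expect value. A slightly more defensive sketch, assuming you want to skip non-matching lines; the function name, the \S+ pattern and the "a line with no score" sample are my own illustrative choices:

```python
import re

# \S+ stops at whitespace, so each group captures a single token.
SCORE_EXPECT = re.compile(r"score=(\S+)\s+expect=(\S+)")

def score_eval_from_maf(maf_lines):
    """Return (score, expect) string pairs, skipping lines that do not match."""
    results = []
    for line in maf_lines:
        match = SCORE_EXPECT.search(line)
        if match is None:
            continue  # re.search returns None when nothing matches
        results.append(match.groups())
    return results

lines = ["a score=216 expect=1.05e-06", "a line with no score", "a score=180 expect=0.0394"]
print(score_eval_from_maf(lines))  # [('216', '1.05e-06'), ('180', '0.0394')]
```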