Better way to do string filtering/manipulation

mystring = '14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor'

So I have 3 elements (id, message and level) divided by pipe ("|"). I want to get each element so I have written these little functions:

    def get_msg(i):
        x = i.split("|")
        return x[1].strip().replace('"','')

    def get_level(i):
        x = i.split("|")
        return x[2].strip()
    # testing
    print get_msg(mystring)    # Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2
    print get_level(mystring)  # minor

Right now it works well, but I feel this is not a pythonic way to solve it. How could these 2 functions be improved? A regular expression feels like a fit here, but I'm too inexperienced with them to apply one.


I think the most pythonic way is to use the csv module. This example from PyMOTW uses the delimiter option:

import csv
import sys

f = open(sys.argv[1], 'rt')
try:
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        print row
finally:
    f.close()
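
For the single string in the question, the same reader can be fed a one-element list instead of a file. This is a minimal sketch for Python 3. It assumes `skipinitialspace=True` is acceptable, so that the space before the opening quote is dropped and the double quotes are treated as field quoting:

```python
import csv

mystring = '14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor'

# csv.reader accepts any iterable of lines, so wrap the string in a list.
# skipinitialspace=True ignores the space right after each delimiter, which
# lets the reader recognise the quoted message field and drop its quotes.
reader = csv.reader([mystring], delimiter='|', skipinitialspace=True)
m_id, message, level = next(reader)

print(m_id)
print(message)
print(level)
```

Without `skipinitialspace`, the message field would keep its leading space and its quotes, so it would still need the `strip` calls shown in the other answers.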


lst = msg.split('|')
level = lst[2].strip()
message = lst[1].strip(' "')

Your two functions each split the string, so it gets split twice, which is a bit of a waste. Other than that, the modification is minor.
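
Wrapped up as a function, the single split might look like the sketch below (Python 3). The name `parse_record` is hypothetical and only used for illustration:

```python
def parse_record(record):
    """Split the record once and return (id, message, level).

    parse_record is a hypothetical helper name, not part of the question.
    """
    parts = record.split('|')
    # strip(' "') removes surrounding spaces and double quotes in one call.
    return parts[0].strip(), parts[1].strip(' "'), parts[2].strip()


mystring = '14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor'
m_id, message, level = parse_record(mystring)
print(message)
print(level)
```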


class MyParser(object):
    def __init__(self, value):
        self.lst = value.split('|')
    def id(self):
        return self.lst[0]
    def level(self):
        return self.lst[2].strip()
    def message(self):
        return self.lst[1].strip(' "')
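
A variation on the same idea, sketched for Python 3 with read-only properties instead of methods. The names `RecordParser` and `record_id` are assumptions chosen for this example; `record_id` is used so that the built-in `id` is not reused as a name:

```python
class RecordParser(object):
    """Split an 'id|message|level' string once and expose the parts."""

    def __init__(self, value):
        self._parts = value.split('|')

    @property
    def record_id(self):
        return self._parts[0].strip()

    @property
    def message(self):
        # Remove surrounding spaces and double quotes.
        return self._parts[1].strip(' "')

    @property
    def level(self):
        return self._parts[2].strip()


parsed = RecordParser('14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor')
print(parsed.record_id)
print(parsed.message)
print(parsed.level)
```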


I think the best practice would be to have a better-formatted string in the first place, or not to use a string for this at all. Why is it a string? Where are you parsing it from? A database? XML? Can the origin be altered?

{ 'id': 14, 'message': 'foo', 'type': 'minor' }

I think a datatype like this would be best practice. If the data is stored in a database, split it up into multiple columns.
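
If the origin cannot be altered, the string can still be converted into such a dict once, right where it enters the program. This is a rough Python 3 sketch; the function name `to_record` is hypothetical, and it assumes the id field always holds an integer:

```python
def to_record(raw):
    """Convert 'id|message|level' into a dict (hypothetical helper)."""
    # Split once, then strip spaces and double quotes from every field.
    m_id, message, level = (part.strip(' "') for part in raw.split('|'))
    # Assumption: the id is always numeric, so int() will not raise.
    return {'id': int(m_id), 'message': message, 'type': level}


mystring = '14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor'
record = to_record(mystring)
print(record['id'])
print(record['type'])
```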

Edit: I'm probably going to get stoned for this because it's probably overkill/inefficient but if you add lots of sections later on you could store these in a nice hash map:

>>> formatParts = {
...     'id': lambda x: x[0],
...     'message': lambda x: x[1].strip(' "'),
...     'level': lambda x: x[2].strip()
... }
>>> myList = mystring.split('|')
>>> formatParts['id'](myList)
'14'
>>> formatParts['message'](myList)
'Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2'
>>> formatParts['level'](myList)
'minor'


If you don't need the getter functions, this should work nicely:

>>> m_id,msg,lvl = [s.strip(' "') for s in mystring.split('|')]
>>> m_id,msg,lvl
('14', 'Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2',
'minor')

Note: avoid shadowing built-in function 'id'
