Better way to do string filtering/manipulation
mystring = '14| "Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2"|minor'
So I have 3 elements (id, message and level) divided by pipe ("|"). I want to get each element so I have written these little functions:
def get_msg(i):
x = i.split("|")
return x[1].strip().replace('"','')
def get_level(i):
x = i.split("|")
return x[2].strip()
#testing
print get_msg(mystring ) # Missing Input PID, PID: 20 : Port 4 of a static component
print g开发者_StackOverflow社区et_level(mystring )# major
Right now it works well but I feel like this is not pythonic way to solve it, how could these 2 functions can be improved? Regular expression feels like fitting here but I'm very naive at it so couldn't apply.
I think the most pythonic way is to use the csv module. From PyMotW with delimiter option:
import csv
import sys
f = open(sys.argv[1], 'rt')
try:
reader = csv.reader(f, delimiter='|')
for row in reader:
print row
finally:
f.close()
lst = msg.split('|')
level = lst[2].strip()
message = lst[1].strip(' "')
you're splitting your string twice which is a bit of a waste, other than that modification is minor.
class MyParser(object):
def __init__(self, value):
self.lst = value.split('|')
def id(self):
return self.lst[0]
def level(self):
return self.lst[2].strip()
def message(self):
return self.lst[1].strip(' "')
I think the best practice would be to actually have a better formatted string, or not use a string for that. Why is it a string? Where are you parsing this from? A database? Xml? Can the origin be altered?
{ 'id': 14, 'message': 'foo', 'type': 'minor' }
A datatype like this I think would be a best practice, if it's stored in a database then split it up in multiple columns.
Edit: I'm probably going to get stoned for this because it's probably overkill/inefficient but if you add lots of sections later on you could store these in a nice hash map:
>>> formatParts = {
... 'id': lambda x: x[0],
... 'message': lambda x: x[1].strip(' "'),
... 'level': lambda x: x[2].strip()
... }
>>> myList = mystring.split('|')
>>> formatParts['id'](myList)
'14'
>>> formatParts['message'](myList)
'Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2'
>>> formatParts['level'](myList)
'minor'
If you don't need the getter functions, this should work nicely:
>>> m_id,msg,lvl = [s.strip(' "') for s in mystring.split('|')]
>>> m_id,msg,lvl
('14', 'Preprocessor Frame Count Not Incrementing; Card: Motherboard, Port: 2',
'minor')
Note: avoid shadowing built-in function 'id'
精彩评论