PyParsing: Is this correct use of setParseAction()?

2023-01-01 20:27 Q&A author:

I have strings like this:

"MSE 2110, 3030, 4102"

I would like to output:

[("MSE", 2110), ("MSE", 3030), ("MSE", 4102)]

This is my way of going about it, although I haven't quite gotten it yet:

def makeCourseList(str, location, tokens):
    print("before: %s" % tokens)

    for index, course_number in enumerate(tokens[1:]):
        tokens[index + 1] = (tokens[0][0], course_number)

    print("after: %s" % tokens)

course = Group(DEPT_CODE + COURSE_NUMBER) # .setResultsName("Course")

course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)).setParseAction(makeCourseList)

This outputs:

>>> course.parseString("CS 2110")
([(['CS', 2110], {})], {})
>>> course_data.parseString("CS 2110, 4301, 2123, 1110")
before: [['CS', 2110], 4301, 2123, 1110]
after: [['CS', 2110], ('CS', 4301), ('CS', 2123), ('CS', 1110)]
([(['CS', 2110], {}), ('CS', 4301), ('CS', 2123), ('CS', 1110)], {})

Is this the right way to do it, or am I totally off?

Also, the output of course_data isn't quite correct - I want course_data to emit a list of course symbols that all share the same format. Right now, the first course is different from the others. (It has a {}, whereas the others don't.)

This solution memorizes the department when parsed, and emits a (dept,coursenum) tuple when a number is found.

from pyparsing import Suppress, Word, ZeroOrMore, alphas, nums, delimitedList

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000
'''

def memorize(t):
    # Remember the most recent department code on the function object.
    memorize.dept = t[0]

def token(t):
    # Pair each course number with the remembered department.
    return (memorize.dept, int(t[0]))

course = Suppress(Word(alphas).setParseAction(memorize))
number = Word(nums).setParseAction(token)
line = course + delimitedList(number)
lines = ZeroOrMore(line)

print(lines.parseString(data))

Output:

[('MSE', 2110), ('MSE', 3030), ('MSE', 4102), ('CSE', 1000), ('CSE', 2000), ('CSE', 3000)]

Is this the right way to do it, or am I totally off?

It's one way to do it, though of course there are others (e.g. use two bound methods as parse actions -- so the instance they belong to can keep state -- one for the dept code and another for the course number).

The return value of the parseString call is harder to bend to your will (though I'm sure sufficiently dark magic will do it and I look forward to Paul McGuire explaining how;-), so why not go the bound-method route as in...:

from pyparsing import Group, Regex, Suppress, ZeroOrMore

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")

class MyParse(object):
    def __init__(self):
        self.result = None

    def makeCourseList(self, string, location, tokens):
        print("before: %s" % tokens)

        # tokens[0] is the grouped [dept, number]; the rest are bare numbers.
        dept = tokens[0][0]
        newtokens = [(dept, tokens[0][1])]
        newtokens.extend((dept, tok) for tok in tokens[1:])

        print("after: %s" % newtokens)
        self.result = newtokens

course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")

inst = MyParse()
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)
    ).setParseAction(inst.makeCourseList)
ignore = course_data.parseString("CS 2110, 4301, 2123, 1110")
print(inst.result)

This emits:

before: [['CS', '2110'], '4301', '2123', '1110']
after: [('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]
[('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]

which seems to be what you require, if I read your specs correctly.
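For what it's worth, a parse action may also return a new list of tokens, and pyparsing then uses that list in place of the matched tokens, so the uniform tuples can come straight out of parseString. Below is a minimal sketch of that idea, not a definitive fix. It assumes its own DEPT_CODE and COURSE_NUMBER definitions (the question does not show the originals), an int-converting parse action on the number, and a hypothetical make_course_list helper.

```python
from pyparsing import Group, Regex, Suppress, ZeroOrMore

# Assumed definitions; the question does not show its own.
DEPT_CODE = Regex(r"[A-Z]{2,}")
COURSE_NUMBER = Regex(r"[0-9]{4}").setParseAction(lambda t: int(t[0]))

def make_course_list(tokens):
    # tokens arrives as [['CS', 2110], 4301, 2123, 1110]
    dept, first = tokens[0]
    # Returning a list makes pyparsing substitute it for the matched tokens.
    return [(dept, first)] + [(dept, num) for num in tokens[1:]]

course = Group(DEPT_CODE + COURSE_NUMBER)
course_data = (
    course + ZeroOrMore(Suppress(",") + COURSE_NUMBER)
).setParseAction(make_course_list)

# asList() converts the ParseResults into a plain Python list.
result = course_data.parseString("CS 2110, 4301, 2123, 1110").asList()
print(result)
```

Every entry in the result is then a plain (dept, number) tuple, including the first one, so no state has to be kept outside the parser.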

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

def get_courses(data):
    for row in data.splitlines():
        department, *numbers = row.replace(",", "").split()
        for number in numbers:
            yield department, number

This would give a generator for the course codes. A list can be made with list() if need be, or you can iterate over it directly.
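Note that the generator yields the course numbers as strings, while the question asks for integers. A small variant, sketched here with a hypothetical name get_int_courses, converts each number with int() and collects everything into a list:

```python
data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

def get_int_courses(data):
    # Same splitting approach, but each course number is converted to int.
    for row in data.splitlines():
        department, *numbers = row.replace(",", "").split()
        for number in numbers:
            yield department, int(number)

courses = list(get_int_courses(data))
print(courses)
# [('MSE', 2110), ('MSE', 3030), ('MSE', 4102), ('CSE', 1000), ('CSE', 2000), ('CSE', 3000)]
```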

Sure, everybody loves PyParsing. For easy stuff like this split is sooo much easier to grok:

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

courses = []  # renamed from `all` to avoid shadowing the builtin
for row in data.split('\n'):
    klass, num_l = row.split(' ', 1)
    courses.extend((klass, int(num)) for num in num_l.split(','))
