How to match a text format to a string without regex in Python?

2023-02-25 02:44 Q&A author:

I am reading a file with lines of the form exemplified by

[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34

I saw Matlab code to read this file given by

[I,L,Ls,R,Rs,p,e,n] = textread(f1,'[ %u ] L= %u%s R= %u%s p= %n e=%u n=%u')

I want to read this file in Python. The only thing I know of is regex, and reading even a part of this line leads to something like

re.compile(r'\s*\[\s*(?P<id>\d+)\s*\]\s*L\s*=\s*(?P<Lint>\d+)\s*\((?P<Ltype>[DG])\)\s*R\s*=\s*(?P<Rint>\d+)\s*')

which is ugly! Is there an easier way to do this in Python?

You can make the regexp more readable by building it with escape/replace...

import re

# X marks a numeric field, U a parenthesised unit such as (D)
number = "([-+0-9.DdEe ]+)"
unit = r"\(([^)]+)\)"
t = "[X] L=XU R=XU p=X e=X n=X"
m = re.compile(re.escape(t).replace("X", number).replace("U", unit))
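As a rough sketch of how that compiled pattern might then be used on the sample line: the conversion step and the variable names below are my own illustration, not part of the answer above. Note that the number class also admits spaces, so each captured group is stripped before conversion.

```python
import re

number = "([-+0-9.DdEe ]+)"
unit = r"\(([^)]+)\)"
t = "[X] L=XU R=XU p=X e=X n=X"
m = re.compile(re.escape(t).replace("X", number).replace("U", unit))

line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
match = m.match(line)

# Every group comes back as a string, possibly padded with spaces.
I, L, Ls, R, Rs, p, e, n = [g.strip() for g in match.groups()]

# Convert the numeric fields; Ls and Rs stay as strings ("D" or "G").
I, L, R, e, n = (int(v) for v in (I, L, R, e, n))
p = float(p)

print(I, L, Ls, R, Rs, p, e, n)
```

If a line does not fit the template, m.match returns None, so real code would want to check for that before calling groups().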

This looks more or less pythonic to me:

line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"

parts = (None, int, None,
         None, int, str,
         None, int, str,
         None, float,
         None, int,
         None, int)

[I,L,Ls,R,Rs,p,e,n] = [f(x) for f, x in zip(parts, line.split()) if f is not None]

print([I,L,Ls,R,Rs,p,e,n])
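The same idea could be wrapped in a small function and applied to every line of the file. This is only a sketch: PARTS and parse_line are hypothetical names, and it assumes every token is separated by whitespace exactly as in the sample line (a line written as "L=9" would not split the same way).

```python
# Converters aligned with the whitespace-separated tokens; None skips a token.
PARTS = (None, int, None,
         None, int, str,
         None, int, str,
         None, float,
         None, int,
         None, int)

def parse_line(line):
    """Hypothetical helper: return [I, L, Ls, R, Rs, p, e, n] for one line."""
    tokens = line.split()
    if len(tokens) != len(PARTS):
        # Fail loudly rather than silently misalign fields.
        raise ValueError("unexpected number of fields: %r" % line)
    return [f(x) for f, x in zip(PARTS, tokens) if f is not None]

sample = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
record = parse_line(sample)
print(record)
```

The type markers keep their parentheses here ('(D)'), since str() is applied to the token as-is; strip them afterwards if you only want the letter.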

Pyparsing is an alternative to unreadable and fragile regexes. The parser example below handles your stated format, plus any amount of extra whitespace, and accepts the assignment expressions that follow the leading L= in arbitrary order. Just as you have used named groups in your regex, pyparsing supports results names, so that you can access the parsed data using dict or attribute syntax (data['Lint'] or data.Lint).

from pyparsing import Suppress, Word, nums, oneOf, Regex, ZeroOrMore, Optional

# define basic punctuation
EQ,LPAR,RPAR,LBRACK,RBRACK = map(Suppress,"=()[]")

# numeric values
integer = Word(nums).setParseAction(lambda t : int(t[0]))
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t : float(t[0]))

# id and assignment fields
idRef = LBRACK + integer("id") + RBRACK
typesep = LPAR + oneOf("D G") + RPAR
lExpr = 'L' + EQ + integer("Lint")
rExpr = 'R' + EQ + integer("Rint")
pExpr = 'p' + EQ + real("pFloat")
eExpr = 'e' + EQ + integer("Eint")
nExpr = 'n' + EQ + integer("Nint")

# accept assignments in any order, with or without leading (D) or (G)
assignment = lExpr | rExpr | pExpr | eExpr | nExpr
line = idRef + lExpr + ZeroOrMore(Optional(typesep) + assignment)


# test the parser
text = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
data = line.parseString(text)
print(data.dump())


# prints
# [0, 'L', 9, 'D', 'R', 14, 'D', 'p', 0.0347222, 'e', 10, 'n', 34]
# - Eint: 10
# - Lint: 9
# - Nint: 34
# - Rint: 14
# - id: 0
# - pFloat: 0.0347222

Also, the parse actions do the string->int or string->float conversion at parse time, so that afterward the values are already in a usable form. (The thinking in pyparsing is that, while parsing these expressions, you know that a word composed of numeric digits - or Word(nums) - will safely convert to an int, so why not do the conversion right then, instead of just getting back matching strings and having to re-process the sequence of strings, trying to detect which ones are integers, floats, etc.?)

Python does not have a scanf() equivalent, as stated in the documentation for Python's re module:

Python does not currently have an equivalent to scanf(). Regular expressions are generally more powerful, though also more verbose, than scanf() format strings. The table below offers some more-or-less equivalent mappings between scanf() format tokens and regular expressions.

However, you could probably build your own scanf-like module using the mappings on that page.
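A minimal sketch of what such a module might look like, assuming only four tokens (%d, %u, %f, %s) are needed. The names TOKENS, compile_format and sscanf are hypothetical, and the regex fragments follow the mapping table in the re documentation. Whitespace in the format is treated as "any amount of whitespace, including none".

```python
import re

# scanf token -> (regex fragment, converter), after the table in the re docs
TOKENS = {
    "%d": (r"[-+]?\d+", int),
    "%u": (r"\d+", int),
    "%f": (r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?", float),
    "%s": (r"\S+", str),
}

def compile_format(fmt):
    """Hypothetical helper: turn a scanf-like format into (regex, converters)."""
    pieces, converters = [], []
    # Split the format into tokens, whitespace runs and literal chunks.
    for part in re.findall(r"%[dufs]|\s+|[^%\s]+", fmt):
        if part in TOKENS:
            pattern, conv = TOKENS[part]
            pieces.append("(" + pattern + ")")
            converters.append(conv)
        elif not part.isspace():
            pieces.append(re.escape(part))
        # whitespace in the format is dropped; r"\s*" is inserted between pieces
    return re.compile(r"\s*" + r"\s*".join(pieces)), converters

def sscanf(text, fmt):
    """Return the converted fields, or None if text does not fit fmt."""
    regex, converters = compile_format(fmt)
    match = regex.match(text)
    if match is None:
        return None
    return [conv(g) for conv, g in zip(converters, match.groups())]

fmt = "[ %u ] L= %u%s R= %u%s p= %f e=%u n=%u"
text = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
print(sscanf(text, fmt))
```

The format string is close to the Matlab one from the question, with %f standing in for Matlab's %n. Unsupported tokens and literal percent signs are not handled in this sketch.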
