
Alternatives for regex in Python

Regular expressions are highly unreadable and difficult to debug. Is there any replacement for them in text processing that mere mortals could handle?

Criteria include:

  • It's a library or a tool (please link to the library itself)

  • Human-readable syntax (no cheat sheet needed)

  • Documentation with examples

  • Able to debug expressions

If possible, please mention both language-specific and language-independent solutions. I mainly develop in Python, but I'd like to see a library that could be ported to other languages/platforms.

I once read that Haskell has nice text-processing capabilities, but again, that is a solution built into the language, not a generic one.

Edit: Please do not give answers like "regular expressions are not bad, do it like this!" Stackoverflow.com is not a place for subjective opinions, but I think regular expressions are bad and I want to see my options for alternatives to them.


I know this post is old, but people might benefit from this question and its answers. VerbalExpressions still uses regex behind the scenes, but in a friendly way.

Intro: http://thechangelog.com/stop-writing-regular-expressions-express-them-with-verbal-expressions/ Python fork: https://github.com/VerbalExpressions
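To give a feel for the idea, here is a hypothetical, minimal builder in plain Python. It is not the actual VerbalExpressions API (the class and method names here are my own, and the real ports may differ): each chained method appends a regex fragment, literal text is passed through re.escape so you never type metacharacters yourself, and the result is an ordinary compiled re pattern.

```python
import re


class FriendlyPattern:
    """Hypothetical fluent builder in the spirit of VerbalExpressions.

    Each method appends one regex fragment and returns self so calls chain.
    Literal text always goes through re.escape, so '.', '?', etc. are safe.
    """

    def __init__(self):
        self._parts = []

    def start_of_line(self):
        self._parts.append("^")
        return self

    def then(self, text):
        # Required literal text.
        self._parts.append(re.escape(text))
        return self

    def maybe(self, text):
        # Optional literal text, wrapped in a non-capturing group.
        self._parts.append("(?:" + re.escape(text) + ")?")
        return self

    def anything_but(self, chars):
        # Zero or more characters that are not in `chars`.
        self._parts.append("[^" + re.escape(chars) + "]*")
        return self

    def end_of_line(self):
        self._parts.append("$")
        return self

    def compile(self):
        return re.compile("".join(self._parts))


url = (
    FriendlyPattern()
    .start_of_line()
    .then("http")
    .maybe("s")
    .then("://")
    .maybe("www.")
    .anything_but(" ")
    .end_of_line()
    .compile()
)

print(url.pattern)  # the generated regex, still inspectable
print(bool(url.match("https://www.example.com")))  # True
print(bool(url.match("ftp://example.com")))        # False
```

The trade-off is the same one VerbalExpressions makes: the readable surface is just a front end, so anything the builder cannot express still has to be written as a raw regex.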


You could use the re.VERBOSE flag, which ignores unescaped whitespace in the pattern and lets you annotate each part with # comments:

import re

charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)
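As a sketch of how such a verbose pattern might be put to work, the version below gives each alternative its own named group (an addition of mine, not part of the original pattern) so a replacement function can tell which form matched and decode it. The function names are hypothetical. The octal branch follows this pattern's own convention; in real HTML, numeric references with leading zeros are still decimal.

```python
import re

# Same structure as the pattern above, with a named group per alternative.
charref = re.compile(r"""
 &[#]                        # Start of a numeric entity reference
 (?:
     (?P<oct>0[0-7]+)        # Octal form
   | (?P<dec>[0-9]+)         # Decimal form
   | x(?P<hex>[0-9a-fA-F]+)  # Hexadecimal form
 )
 ;                           # Trailing semicolon
""", re.VERBOSE)


def decode(match):
    """Return the character encoded by one numeric reference."""
    if match.group("oct") is not None:
        return chr(int(match.group("oct"), 8))
    if match.group("dec") is not None:
        return chr(int(match.group("dec")))
    return chr(int(match.group("hex"), 16))


def unescape_numeric(text):
    """Replace every numeric reference in `text` with its character."""
    return charref.sub(decode, text)


print(unescape_numeric("caf&#233; &#x41; &#0101;"))  # café A A
```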


pyparsing offers another way to create and execute (simple) grammars. I've been using it in a project to parse different kinds of log files, and it was fairly simple to use and somewhat more intuitive than regexps.
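A minimal sketch of what that looks like, assuming pyparsing is installed (pip install pyparsing) and assuming a hypothetical log format of "<date> <time> <LEVEL> <message>". Each building block is named, and named results are read back by key instead of by group number.

```python
# Third-party dependency: pip install pyparsing
from pyparsing import Combine, Regex, Word, alphas, nums

# Combine glues the pieces into one token and forbids spaces inside it.
date = Combine(Word(nums, exact=4) + "-" + Word(nums, exact=2) + "-" + Word(nums, exact=2))
clock = Combine(Word(nums, exact=2) + ":" + Word(nums, exact=2) + ":" + Word(nums, exact=2))
level = Word(alphas)
message = Regex(r".+")  # everything left on the line

# Hypothetical log format; calling an element with a string names its result.
log_line = date("date") + clock("time") + level("level") + message("message")

result = log_line.parseString("2023-01-15 12:30:45 ERROR Disk full")
print(result["date"], result["time"], result["level"], result["message"])
```

For the "able to debug" criterion, pyparsing elements also have a setDebug() method that traces each match attempt as the grammar runs.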


Take a look at Ned Batchelder's list of Python parsing tools.


LPeg is a Lua library, not a Python one, I'm afraid, but someone may have ported it. Either way, it is open source, so you could port it yourself if you wanted to. Its approach to text matching differs somewhat from that of regular expressions, and as such I find it has a considerable learning curve. However, where efficiency is concerned, it has the potential to outperform regular expressions, though obviously such a statement depends strongly on the test case and on one's skill in both.
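LPeg is based on Parsing Expression Grammars (PEGs). One concrete way they differ from regular expressions is ordered choice: once an alternative succeeds, the choice commits to it and does not go back to try the next alternative, even if the rest of the pattern then fails. The sketch below is a hypothetical, minimal PEG combinator in plain Python (the names lit, seq and choice are mine, not LPeg's API) that contrasts this with Python's re.

```python
import re
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail as soon as one fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for p in parsers:
            current = p(text, current)
            if current is None:
                return None
        return current
    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


peg = seq(choice(lit("a"), lit("ab")), lit("c"))

print(peg("ac", 0))   # 2: "a" then "c"
print(peg("abc", 0))  # None: the choice committed to "a", then "c" fails on "b"
print(re.match(r"(?:a|ab)c", "abc").group())  # abc: regex backtracks into "ab"
```

This commit-once behaviour is part of what makes PEG matching predictable, but it is also why porting a regex to a PEG is not always a literal translation.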


If you're concerned about understanding and debugging other people's regexes, there are tools that translate them into a more understandable form. My favorite is RegexBuddy on Windows. On Mac, RegExRx from the App Store is helpful.
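Python's standard library has nothing like those translators, but the re.DEBUG flag is a more modest built-in aid: when a pattern is compiled with it, the parsed structure (branches, repeats, character ranges) is printed. The sketch below captures that output in a string; the example pattern is arbitrary.

```python
import contextlib
import io
import re

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # re.DEBUG makes the compiler print the parse tree of the pattern.
    pattern = re.compile(r"0[0-7]+|[0-9]+", re.DEBUG)

dump = buffer.getvalue()
print(dump)  # shows a BRANCH node with one sub-tree per alternative
```

The exact layout of the dump varies between Python versions, so treat it as a reading aid rather than a stable format.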

