
How to determine the "tipping point", especially when programming regexes?

G'day,

Edit: While this question covers a situation that can arise in programming a lot, I have always noticed that there is a point when working with regexes, especially in Perl and in shell programming, where trying to capture those last few edge cases:

  • requires much, much more time to expand your regex, which can mean
  • excessive complexity in the regex, which leads to
  • future maintenance headaches due to the complex nature of the regex, especially where it's not in Perl, so that there's no nice /x option to allow you to document the regex fragments easily.

I was answering this question "Is there a fairly simple way for a script to tell (from context) whether “her” is a possessive pronoun?" and part of my answer was that you get to a point where chasing the last few percent of edge cases is not worth the extra effort and time to extend your regex, shell script, etc. It becomes easier to just flag the edge cases and go through them manually.

It got me wondering: do people have a simple way of realising that they're hitting this type of tipping point? Or is it something that only comes with experience?

BTW, while this other question is also about "tipping points", it concerns when to decide to start automating file manipulations and not when "enough is enough".


Whenever I feel that my regex or shell script crafting task takes about the same time that I would spend doing things manually, I know that I have reached the "tipping point".

Then if it's a quick and dirty tool for a bigger task, I proceed as you describe: most of the work with regex/script and edge cases flagged and manually handled.
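To make the "flag and handle manually" approach concrete, here is a minimal sketch in Python. The date pattern, the function name split_easy_and_flagged and the sample lines are all hypothetical, chosen only for illustration: the regex covers the common case, and anything it does not match is collected with its line number for manual review instead of growing the regex further.

```python
import re

# Hypothetical pattern: handles only the "easy" case, an ISO-style date.
EASY_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def split_easy_and_flagged(lines):
    """Return (parsed, flagged).

    parsed  holds (year, month, day) tuples for lines the regex handles.
    flagged holds (line_number, text) pairs left for manual review.
    """
    parsed, flagged = [], []
    for number, text in enumerate(lines, start=1):
        match = EASY_DATE.match(text.strip())
        if match:
            parsed.append(tuple(int(part) for part in match.groups()))
        else:
            # Do not extend the regex for rare cases; just record them.
            flagged.append((number, text))
    return parsed, flagged


parsed, flagged = split_easy_and_flagged(["2009-08-14", "14 Aug 09", "2009-12-01"])
for number, text in flagged:
    print(f"line {number}: needs manual attention: {text!r}")
```

The length of the flagged list is also a rough measure of whether you are past the tipping point: if it is short enough to work through by hand, extending the regex is probably not worth it.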

If this is something which may be reused (e.g. in automatic regression tests), I take the time to enhance my tool (splitting tasks or switching to Perl) and/or to make sure that inputs conform to some assumptions.


Most regex engines allow you to document the regex in-line. If they don't, there are often techniques available to make them readable. I'm going to ignore that part of the question and assume the regex can be adequately documented.
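As one illustration of in-line documentation outside Perl, Python's re module has a re.VERBOSE flag that plays a role comparable to Perl's /x: whitespace in the pattern is ignored and # starts a comment. The key=value pattern below is a made-up example, not something taken from the question.

```python
import re

# Hypothetical example: a simple "key = value" line, documented per fragment.
KEY_VALUE = re.compile(
    r"""
    ^\s*                    # optional leading whitespace
    (?P<key>[A-Za-z_]\w*)   # key: an identifier-like name
    \s*=\s*                 # equals sign with optional whitespace around it
    (?P<value>.*?)          # value: as little as possible ...
    \s*$                    # ... so trailing whitespace is not captured
    """,
    re.VERBOSE,
)

match = KEY_VALUE.match("  timeout = 30  ")
if match:
    print(match.group("key"), match.group("value"))
```

For engines with no such flag, a common fallback is to build the pattern from separately named and commented string fragments.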

I think the issue is not so much the complexity of a regex as it is the appropriateness of a regex. A regex can be long and complex, but if it's appropriate for the problem, then a non-regex solution is going to be at least as complex, and certainly much longer.

The issue is when a regex is being abused to solve another type of problem. Heavy use of look-arounds is often indicative of this. If it's easier to follow a sequence of regular code that solves the same problem in a straightforward manner, then that's the right solution, no matter how short the regex would be.
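A small sketch of what this can look like, in Python. The rule itself (at least 8 characters, with a digit, a lowercase letter and an uppercase letter) is a hypothetical example, and the two versions are only meant to agree on single-line ASCII input. The regex stacks three lookaheads to express what is really a conjunction of independent checks; the plain function states each check on its own line.

```python
import re

# Look-around version: three lookaheads stacked in front of a length check.
LOOKAHEAD_RULE = re.compile(r"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}$")


def is_acceptable(text):
    """The same hypothetical rule as ordinary code (ASCII, single line)."""
    has_digit = any("0" <= ch <= "9" for ch in text)
    has_lower = any("a" <= ch <= "z" for ch in text)
    has_upper = any("A" <= ch <= "Z" for ch in text)
    return len(text) >= 8 and has_digit and has_lower and has_upper


samples = ["Passw0rdX", "password", "SHORT1a"]
for sample in samples:
    by_regex = LOOKAHEAD_RULE.match(sample) is not None
    print(sample, by_regex, is_acceptable(sample))
```

The regex is shorter, but adding or changing one requirement in the plain version is a one-line edit that needs no knowledge of how lookaheads interact.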
