How to determine the "tipping point", especially when programming regexes?

2022-12-11 02:21

G'day,

Edit: While this question covers a situation that can arise a lot in programming, I have always noticed that there is a point when working with regexes, especially in Perl and in shell programming, where trying to capture those last few edge cases:

requires much, much more time to expand your regex, which can mean
excessive complexity in the regex, which leads to
future maintenance headaches due to the complex nature of the regex, especially where it's not in Perl, so there's no nice /x option to let you document the regex fragments easily.

I was answering this question "Is there a fairly simple way for a script to tell (from context) whether “her” is a possessive pronoun?" and part of my answer was that you get to a point where chasing the last few percent of edge cases is not worth the extra effort and time to extend your regex, shell script, etc. It becomes easier to just flag the edge cases and go through them manually.

It got me wondering: do people have a simple way of realising that they're hitting this type of tipping point? Or is it something that only comes with experience?

BTW, while this other question is also about "tipping points", it concerns when to decide to start automating file manipulations, not when "enough is enough".

Whenever I feel that my regex or shell script crafting task takes about the same time that I would spend doing things manually, I know that I have reached the "tipping point".

Then if it's a quick and dirty tool for a bigger task, I proceed as you describe: most of the work with regex/script and edge cases flagged and manually handled.
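As a rough sketch of that "handle most with the regex, flag the rest" approach, here is a small Python example. The date pattern, the sample lines, and the function name `split_matches` are all made up for illustration; the point is only that unmatched input gets recorded for manual review instead of prompting yet another regex extension.

```python
import re

# Hypothetical pattern: ISO-style dates such as 2022-12-11.
DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def split_matches(lines):
    """Return (parsed, flagged).

    parsed holds (year, month, day) tuples for lines the regex handles;
    flagged holds (line_number, text) pairs left for manual review.
    """
    parsed, flagged = [], []
    for number, text in enumerate(lines, start=1):
        match = DATE_RE.match(text.strip())
        if match:
            parsed.append(tuple(int(part) for part in match.groups()))
        else:
            # Do not grow the regex for rare formats; just record the line.
            flagged.append((number, text))
    return parsed, flagged


sample = ["2022-12-11", "11 Dec 2022", "2021-01-05", "last Tuesday"]
parsed, flagged = split_matches(sample)
```

If the flagged list stays short, going through it by hand is probably cheaper than teaching the pattern about every stray format.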

If this is something that may be reused (e.g. in automatic regression tests), I take the time to enhance my tool (splitting tasks or switching to Perl) and/or to make sure that inputs conform to some assumptions.

Most regex engines allow you to document the regex in-line. If they don't, there are often techniques available to make them readable. I'm going to ignore that part of the question and assume the regex can be adequately documented.
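For instance, Python's `re.VERBOSE` flag plays roughly the same role as Perl's /x: whitespace in the pattern is ignored and `#` starts a comment. The sketch below is only an illustration; the "key = value" line format and the name `CONFIG_LINE` are assumptions, not something from the question.

```python
import re

# Hypothetical example: a simple "key = value" config line.
CONFIG_LINE = re.compile(
    r"""
    ^\s*                    # optional leading whitespace
    (?P<key>[A-Za-z_]\w*)   # key: an identifier-like name
    \s*=\s*                 # equals sign with optional spaces around it
    (?P<value>.*?)          # value: as little as possible, so that...
    \s*$                    # ...trailing whitespace is left out of it
    """,
    re.VERBOSE,
)

match = CONFIG_LINE.match("  timeout = 30  ")
key, value = match.group("key"), match.group("value")
```

Each fragment carries its own comment, so a later reader does not have to reverse-engineer the whole pattern at once.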

I think the issue is not so much the complexity of a regex as it is the appropriateness of a regex. A regex can be long and complex, but if it's appropriate for the problem, then a non-regex solution is going to be at least as complex, and certainly much longer.

The issue is when a regex is being abused to solve another type of problem. Heavy use of look-arounds is often indicative of this. If it's easier to follow a sequence of regular code that solves the same problem in a straightforward manner, then that's the right solution, no matter how short the regex would be.
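To make that concrete, here is a small Python comparison. The password rules (at least 8 characters, with a digit, a lowercase letter, and an uppercase letter) are a made-up example, and both function names are hypothetical. The two versions are meant to agree on ordinary single-line ASCII input; they can differ on non-ASCII characters or embedded newlines.

```python
import re

# Look-ahead version: compact, but every rule is hidden inside the pattern.
PASSWORD_RE = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$")


def is_valid_regex(password):
    """Check the rules with a single look-ahead-heavy regex."""
    return PASSWORD_RE.match(password) is not None


def is_valid_plain(password):
    """Check the same rules as ordinary code, one condition per line."""
    return (
        len(password) >= 8
        and any(ch.isdigit() for ch in password)
        and any(ch.islower() for ch in password)
        and any(ch.isupper() for ch in password)
    )


samples = ["Passw0rd", "password", "SHORT1a", "NoDigitsHere", "abcDEF123"]
results = [(is_valid_regex(s), is_valid_plain(s)) for s in samples]
```

The plain version is longer, but adding or removing a rule is a one-line change that needs no knowledge of look-ahead semantics.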
