When should I not use regular expressions?

After some research I found that it is not possible to parse recursive structures (such as HTML or XML) using regular expressions. Is it possible to comprehensively list the day-to-day coding scenarios where I should avoid regular expressions because the task is simply impossible to do with them? Let us say the regex engine in question is not PCRE.


Don't use regular expressions when:

  • the language you are trying to parse is not a regular language, or
  • there are readily available parsers specifically made for the data you are trying to parse.

Parsing HTML and XML with regular expressions is usually a bad idea, both because they are not regular languages and because libraries already exist that can parse them for you.

As another example, if you need to check if an integer is in the range 0-255, it's easier to understand if you use your language's library functions to parse it to an integer and then check its numeric value instead of trying to write the regular expression that matches this range.
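To make the comparison concrete, here is a minimal Python sketch of both approaches. The function names are made up for illustration, and the two versions are not perfectly equivalent: the regex as written rejects leading zeros such as "007", while the numeric version accepts them.

```python
import re

# Regex for the decimal integers 0-255 without leading zeros.
# Each alternative covers one slice of the range: 250-255, 200-249, 100-199, 0-99.
OCTET_RE = re.compile(r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])")


def in_range_regex(text):
    """Range check done purely with a pattern (hypothetical helper)."""
    return OCTET_RE.fullmatch(text) is not None


def in_range_numeric(text):
    """Range check done by parsing first, then comparing (hypothetical helper)."""
    # isascii() and isdigit() together reject signs, whitespace,
    # empty strings, and non-ASCII digit characters.
    if not (text.isascii() and text.isdigit()):
        return False
    return 0 <= int(text) <= 255
```

If the range later changes to, say, 0-1023, the numeric version needs one constant changed, while the pattern has to be rewritten from scratch.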


I'll plagiarize myself from my blog post, "When to use and when not to use regular expressions"...

Public websites should not allow users to enter regular expressions for searching. Giving the full power of regex to the general public for a website's search engine could have a devastating effect: in a backtracking engine, a carelessly or maliciously written pattern can take exponential time on certain inputs and tie up the server. This is known as a regular expression denial of service (ReDoS) attack, and it should be avoided at all costs.
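One way to offer search without exposing the regex engine is to treat whatever the user types as literal text. The sketch below assumes a Python backend. `literal_search` is a hypothetical helper, and `re.escape` is the standard-library call that neutralizes metacharacters. It illustrates the idea only and is not a complete defense.

```python
import re


def literal_search(user_query, documents):
    """Return the documents that contain user_query as plain text.

    re.escape() turns every regex metacharacter in the query into a
    literal, so the user cannot smuggle in a pattern of their own.
    """
    pattern = re.compile(re.escape(user_query), re.IGNORECASE)
    return [doc for doc in documents if pattern.search(doc)]


docs = ["aaaaaaaaaaaaaaaaaaaaaaaa!", "The pattern (a+)+$ is risky"]
# The query looks like a backtracking-heavy regex, but it is only
# ever matched as the literal seven characters "(a+)+$".
hits = literal_search("(a+)+$", docs)
```

Here `hits` contains only the second document, because the query is matched character for character and never interpreted as a pattern.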

HTML/XML parsing should not be done with regular expressions. First of all, regular expressions are designed to match regular languages, which are the simplest class in the Chomsky hierarchy. Now, with the advent of balancing group definitions in the .NET flavor of regular expressions, you can venture into slightly more complex territory and do a few things with XML or HTML in controlled situations. However, there's not much point. There are parsers available for both XML and HTML which will do the job more easily, more efficiently, and more reliably. In .NET, XML can be handled the old XmlDocument way or even more easily with LINQ to XML. For HTML, there's the HTML Agility Pack.
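The same point holds outside .NET. As a rough illustration, here is a sketch in Python rather than C#, using the standard library's `html.parser` in place of the HTML Agility Pack. `LinkCollector` and `extract_links` are names invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (illustrative helper class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without a value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    '<div><a href="/one">1</a>'
    "<p><a class=\"x\" href='/two'>2</a></p>"
    '<!-- <a href="/ignored">commented out</a> --></div>'
)
links = extract_links(sample)
```

The parser copes with nesting, either quoting style, and the commented-out link without any special handling, which are exactly the cases a hand-written pattern tends to get wrong.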

Conclusion

Regular expressions have their uses. I still contend that in many cases they can save the programmer a lot of time and effort. Of course, given infinite time & resources, one could almost always build a procedural solution that's more efficient than an equivalent regular expression.

Your decision to abandon regex should be based on 3 things:

1.) Is the regular expression so slow in your scenario that it has become a bottleneck?

2.) Is your procedural solution actually quicker & easier to write than the regular expression?

3.) Is there a specialized parser that will do the job better?


My rule of thumb is, use regular expressions when no other solution exists. If there's already a parser (for example, XML, HTML) or you're just looking for strings rather than patterns, there's no need to use regular expressions.

Always ask yourself "can I solve this without using regular expressions?". The answer to that question will tell you whether you should use regular expressions.
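For the "just looking for strings rather than patterns" case, plain string operations are usually enough. A small Python sketch, with hypothetical function names and made-up sample data:

```python
def lines_containing(text, needle):
    """Return the lines of text that contain needle as a plain substring."""
    return [line for line in text.splitlines() if needle in line]


def is_log_file(filename):
    """Check a fixed suffix without any pattern matching."""
    return filename.endswith(".log")


log = "INFO started\nERROR disk full\nINFO retrying\nERROR gave up"
errors = lines_containing(log, "ERROR")
```

Neither function needs escaping rules or a pattern syntax, so there is less to get wrong and less for the next reader to decode.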
