
What is regularity?

This is more of a computer science question than a programming one, but I figure that this is the best place out of all the related sites to ask this.

When I discovered regular expressions and looked up the term, I assumed that the property of "regularity" refers to the fact that the expression's language has a definable structural pattern. However, in reading about the subject and the theory behind it, I learned that some languages are not regular, even though the way they are defined makes it clear that a pattern matches them. One such language is (a^n)(b^n). This is clearly a pattern, and yet it is not a regular language. So now I'm left wondering: what is it about regular languages that makes them regular, and this language not?


Intuitively explaining computer science is... tricky. I'll give it a shot, but keep in mind that some of this is going to be "close enough" but not theoretically rigorous.

A regular language is one that can be decided by a machine that is computationally equivalent to a finite automaton (DFA/NDFA). A finite automaton can be thought of as a machine that operates purely on states, with no additional storage. So you can see that a^n b^n cannot be regular: recognizing it requires a machine that can count the number of a's and b's (and thus must have infinite* storage capacity) in order to compare them.
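To make "counting" concrete, here is a minimal Python sketch of a recognizer for a^n b^n (n >= 0). The function name `is_anbn` is hypothetical. The point is the integer `count`: it has to grow with n, so no fixed, finite set of states can take its place.

```python
def is_anbn(s: str) -> bool:
    """Recognize a^n b^n (n >= 0) using one unbounded integer counter.

    A DFA has only finitely many states, so it cannot hold an arbitrarily
    large count; this recognizer relies on `count` growing with n.
    """
    count = 0        # number of a's not yet matched by b's
    seen_b = False   # once a 'b' has appeared, no further 'a' is allowed
    for ch in s:
        if ch == "a":
            if seen_b:
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:   # more b's than a's so far
                return False
        else:
            return False    # only 'a' and 'b' belong to the alphabet
    return count == 0
```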

For comparison, (abc)^n is regular, because the machine never needs to remember how many repetitions it has seen; it only needs to know where it is within the current "abc".
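As a contrast with the counter above, a minimal Python sketch of a DFA for (abc)^n (n >= 0). The table and function names are hypothetical, and a missing transition is treated as an implicit "dead" (rejecting) state. However many repetitions the input contains, the machine only ever uses these three live states.

```python
# States: 0 = start (also accepting: a whole number of "abc" blocks read),
#         1 = just read "a", 2 = just read "ab".
# Any (state, char) pair not listed goes to an implicit dead state.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 0,
}
ACCEPTING = {0}


def is_abc_repeated(s: str) -> bool:
    """Return True if s is "abc" repeated zero or more times."""
    state = 0
    for ch in s:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False          # fell into the dead state
        state = TRANSITIONS[key]  # no memory besides the current state
    return state in ACCEPTING
```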

For a more rigorous (and correspondingly denser) view, check the Wikipedia article and the pages it links to.

*Whether the storage is literally infinite doesn't matter here, but I mention it for completeness. It might be easier to think of it as "luckily, always just enough" storage.


The name comes from Kleene's work in the 1950s, which described "regular sets" using a mathematical notation he created for the purpose. See this.


Perhaps the Wikipedia article on regular languages can explain it better than we can. However, I'll give it a shot.

From a theoretical standpoint, a regular language (a set of strings) is one that can be recognized by a finite state automaton. In programmer terms, this is equivalent to saying it can be described by a regular expression. Every finite language (finite set of strings) is regular, but some infinite languages, such as a^n b^n (the language of all strings of n a's followed by n b's), cannot be recognized by an FSA or a regular expression. More powerful computational devices (such as modern computers, which are modeled using Turing machines) can recognize those languages.
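A minimal Python sketch of the "every finite language is regular" point, using the standard `re` module: any finite set of strings can be written as an alternation of literals. The function name `finite_language_regex` is hypothetical. Python's `re` is a backtracking engine rather than a finite automaton, but the pattern built here describes a regular language either way.

```python
import re


def finite_language_regex(words: list[str]) -> re.Pattern:
    """Compile a regex whose full matches are exactly the strings in `words`.

    An alternation of literal strings is always a regular expression, which
    is one way to see that every finite language is regular.
    """
    if not words:
        # Empty language: a lookahead that forbids 'x' followed by 'x'
        # can never succeed, so this pattern matches nothing at all.
        return re.compile(r"(?!x)x")
    # re.escape makes metacharacters such as '.' match literally.
    return re.compile("|".join(re.escape(w) for w in words))
```

Use `fullmatch` rather than `search` or `match` when testing membership; otherwise a word that is merely a prefix or substring of the input would also count as a match.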

Regular expressions are used so much in programming for string searching because they can describe the large majority of string patterns that matter to programmers, and at the same time they can be implemented to search very quickly using finite state automata.
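To illustrate the "search quickly using finite state automata" point, here is a minimal Python sketch that compiles a single literal pattern into a DFA (the Knuth-Morris-Pratt automaton construction) and then scans the text with exactly one table lookup per character. The function names and the explicit `alphabet` parameter are illustrative assumptions; real regex engines handle full regular expressions and are considerably more involved.

```python
def build_search_dfa(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """Build a DFA that detects `pattern` anywhere in a string.

    State i means: the longest suffix of the input read so far that is also
    a prefix of `pattern` has length i. Reaching len(pattern) means a match.
    Assumes a non-empty pattern whose characters all appear in `alphabet`.
    """
    m = len(pattern)
    dfa: list[dict[str, int]] = [{} for _ in range(m)]
    for c in alphabet:
        dfa[0][c] = 1 if c == pattern[0] else 0
    restart = 0  # state the DFA would reach after reading pattern[1:i]
    for i in range(1, m):
        for c in alphabet:
            dfa[i][c] = dfa[restart][c]  # mismatch: behave like the restart state
        dfa[i][pattern[i]] = i + 1       # match: advance one state
        restart = dfa[restart][pattern[i]]
    return dfa


def dfa_search(text: str, pattern: str, alphabet: str) -> int:
    """Return the index of the first occurrence of `pattern`, or -1."""
    dfa = build_search_dfa(pattern, alphabet)
    state = 0
    for i, c in enumerate(text):
        # One lookup per character. Characters outside the alphabet cannot
        # be part of the pattern, so they send the DFA back to state 0.
        state = dfa[state].get(c, 0)
        if state == len(pattern):
            return i - len(pattern) + 1
    return -1
```

For example, `dfa_search("aababcab", "abc", "abc")` returns 3, the same result as `"aababcab".find("abc")`. Building the table costs time proportional to the pattern length times the alphabet size, but the scan itself never backs up in the text.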


The word "regular" in "regular expression" refers to the mathematical concept of regularity, not the everyday English sense, just as the word "prime" in mathematics bears little relation to prime beef.

Computer science (which is a branch of mathematics) inherited the term and uses it for a more specific concept: http://en.wikipedia.org/wiki/Regular_language


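Regular expressions are not really regular; the name is etymological. Many modern regex engines support features such as backreferences, which let them match languages that are not regular.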
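A minimal Python sketch of one such feature, a backreference in the standard `re` module. The helper name `matches_anban` is hypothetical. The pattern `(a+)b\1` requires the text after the `b` to repeat exactly what the group captured, so with `fullmatch` it accepts precisely a^n b a^n for n >= 1, a language that no finite automaton (and hence no regular expression in the formal sense) can recognize.

```python
import re

# (a+)  capture one or more a's
# b     then a literal 'b'
# \1    then the SAME text the group captured
PATTERN = re.compile(r"(a+)b\1")


def matches_anban(s: str) -> bool:
    """True if s is a^n b a^n for some n >= 1."""
    return PATTERN.fullmatch(s) is not None
```

With this, `matches_anban("aabaa")` is True while `matches_anban("aaba")` is False: the engine has to remember the captured text, which is exactly the kind of unbounded memory a finite automaton lacks.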
