开发者

Why isn't there a regular expression standard?

I know there is the perl regex that i开发者_如何学Pythons sort of a minor de facto standard, but why hasn't anyone come up with a universal set of standard symbols, syntax and behaviors?


There is a standard by IEEE associated with the POSIX effort. The real question is "why doesn't everyone follow it"? The answer is probably that it is not quite as complex as PCRE (Perl Compatible Regular Expression) with respect to greedy matching and what not.


Actually, there is a regular expression standard (POSIX), but it's crappy. So people extend their RE engine to fit the needs of their application. PCRE (Perl-compatible regular expressions) is a pseudo-standard for regular expressions that are compatible with Perl's RE engine. This is particularly relevant because you can embed Perl's engine into other applications.


Because making standards is hard. It's nearly impossible to get enough people to agree on anything to make it an official standard, let alone something as complex as regex. Defacto standards are much easier to come by.

Case in point: HTML 5 is not expected to become an official standard until the year 2022. But the draft specification is already available, and major features of the standard will begin appearing in browsers long before the standard is official.


I have researched this and could not find anything concrete. My guess is that it's because regex is so often a tool that works ON tools and therefore it's going to necessarily have platform- and tool- specific extensions.

For example, in Visual Studio, you can use regular expressions to find and replace strings in your source code. They've added stuff like :i to match an identifier. On other platforms in other tools, identifiers may not be an applicable concept. In fact, perhaps other platforms and tools reserve the colon character to escape the expression.

Differences like that make this one particularly hard to standardize.


Perl was first (or danm near close to first), and while it's perl and we all love it, it's old some people felt it needed more polish (i.e. features). This is where new types came in.

They're starting to nomalize, the regex used in .NET is very similar to the regex used in other languages, i think slowly people are starting to unify, but some are used to thier perl ways and dont want to change.


Just a guess: there was never a version popular enough to be considered the canonical standard, and there was no standard implementation. Everyone who came and reimplemented it had their own ideas on how to make it "better".


Because too many people are scared of regular expressions, so they haven't become fully widespread enough for enough sensible people to both think of the idea and be in a position to implement it.

Even if a standards body did form and try to unify the different flavours, too many people would argue stubbornly towards their own approach, whether better or not, because lots of programmers are annoying like that.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜