
Regex: 5 digits in increasing order

I need a regex for 5 digits in increasing order, like 12345, 24579, 34680, and so on.

0 comes after 9.


You can try (as seen on rubular.com)

^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$

Explanation

  • ^ and $ are the beginning and end of string anchors respectively
  • \d{5} is the digit character class \d repeated exactly {5} times
  • (?=...) is a positive lookahead
  • ? on each digit makes each optional

How it works

  • First, a lookahead anchored at the beginning of the string asserts that exactly five digits (\d{5}) follow, up to the end of the string
  • Knowing that we have 5 digits, we then match the digits in the order we want, making each one optional
    • The assertion ensures that we have the correct number of digits
    • Each digit appears only once in the pattern, and only in that order, so any match is strictly increasing
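
As a quick check of the explanation above, here is a minimal sketch in Python. It assumes Python's re module treats this pattern the same way as the engine behind rubular.com, and the helper name is_increasing is made up for illustration.

```python
import re

# The pattern from the answer above, unchanged.
INCREASING = re.compile(r"^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$")


def is_increasing(s: str) -> bool:
    """Return True if s is 5 digits in strictly increasing order (0 after 9)."""
    # fullmatch is used because Python's $ also matches just before a
    # trailing newline; fullmatch insists on consuming the whole string.
    return INCREASING.fullmatch(s) is not None


# Samples from the question, plus a few strings that should be rejected.
for sample in ["12345", "24579", "34680", "54321", "1234", "11234", "01234"]:
    print(sample, is_increasing(sample))
```

The first three samples are accepted; the rest fail because they are out of order, too short, contain a repeated digit, or put 0 before the other digits.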

regular-expressions.info

  • Anchors, Character Classes, Finite Repetition, Lookarounds, and Optional

Generalizing the technique

Let's say that we need to match strings that consist of:

  • between 1 and 3 vowels [aeiou]
  • and the vowels must appear in order

Then the pattern is (as seen on rubular.com):

^(?=[aeiou]{1,3}$)a?e?i?o?u?$

Again, the way it works is that:

  • Anchored at the beginning of the string, we first assert (?=[aeiou]{1,3}$)
    • This checks both that the string uses only the allowed alphabet and that it has the correct length
  • Then we test for each letter, in order, making each optional, until the end of the string
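
The vowel variant can be tried the same way. This is only a sketch, again assuming Python's re module; the name vowels_in_order is hypothetical.

```python
import re

# The generalized pattern from above: 1 to 3 vowels, in alphabetical order.
VOWELS_IN_ORDER = re.compile(r"^(?=[aeiou]{1,3}$)a?e?i?o?u?$")


def vowels_in_order(s: str) -> bool:
    """Return True if s is 1-3 distinct vowels appearing in the order a, e, i, o, u."""
    return VOWELS_IN_ORDER.fullmatch(s) is not None


# "a", "eu" and "aei" are accepted; "ea" is out of order, "aa" repeats a
# vowel, "aeio" is too long, and the empty string is too short.
for sample in ["a", "eu", "aei", "ea", "aa", "aeio", ""]:
    print(repr(sample), vowels_in_order(sample))
```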

Allowing repetition

If each digit can repeat, e.g. 11223 is a match, then:

  • instead of ? (zero-or-one) on each digit,
  • we use * (zero-or-more repetition)

That is, the pattern is (as seen on rubular.com):

^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$
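
A small sketch of the repeating variant, assuming Python's re module; the helper name is_non_decreasing is made up for illustration.

```python
import re

# Same lookahead as before, but each digit may now repeat (* instead of ?).
NON_DECREASING = re.compile(r"^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$")


def is_non_decreasing(s: str) -> bool:
    """Return True if s is 5 digits that never go backwards in the order 1..9,0."""
    return NON_DECREASING.fullmatch(s) is not None


# "11223", "12345" and "99900" are accepted; "11221" and "00111" go
# backwards at some point and are rejected.
for sample in ["11223", "12345", "99900", "11221", "00111"]:
    print(sample, is_non_decreasing(sample))
```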


Wrong tool for the job. Just iterate through the characters one by one and check it. How you would do that depends on which language you're using.

Here is how to check using C:

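#include <stdio.h>
#include <string.h>

/* Rank of a digit in the required order 1..9,0 (0 sorts after 9);
   -1 if the character is not a digit. */
static int rank(char c)
{
    if (c < '0' || c > '9')
        return -1;
    return c == '0' ? 10 : c - '0';
}

int main(void)
{
    const char *str = "12345";
    /* The string must be exactly 5 characters and start with a digit. */
    int i, res = strlen(str) == 5 && rank(str[0]) > 0;

    /* Each character must rank strictly higher than the one before it. */
    for (i = 1; res && i < 5; ++i) {
        res = rank(str[i]) > rank(str[i - 1]);
    }

    printf("%d\n", res);

    return 0;
}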

It is obviously longer than a regex solution, but a regex solution is unlikely to be as fast as that.


polygenelubricants's suggestion is a great one, but it can be improved by using a simpler lookahead constraint, given that the bulk of the RE checks that the characters are digits anyway. For why, see this log of an interactive Tcl session:

% set RE1 "^(?=\\d{5}$)1?2?3?4?5?6?7?8?9?0?$"
^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$
% set RE2 "^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$"
^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$
% time {regexp $RE1 24579} 100000
32.80587355 microseconds per iteration
% time {regexp $RE2 24579} 100000
22.598555649999998 microseconds per iteration

As you can see, it's about 30% faster to use the version of the RE with .{5}$ as a lookahead constraint, at least in the Tcl RE engine. (Note that the above log misses some lines where I was stabilizing the compilations of the regular expressions, though I'd anticipate RE2 to be a little faster to compile anyway.) If you're using a different RE engine (e.g., PCRE or Perl) then you should recheck to get your own performance figures.
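
To recheck this on another engine, something like the following Python sketch could be used. It assumes Python's re and timeit modules, and the helper names same_verdict and microseconds_per_match are made up for illustration. It first confirms that both patterns give the same verdict on every string of up to 5 characters over a small alphabet, then times each one. No timings are quoted here, since they will differ by engine and machine.

```python
import re
import timeit
from itertools import product

RE1 = re.compile(r"^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$")  # digit lookahead
RE2 = re.compile(r"^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$")   # any-character lookahead


def same_verdict(s: str) -> bool:
    """True if both patterns agree on whether s matches."""
    return (RE1.fullmatch(s) is None) == (RE2.fullmatch(s) is None)


# Exhaustive check over the ten digits plus one non-digit, lengths 0 to 5.
ALPHABET = "0123456789a"
all_agree = all(
    same_verdict("".join(chars))
    for length in range(6)
    for chars in product(ALPHABET, repeat=length)
)
print("patterns agree:", all_agree)


def microseconds_per_match(rx, subject="24579", number=100_000):
    """Average time of one match attempt, in microseconds."""
    seconds = timeit.timeit(lambda: rx.fullmatch(subject), number=number)
    return seconds / number * 1e6


print("RE1:", microseconds_per_match(RE1), "microseconds per iteration")
print("RE2:", microseconds_per_match(RE2), "microseconds per iteration")
```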


This is not something that regular expressions are generally good for. The sort of regex you're going to need to achieve this is likely to be bigger and uglier than simple procedural code to do the same thing.

By all means use a regex to ensure you have five digits in your string but then just use normal coding checks to ensure the order is correct.
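
A minimal sketch of that split, in Python, with a regex for the "five digits" part and plain code for the ordering. The names FIVE_DIGITS, ORDER and is_increasing are hypothetical, and the 0-after-9 rule is taken from the question.

```python
import re

FIVE_DIGITS = re.compile(r"[0-9]{5}")

# The required ordering of digits: 0 comes after 9.
ORDER = "1234567890"


def is_increasing(s: str) -> bool:
    """Return True if s is 5 digits in strictly increasing order (0 after 9)."""
    # Step 1: the regex only checks that we have exactly five digits.
    if FIVE_DIGITS.fullmatch(s) is None:
        return False
    # Step 2: ordinary code checks the order, comparing adjacent digits
    # by their position in ORDER.
    ranks = [ORDER.index(ch) for ch in s]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


for sample in ["12345", "24579", "34680", "54321", "11234", "1234"]:
    print(sample, is_increasing(sample))
```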

You don't bang in nails with a screwdriver (well, not if you're smart), so you shouldn't be trying to use regular expressions for every job either :-)
