Combining these two Regular Expressions into one

I have the following in C#:

public static bool IsAlphaAndNumeric(string s)
{
    return Regex.IsMatch(s, @"[a-zA-Z]+") 
        && Regex.IsMatch(s, @"\d+");
}

I want to check if parameter s contains at least one alphabetical character and one digit and I wrote the above method to do so.

But is there a way I can combine the two regular expressions ("[a-zA-Z]+" and "\d+") into one?


For C# with LINQ:

return s.Any(Char.IsDigit) && s.Any(Char.IsLetter);


@"^(?=.*[a-zA-Z])(?=.*\d)"

 ^  # From the beginning of the string
 (?=.*[a-zA-Z]) # look forward for any number of chars followed by a letter, don't advance pointer
 (?=.*\d) # look forward for any number of chars followed by a digit

This uses two positive lookaheads to ensure it finds one letter and one digit before succeeding. The ^ makes the engine attempt the lookaheads only once, from the start of the string. Otherwise, the regex engine would try to match at every position in the string.
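A minimal sketch of how the lookahead pattern above might be called from C#. The names LookaheadCheck and HasLetterAndDigit are made up for illustration; only the Regex class comes from the framework. Since . does not match a newline by default, the sketch assumes single-line input.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadCheck
{
    // Both lookaheads are evaluated at the start of the string and consume no input.
    private static readonly Regex LetterAndDigit =
        new Regex(@"^(?=.*[a-zA-Z])(?=.*\d)");

    // Hypothetical helper: true when s holds at least one ASCII letter and one digit.
    public static bool HasLetterAndDigit(string s)
    {
        return LetterAndDigit.IsMatch(s);
    }
}
```

The order of the characters does not matter here, because each lookahead scans independently from position 0.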


You could use [a-zA-Z].*[0-9]|[0-9].*[a-zA-Z], but I'd only recommend it if the system you were using only accepted a single regex. I can't imagine this would be more efficient than two simple patterns without alternation.
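As a rough illustration of the single-pattern alternation, here is a small C# wrapper. The class and method names are hypothetical and chosen only for this sketch.

```csharp
using System.Text.RegularExpressions;

public static class AlternationCheck
{
    // First branch: a letter somewhere before a digit.
    // Second branch: a digit somewhere before a letter.
    private static readonly Regex LetterAndDigit =
        new Regex(@"[a-zA-Z].*[0-9]|[0-9].*[a-zA-Z]");

    // Hypothetical helper: true when s holds at least one ASCII letter and one ASCII digit.
    public static bool HasLetterAndDigit(string s)
    {
        return LetterAndDigit.IsMatch(s);
    }
}
```

Both orders have to be spelled out because a plain regex without lookarounds matches left to right.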


It's not exactly what you asked for, but let's say I have some more time. The following should run faster than a regex.

    static bool IsAlphaAndNumeric(string str) {
        bool hasDigits = false;
        bool hasLetters = false;

        foreach (char c in str) {
            bool isDigit = char.IsDigit(c);
            bool isLetter = char.IsLetter(c);
            if (!(isDigit | isLetter))
                return false;
            hasDigits |= isDigit;
            hasLetters |= isLetter;
        }
        return hasDigits && hasLetters;
    }

Let's check why it is fast. The following is the test string generator. Roughly 1/3 of the set it generates is completely correct strings and 2/3 is incorrect ones. Of that 2/3, half is all letters and the other half is all digits.

    static IEnumerable<string> GenerateTest(int minChars, int maxChars, int setSize) {
        string letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        string numbers = "0123456789";            
        Random rnd = new Random();
        int maxStrLength = maxChars-minChars;
        float probablityOfLetter = 0.0f;
        float probablityInc = 1.0f / setSize;
        for (int i = 0; i < setSize; i++) {
            probablityOfLetter = probablityOfLetter + probablityInc;
            int length = minChars + rnd.Next() % maxStrLength;
            char[] str = new char[length];
            for (int w = 0; w < length; w++) {
                if (probablityOfLetter < rnd.NextDouble())
                    str[w] = letters[rnd.Next() % letters.Length];
                else 
                    str[w] = numbers[rnd.Next() % numbers.Length];                    
            }
            yield return new string(str);
        }
    }

The following are Darin's two solutions. One uses a compiled regex and the other a non-compiled one.

// Requires: using System.Text.RegularExpressions;
class DarinDimitrovSolution
{
    const string regExpression = @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).+$";
    private static readonly Regex _regex = new Regex(
        regExpression, RegexOptions.Compiled);

    // Compiled version: reuses the cached Regex instance.
    public static bool IsAlphaAndNumeric_1(string s) {
        return _regex.IsMatch(s);
    }
    // Non-compiled version: uses the static Regex.IsMatch.
    public static bool IsAlphaAndNumeric_0(string s) {
        return Regex.IsMatch(s, regExpression);
    }
}

The following is the Main method with the test loop:

    static void Main(string[] args) {

        int minChars = 3;
        int maxChars = 13;
        int testSetSize = 5000;
        DateTime start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            IsAlphaNumeric(testStr);
        }
        Console.WriteLine("My solution : {0}", (DateTime.Now - start).ToString());

        start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            DarinDimitrovSolution.IsAlphaAndNumeric_0(testStr);
        }
        Console.WriteLine("DarinDimitrov  1 : {0}", (DateTime.Now - start).ToString());

        start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            DarinDimitrovSolution.IsAlphaAndNumeric_1(testStr);
        }
        Console.WriteLine("DarinDimitrov(compiled) 2 : {0}", (DateTime.Now - start).ToString());

        Console.ReadKey();
    }

The results are as follows:

My solution : 00:00:00.0170017    (Gold)
DarinDimitrov  1 : 00:00:00.0320032  (Silver medal) 
DarinDimitrov(compiled) 2 : 00:00:00.0440044   (Bronze)

So the first solution was the fastest. Here are some more results, run in release mode with the following settings:

   int minChars = 20;
   int maxChars = 50;
   int testSetSize = 100000;

My solution : 00:00:00.4060406
DarinDimitrov  1 : 00:00:00.7400740
DarinDimitrov(compiled) 2 : 00:00:00.3410341 (now this one is very fast)

I checked again with the RegexOptions.IgnoreCase flag. The rest of the parameters are the same as above.

My solution : 00:00:00.4290429 (almost the same as before)
DarinDimitrov  1 : 00:00:00.9700970 (it has slowed down)
DarinDimitrov(compiled) 2 : 00:00:00.8440844 (this one slowed down as well; compare with the .34 in the previous result)

After gnarf mentioned that there was a problem with my algorithm (it was checking whether the string consists only of letters and digits), I changed it so that it now checks that the string has at least one letter and one digit.

    static bool IsAlphaNumeric(string str) {
        bool hasDigits = false;
        bool hasLetters = false;

        foreach (char c in str) {
            hasDigits |= char.IsDigit(c);
            hasLetters |= char.IsLetter(c);
            if (hasDigits && hasLetters)
                return true;
        }
        return false;
    }

Results

My solution : 00:00:00.3900390 (Goody Gold Medal)
DarinDimitrov  1 : 00:00:00.9740974 (Bronze Medal)
DarinDimitrov(compiled) 2 : 00:00:00.8230823 (Silver)

Mine is faster by a factor of more than two.


private static readonly Regex _regex = new Regex(
    @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).+$", RegexOptions.Compiled);

public static bool IsAlphaAndNumeric(string s)
{
    return _regex.IsMatch(s);
}

If you want to ignore case you could use RegexOptions.Compiled | RegexOptions.IgnoreCase.


The following is not only faster than the other lookahead constructs, it is also (in my eyes) closer to the requirements:

[a-zA-Z\d]((?<=\d)[^a-zA-Z]*[a-zA-Z]|(?<!\d)[^\d]*\d)

On my (admittedly crude) test it runs in about half the time required by the other regex solutions, and has the advantage that it does not care about newlines in the input string. (And if for some reason it should, it is obvious how to include that.)

Here is how (and why) it works:

Step 1: It matches a single character (let us call it c) that is a number or a letter.
Step 2: It does a lookbehind to check if c is a number.
Step 2.1: If c is a number, it allows an unlimited number of characters that are not a letter, followed by a single letter. If this matches, we have a number (c) followed by a letter.
Step 2.2: If c is not a number (checked with a negative lookbehind, so that a number followed by another number cannot slip through this branch), it must be a letter (otherwise it would not have been matched). In this case we allow an unlimited number of non-digits, followed by a single digit. This would mean we have a letter (c) followed by a number.
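A minimal sketch of exercising a pattern of this shape from C#. The class and method names are hypothetical. The second branch in this sketch carries a (?<!\d) guard, so that a digits-only input such as "12" cannot match through it.

```csharp
using System.Text.RegularExpressions;

public static class LookbehindCheck
{
    // c is a letter or digit; the lookbehinds pick the branch based on what c was.
    private static readonly Regex LetterAndDigit = new Regex(
        @"[a-zA-Z\d]((?<=\d)[^a-zA-Z]*[a-zA-Z]|(?<!\d)[^\d]*\d)");

    // Hypothetical helper: true when s holds at least one ASCII letter and one digit.
    public static bool HasLetterAndDigit(string s)
    {
        return LetterAndDigit.IsMatch(s);
    }
}
```

Because the negated classes [^a-zA-Z] and [^\d] also match newlines, a letter and a digit on different lines still count as a match in this sketch.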
