Pascal-like string literal regular expression
I'm trying to match Pascal string literals against the following pattern: @"^'([^']|(''))*'$", but it isn't working. What is wrong with the pattern?
public void Run()
{             
    using(StreamReader reader = new StreamReader(String.Empty))
    {
        var LineNumber = 0;
        var LineContent = String.Empty;
        while(null != (LineContent = reader.ReadLine()))
        {
            LineNumber++;
            String[] InputWords = new Regex(@"\(\*(?:\w|\d)*\*\)").Replace(LineContent.TrimStart(' '), @" ").Split(' ');
            foreach(String word in InputWords)
            {
                Scanner.Scan(word);
            }
        }
    }
}
I search each input line for Pascal comments, replace each one with a space, and then split the line on spaces into substrings, which I match against the following patterns:
private void Initialize()
{
    MatchingTable = new Dictionary<TokenUnit.TokenType, Regex>();
    MatchingTable[TokenUnit.TokenType.Identifier] = new Regex
    (
        @"^[_a-zA-Z]\w*$",
        RegexOptions.Compiled | RegexOptions.Singleline
    );
    MatchingTable[TokenUnit.TokenType.NumberLiteral] = new Regex
    (
        @"(?:^\d+$)|(?:^\d+\.\d*$)|(?:^\d*\.\d+$)",
         RegexOptions.Compiled | RegexOptions.Singleline
    );
}
// ... Here it all comes together
public TokenUnit Scan(String input)
{                         
    foreach(KeyValuePair<TokenUnit.TokenType, Regex> node in this.MatchingTable)
    {
        if(node.Value.IsMatch(input))
        {
            return new TokenUnit
            {
                Type = node.Key
            };
        }
    }
    return new TokenUnit
    {
        Type = TokenUnit.TokenType.Unsupported
    };
}
The pattern appears to be correct, although it could be simplified:
^'(?:[^']+|'')*'$
Explanation:
^      # Match start of string
'      # Match the opening quote
(?:    # Match either...
 [^']+ # one or more characters except the quote character
 |     # or
 ''    # two quote characters (= escaped quote)
)*     # any number of times
'      # Then match the closing quote
$      # Match end of string
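The commented breakdown above can be used almost verbatim in .NET: with RegexOptions.IgnorePatternWhitespace, unescaped whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. A minimal sketch, where the class and field names (PascalStringDemo, PascalString) are made up for illustration and are not part of the question's code:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PascalStringDemo
{
    // Same pattern as above, written in free-spacing mode so the
    // explanation lives next to each piece of the regex.
    public static readonly Regex PascalString = new Regex(@"
        ^          # start of string
        '          # opening quote
        (?:        # either...
          [^']+    #   one or more non-quote characters
          |        #   or
          ''       #   two quotes (an escaped quote)
        )*         # any number of times
        '          # closing quote
        $          # end of string",
        RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        Console.WriteLine(PascalString.IsMatch("'It''s fine'"));   // True
        Console.WriteLine(PascalString.IsMatch("'It's broken'"));  // False
    }
}
```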
This regex will fail if the input you're checking it against contains anything besides a Pascal string (say, surrounding whitespace).
So if you want to use the regex to find Pascal strings within a larger text corpus, then you need to remove the ^ and $ anchors.
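As a rough illustration of the unanchored form, the sketch below pulls every string literal out of a whole line. This may also matter for the code in the question: Run splits each line on spaces before scanning, so a literal such as 'hello world' would reach the anchored pattern as the two fragments 'hello and world', neither of which matches. That is only a guess at the cause. The names PascalStringFinder and FindLiterals are hypothetical helpers, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PascalStringFinder
{
    // Unanchored version of the pattern: no ^ and $.
    private static readonly Regex Literal = new Regex(@"'(?:[^']+|'')*'");

    // Returns every Pascal string literal found in the given line.
    public static List<string> FindLiterals(string line)
    {
        var result = new List<string>();
        foreach (Match m in Literal.Matches(line))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string s in FindLiterals("writeln('hello world', 'it''s');"))
        {
            Console.WriteLine(s);
        }
        // prints:
        // 'hello world'
        // 'it''s'
    }
}
```

Scanning the whole line this way keeps literals that contain spaces in one piece, at the cost of having to handle the text between matches separately.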
And if you want to allow double quotes, too, then you need to augment the regex:
^(?:'(?:[^']+|'')*'|"(?:[^"]+|"")*")$
In C#:
foundMatch = Regex.IsMatch(subjectString, "^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$");
This regex will match strings like
'This matches.'
'This too, even though it ''contains quotes''.'
"Mixed quotes aren't a problem."
''
It won't match strings like
'The quotes aren't balanced or escaped.'
There is something 'before or after' the quotes.
    "Even whitespace is a problem."
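To check these lists yourself, a small harness along the following lines should do. It is only a sketch: QuoteExamples and IsQuoted are illustrative names, and the samples are the strings listed above (the last one keeps its leading spaces).

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteExamples
{
    // The combined single-quote / double-quote pattern from above.
    private static readonly Regex Quoted =
        new Regex("^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$");

    public static bool IsQuoted(string s) => Quoted.IsMatch(s);

    public static void Main()
    {
        string[] samples =
        {
            "'This matches.'",
            "'This too, even though it ''contains quotes''.'",
            "\"Mixed quotes aren't a problem.\"",
            "''",
            "'The quotes aren't balanced or escaped.'",
            "There is something 'before or after' the quotes.",
            "    \"Even whitespace is a problem.\"",
        };
        // Print the match result next to each sample.
        foreach (string s in samples)
        {
            Console.WriteLine($"{IsQuoted(s),-5} {s}");
        }
    }
}
```

The first four samples should print True and the last three False.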
 