Regular expression library for .Net that supports lazy evaluation

I'm looking for a regular expression library in .Net that supports lazy evaluation.

Note: I'm specifically looking for lazy evaluation, meaning that instead of immediately returning all matches in a document, the library consumes only as much of the document as it needs to find the next match on each request. I am NOT asking about lazy quantifiers, though I wouldn't object if the library supports those too!

Specific details: I want to be able to run regexes against very large documents with potentially hundreds of thousands of regex matches, and iterate across the results using IEnumerable<> semantics, without having to take the up-front cost of finding all matches.

Ideally FOSS in C#, but the only requirement is usability from a .Net 3.5 app.


The Match class' NextMatch method should meet your needs:

Returns a new Match with the results for the next match, starting at the position at which the last match ended (at the character after the last matched character).

A quick look at it in Reflector confirms this behavior:

public Match NextMatch()
{
    if (this._regex == null)
    {
        return this;
    }
    return this._regex.Run(false, base._length, base._text, this._textbeg,
        this._textend - this._textbeg, this._textpos);
}

See the MSDN reference for Match.NextMatch for an example of its usage. Briefly, the flow would resemble:

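// rx is a Regex instance and input is the string to search.
Match m = rx.Match(input);   // finds only the first match
while (m.Success)
{
    // do work
    m = m.NextMatch();       // resumes the search where the last match ended
}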
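
Since the question asks for IEnumerable<> semantics, the loop above can be wrapped in a C# iterator. This is only a minimal sketch: the names LazyRegex and LazyMatches are hypothetical helpers, not part of the framework. It relies on nothing beyond Regex.Match and Match.NextMatch, so it should be usable from a .NET 3.5 app.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class (an illustration, not a framework type).
public static class LazyRegex
{
    // Yields matches one at a time. Because this is an iterator method,
    // no searching happens until the caller starts enumerating, and each
    // step of the enumeration asks the regex engine for one more match.
    public static IEnumerable<Match> LazyMatches(Regex rx, string input)
    {
        for (Match m = rx.Match(input); m.Success; m = m.NextMatch())
        {
            yield return m;
        }
    }
}
```

A caller would write something like foreach (Match m in LazyRegex.LazyMatches(rx, input)) { ... } and can break out of the loop early, in which case the rest of the input is never searched.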


Are you sure the built-in Regex class doesn't do this? For example, the Match.NextMatch() method would suggest that it's continuing from where it got to...

I believe that if you call Regex.Match it will stop at the first match it comes to, and then continue from there when you call NextMatch.
