Regular expression library for .Net that supports lazy evaluation
I'm looking for a regular expression library in .Net that supports lazy evaluation.
Note: I'm specifically looking for lazy evaluation (i.e., the library, instead of immediately returning all matches in a document, only consumes as much of the document as necessary to determine the next match per request), NOT support for lazy quantifiers - though if it also supports lazy quantifiers, I wouldn't object!
Specific details: I want to be able to run regexes against very large documents with potentially hundreds of thousands of regex matches, and iterate across the results using IEnumerable<> semantics, without having to take the up-front cost of finding all matches.
Ideally FOSS in C#, but the only requirement is usability from a .Net 3.5 app.
The Match class's NextMatch method should meet your needs:
Returns a new Match with the results for the next match, starting at the position at which the last match ended (at the character after the last matched character).
A quick look at it in Reflector confirms this behavior:
    public Match NextMatch()
    {
        if (this._regex == null)
        {
            return this;
        }
        return this._regex.Run(false, base._length, base._text, this._textbeg,
            this._textend - this._textbeg, this._textpos);
    }
Check out the linked MSDN reference for an example of its usage. Briefly, the flow would resemble:
    Match m = rx.Match(input);
    while (m.Success)
    {
        // do work
        m = m.NextMatch();
    }
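If you want IEnumerable<> semantics on top of that loop, an iterator method is one way to get them. This is only a minimal sketch: the class name LazyRegex and the method name EnumerateMatches are made up for illustration and are not part of the framework. Each match is computed only when the consumer asks for the next item, so breaking out of the foreach early means the rest of the input is never scanned.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not a framework type): wraps Match/NextMatch in an iterator.
public static class LazyRegex
{
    public static IEnumerable<Match> EnumerateMatches(Regex rx, string input)
    {
        // The body runs only as the caller enumerates; NextMatch() is invoked
        // once per additional item requested, not for the whole input up front.
        for (Match m = rx.Match(input); m.Success; m = m.NextMatch())
        {
            yield return m;
        }
    }
}
```

Iterator methods (yield return) have been available since C# 2.0, so this should be usable from a .Net 3.5 app.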
Are you sure the built-in Regex class doesn't do this? For example, the Match.NextMatch() method suggests that it continues from where it left off.
I believe that if you call Regex.Match, it will stop at the first match it comes to, and then continue from there when you call NextMatch.
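Along the same lines, as far as I know the MatchCollection returned by Regex.Matches is also populated lazily as you enumerate it, while members such as Count force every match to be found. Assuming that holds, a sketch like the following takes only the first few matches without scanning the rest of the input. The class name FirstFewMatches and the method TakeValues are hypothetical, and Cast<Match>() is there because MatchCollection in .Net 3.5 only implements the non-generic IEnumerable.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not a framework type): pulls only the first `count` matches.
public static class FirstFewMatches
{
    public static string[] TakeValues(string pattern, string input, int count)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()        // deferred: enumerates the collection on demand
                    .Take(count)          // stops asking for matches after `count` items
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Avoid touching Count or indexing into the collection before enumerating if you want to keep the deferred behaviour.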