Match regex at exact offset
I want to check whether a certain pattern (e.g. a double-quoted string) matches at an exact position.
Example
string text = "aaabbb";
Regex regex = new Regex("b+");
// Now match regex at exactly char 3 (offset) of text
I'd like to check whether regex matches at exactly char 3.
I tried Regex.Match Method (String, Int32), but it does not behave as I expected.
So I did some tests and some workarounds:
public void RegexTest2()
{
    Match m;
    string text = "aaabbb";
    int offset = 3;

    m = new Regex("^a+").Match(text, 0); // let's do a sanity check first
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("aaa", m.Value); // works as expected

    m = new Regex("^b+").Match(text, offset);
    Assert.AreEqual(false, m.Success); // this is quite strange...

    m = new Regex("^.{" + offset + "}(b+)").Match(text); // works, but is not very 'nice'
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("bbb", m.Groups[1].Value);

    m = new Regex("^b+").Match(text.Substring(offset)); // works too, but
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("bbb", m.Value);
}
In fact, I'm starting to believe that new Regex("^.").Match(myString, 1) will never match anything.
Any suggestions?
Edit:
I have a working solution (a workaround), so my question is really about speed and a clean implementation.
Have you tried what the docs say?
If you want to restrict a match so that it begins at a particular character position in the string and the regular expression engine does not scan the remainder of the string for a match, anchor the regular expression with a \G (at the left for a left-to-right pattern, or at the right for a right-to-left pattern). This restricts the match so it must start exactly at startat.
That is, replace the ^ with \G:
m = new Regex(@"\Gb+").Match(text, offset);
Assert.AreEqual(true, m.Success); // should now work
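A minimal, self-contained sketch (assuming C# 9 top-level statements and only System.Text.RegularExpressions) that contrasts ^ and \G when a start position is passed to Match(String, Int32). It also shows the second half of the documented behavior: \G stops the engine from scanning ahead, so an unanchored b+ started at index 2 still finds "bbb" at index 3, while \Gb+ fails because index 2 holds an a.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "aaabbb";

// '^' (without RegexOptions.Multiline) is only satisfied at index 0,
// even when a start position is passed in.
Match caret = new Regex("^b+").Match(text, 3);

// '\G' is satisfied exactly at the start position, so this succeeds.
Match anchored = new Regex(@"\Gb+").Match(text, 3);

// Unanchored pattern: the engine scans forward from index 2 and finds "bbb" at 3.
Match scanning = new Regex("b+").Match(text, 2);

// '\G' forbids that scan: index 2 is an 'a', so there is no match.
Match anchoredAt2 = new Regex(@"\Gb+").Match(text, 2);

Console.WriteLine($"^b+   at 3:   {caret.Success}");
Console.WriteLine($"\\Gb+  at 3:   {anchored.Success} \"{anchored.Value}\" index {anchored.Index}");
Console.WriteLine($"b+    from 2: {scanning.Success} index {scanning.Index}");
Console.WriteLine($"\\Gb+  at 2:   {anchoredAt2.Success}");
```

In other words, \G turns "search from here" into "match exactly here", which is what the question asks for.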
You expect Match(text, offset) to evaluate the searched string as if it started at offset. It does not: ^ still anchors at index 0, not at offset!
So use the three-argument overload of Match, which treats the given range as the whole input and therefore makes ^ anchor at offset:
m = new Regex("^bbb$").Match(text, offset, text.Length-offset);
Another option, though slower than the one above:
m = new Regex("^.{"+offset+"}bbb$").Match(text);
Or this (the first method is the fastest):
m = new Regex(@"\Gbbb$").Match(text, offset);
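The two overloads differ in how they treat $ as well as ^. A small sketch (assuming C# 9 top-level statements; the longer input "aaabbbccc" is chosen for illustration) shows that Match(String, Int32, Int32) anchors $ at the end of the given range, while with Match(String, Int32) the $ in \Gbbb$ still refers to the real end of the string. Match.Index is reported relative to the full input in both cases.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "aaabbbccc";
int offset = 3;
int length = 3;

// Match(input, beginning, length) behaves as if only the range
// [beginning, beginning + length) existed: '^' is satisfied at
// 'beginning' and '$' at the end of that range.
Match ranged = new Regex("^bbb$").Match(text, offset, length);

// Match.Index is still counted from the start of the full input string.
Console.WriteLine($"ranged:    {ranged.Success} \"{ranged.Value}\" index {ranged.Index}");

// With Match(input, startat), '\G' pins the start, but '$' still means
// the real end of the string, so the trailing "ccc" makes this fail.
Match startOnly = new Regex(@"\Gbbb$").Match(text, offset);
Console.WriteLine($"startOnly: {startOnly.Success}");
```

So if the pattern should also be bounded on the right, the three-argument overload is the one that matches the Substring workaround from the question most closely.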
You can add a positive lookbehind assertion ((?<=...)) to your regex:
Regex regex = new Regex(@"(?<=\A.{3})b+");
This ensures that there are exactly three characters (.{3}) between the start of the string (\A) and the start of the match. You could also use ^ instead of \A, but since ^ can also mean "match at the start of a line" (when RegexOptions.Multiline is set), \A is more explicit.
If that's a requirement, you might need to construct the regex with RegexOptions.Singleline so that the dot also matches newline characters.
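A sketch of the lookbehind approach with a variable offset, assuming C# 9 top-level statements. AtOffset is a hypothetical helper written for this example, not a .NET API; it simply splices the offset into the (?<=\A.{n}) prefix. The input "a\nabbb" has a newline at index 1, which shows why RegexOptions.Singleline can matter.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps an inner pattern so that it can only match
// when exactly 'offset' characters precede it, counted from the start of the input.
static Regex AtOffset(string innerPattern, int offset, RegexOptions options = RegexOptions.None) =>
    new Regex(@"(?<=\A.{" + offset + "})" + innerPattern, options);

string text = "a\nabbb"; // index 1 is a newline

// Without Singleline, '.' does not match '\n', so the lookbehind fails.
Match plain = AtOffset("b+", 3).Match(text);

// With Singleline, '.' matches any character, including '\n'.
Match single = AtOffset("b+", 3, RegexOptions.Singleline).Match(text);

Console.WriteLine($"plain:  {plain.Success}");
Console.WriteLine($"single: {single.Success} \"{single.Value}\" index {single.Index}");
```

Note that this variant still lets the engine try every position before the lookbehind rejects it, so for long inputs the \G or three-argument-overload approaches are likely cheaper.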
By the way,
m = new Regex("^b+").Match(text, 3);
doesn't work because ^ matches only at the start of the string (or, with RegexOptions.Multiline, at the start of a line), and the position before the first b is, of course, neither.
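To make the difference concrete, a short sketch (assuming C# 9 top-level statements) of when ^ can and cannot succeed at a start position other than 0. With RegexOptions.Multiline, ^ is satisfied right after a newline, so it matches at index 4 of "aaa\nbbb" because of the newline, not because of the startat argument.

```csharp
using System;
using System.Text.RegularExpressions;

// Default options: '^' is satisfied only at index 0 of the input,
// regardless of the startat argument.
Match noLine = new Regex("^b+").Match("aaabbb", 3);

// With Multiline, '^' is also satisfied right after '\n'.
// Index 4 follows the newline in "aaa\nbbb", so this succeeds.
Match multi = new Regex("^b+", RegexOptions.Multiline).Match("aaa\nbbb", 4);

// Same Multiline pattern, but no newline precedes index 3: still no match.
Match multiNoNewline = new Regex("^b+", RegexOptions.Multiline).Match("aaabbb", 3);

Console.WriteLine($"default, \"aaabbb\" at 3:     {noLine.Success}");
Console.WriteLine($"Multiline, \"aaa\\nbbb\" at 4: {multi.Success} index {multi.Index}");
Console.WriteLine($"Multiline, \"aaabbb\" at 3:   {multiNoNewline.Success}");
```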