Match regex at exact offset

I want to check whether a certain pattern (e.g. a double-quoted string) matches at an exact position.

Example

string text = "aaabbb";
Regex regex = new Regex("b+");
// Now match regex at exactly char 3 (offset) of text

I'd like to check if regex matches at exactly char 3.

I had a look at the Regex.Match Method (String, Int32) but it does not behave like I expected.

So I did some tests and some workarounds:

public void RegexTest2()
{
    Match m;
    string text = "aaabbb";
    int offset = 3;

    m = new Regex("^a+").Match(text, 0); // let's do a sanity check first
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("aaa", m.Value);  // works as expected

    m = new Regex("^b+").Match(text, offset);
    Assert.AreEqual(false, m.Success);  // this is quite strange...

    m = new Regex("^.{"+offset+"}(b+)").Match(text); // works, but is not very 'nice'
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("bbb", m.Groups[1].Value);

    m = new Regex("^b+").Match(text.Substring(offset)); // works too, but 
    Assert.AreEqual(true, m.Success);
    Assert.AreEqual("bbb", m.Value);
}

In fact I'm starting to believe that new Regex("^.").Match(myString, 1) will never match anything.

Any suggestions?

Edit:

I got a working solution (workaround). So my question is all about speed and a nice implementation.


Have you tried what the docs say?

If you want to restrict a match so that it begins at a particular character position in the string and the regular expression engine does not scan the remainder of the string for a match, anchor the regular expression with a \G (at the left for a left-to-right pattern, or at the right for a right-to-left pattern). This restricts the match so it must start exactly at startat.

i.e. replace the ^ with a \G:

m = new Regex(@"\Gb+").Match(text, offset);
Assert.AreEqual(true, m.Success);  // should now work
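
As a rough sketch of how the \G approach could be wrapped for reuse (the class name AnchoredMatch and the helper MatchAt are hypothetical and not part of the .NET API), something like this might do:

```csharp
using System.Text.RegularExpressions;

public static class AnchoredMatch
{
    // Hypothetical helper: tries to match `pattern` starting exactly at `offset`.
    // The caller checks Success on the returned Match.
    public static Match MatchAt(string text, string pattern, int offset)
    {
        // \G is only satisfied at the position where the search starts (startat),
        // so the engine cannot slide forward to a later match.
        // The non-capturing group keeps any alternation in `pattern` under the anchor.
        var regex = new Regex(@"\G(?:" + pattern + ")");
        return regex.Match(text, offset);
    }
}
```

With the text from the question, AnchoredMatch.MatchAt("aaabbb", "b+", 3) succeeds with the value "bbb", while the same call with offset 2 fails because the character at that position is an a.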


You expect Match(text, offset) to evaluate the string as if it started at offset. It does not: ^ still anchors at index 0 of the string, not at offset.

So use the overload of Match that takes a start index and a length, which makes ^ anchor at offset:

m = new Regex("^bbb$").Match(text, offset, text.Length-offset);

Another option is the following, but it is slower than the one above:

m = new Regex("^.{"+offset+"}bbb$").Match(text);

or this (the first method is the fastest):

m = new Regex(@"\Gbbb$").Match(text, offset);
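
To make the difference between the two overloads concrete, here is a small sketch under the assumption that the pattern is the question's ^b+ (the class and method names are made up for illustration):

```csharp
using System.Text.RegularExpressions;

public static class AnchorComparison
{
    private static readonly Regex Caret = new Regex("^b+");

    // Match(input, startat): startat only moves where the search begins;
    // ^ still refers to index 0 of the whole input.
    public static Match WithStartAt(string text, int offset)
    {
        return Caret.Match(text, offset);
    }

    // Match(input, beginning, length): the search is restricted to a window
    // of the input, and ^ is satisfied at the first character of that window.
    public static Match WithWindow(string text, int offset)
    {
        return Caret.Match(text, offset, text.Length - offset);
    }
}
```

For "aaabbb" and offset 3, WithStartAt reports no match, whereas WithWindow matches "bbb". Unlike text.Substring(offset), the windowed overload does not allocate a new string.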


You can add a positive lookbehind assertion ((?<=...)) to your regex:

Regex regex = new Regex(@"(?<=\A.{3})b+");

This ensures that there are exactly three characters (.{3}) between the start of the string (\A) and the start of the match. You can also use ^ instead of \A, but since the former can also mean (in some circumstances) "Match at the start of a line", the latter is a bit more explicit.

You might need to compile the regex using RegexOptions.Singleline to allow the dot to also match newline characters if that's a requirement.

By the way,

m = new Regex("^b+").Match(text, 3);

doesn't work because ^ matches only at the start of the string (or at the start of a line with RegexOptions.Multiline), and the position before the first b is, of course, neither.
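
The lookbehind idea, including the Singleline caveat, could be sketched as follows. The helper name and signature are hypothetical, and note that the options passed in apply to the whole pattern, not only to the dot inside the lookbehind:

```csharp
using System.Text.RegularExpressions;

public static class LookbehindAtOffset
{
    // Hypothetical helper: requires exactly `offset` characters between the
    // start of the string (\A) and the start of the match.
    public static Match MatchAt(string text, string pattern, int offset,
                                RegexOptions options = RegexOptions.None)
    {
        string anchored = @"(?<=\A.{" + offset + "})(?:" + pattern + ")";
        return new Regex(anchored, options).Match(text);
    }
}
```

If the three characters before the offset include a newline, as in "a\nabbb", the call without options finds nothing because the dot does not match \n. Passing RegexOptions.Singleline lets the match succeed at index 3.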
