
Partial regular expression match

I have a regular expression that I'm testing against an input stream of characters. Is there a way to match the regular expression against the input and determine whether it is a partial match that consumes the entire input buffer, i.e. the end of the input buffer is reached before the regexp completes? I would like the implementation to decide whether to wait for more input characters or abort the operation.

In other words, I need to determine which one is true:

  1. The end of the input buffer was reached before the regexp was matched

    E.g. "foo" =~ /^foobar/

  2. The regular expression matches completely

    E.g. "foobar" =~ /^foobar/

  3. The regular expression failed to match

    E.g. "fuubar" =~ /^foobar/

The input is not packetized.
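For the simplest case, an anchored literal pattern like /^foobar/, the three outcomes can be told apart with two prefix checks. This is a minimal Python sketch; `Outcome` and `classify_prefix` are hypothetical names, and it does not handle general regular expressions.

```python
from enum import Enum


class Outcome(Enum):
    NEED_MORE = "need more input"  # case 1: buffer ended before the pattern completed
    MATCH = "match"                # case 2: the pattern matched completely
    FAIL = "fail"                  # case 3: the pattern can no longer match


def classify_prefix(buffer: str, target: str) -> Outcome:
    """Three-way check of a buffer against an anchored literal such as /^foobar/."""
    if buffer.startswith(target):
        return Outcome.MATCH
    if target.startswith(buffer):
        # Every character seen so far agrees with the target, but there are too few.
        return Outcome.NEED_MORE
    return Outcome.FAIL


for text in ("foo", "foobar", "fuubar"):
    print(f"{text!r}: {classify_prefix(text, 'foobar').value}")
# 'foo': need more input
# 'foobar': match
# 'fuubar': fail
```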


Is this the scenario you are trying to solve? You are waiting for a literal string, e.g. 'foobar'. If the user types a partial match, e.g. 'foo', you want to keep waiting. If the input does not match, you want to exit.

If you are working with literal strings, I would just write a loop to test the characters in sequence. Or:

if (input.Length < target.Length && target.StartsWith(input))
{
    // partial match so far: keep waiting for more input
}
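The character-by-character loop for the literal case can keep its position between reads, so input that arrives in pieces is handled naturally. A minimal Python sketch; `LiteralWaiter` and its returned strings are illustrative assumptions, not part of the original answer.

```python
class LiteralWaiter:
    """Incrementally checks that a stream begins with `target`.

    Input may arrive in arbitrary chunks; state is kept between calls to feed().
    """

    def __init__(self, target: str) -> None:
        self.target = target
        self.matched = 0     # leading characters of target confirmed so far
        self.failed = False

    def feed(self, chunk: str) -> str:
        for ch in chunk:
            if self.failed or self.matched == len(self.target):
                break        # outcome already decided; ignore the rest
            if ch == self.target[self.matched]:
                self.matched += 1
            else:
                self.failed = True
        if self.failed:
            return "fail"        # abort
        if self.matched == len(self.target):
            return "match"
        return "need more"       # keep waiting
```

For example, `LiteralWaiter("foobar")` answers "need more" to `feed("fo")` and then "match" to `feed("obar")`; feeding "fu" instead answers "fail" immediately.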

If you are trying to match more complex regular expressions, I don't know how to do this with the regular expression API itself, but I would start by reading more about how your platform implements regular expressions.
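For patterns beyond a literal, one general technique is to simulate the expression as an NFA and inspect the set of live states when the input runs out: a reachable accepting state means a match, an empty set means failure, and anything else means more input could still complete the match. Some engines expose this directly (for example PCRE's partial-match options and Boost.Regex's `match_partial` flag), so check your platform first. The Python sketch below assumes a deliberately tiny, hypothetical syntax (literals, `.`, backslash escapes, postfix `*`, `+`, `?`, implicitly anchored at the start, no groups or alternation); `parse`, `closure` and `partial_match` are illustrative names, not a library API.

```python
from typing import List, Optional, Set, Tuple

# One token per atom: (atom, quantifier). atom None stands for '.', and the
# quantifier is '' (exactly once), '*' (zero or more) or '?' (zero or one).
Token = Tuple[Optional[str], str]


def parse(pattern: str) -> List[Token]:
    """Parse a tiny subset: literals, '.', backslash escapes, postfix * + ?."""
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "*+?":
            raise ValueError(f"quantifier without an atom at position {i}")
        atom: Optional[str]
        if ch == "\\" and i + 1 < len(pattern):
            atom = pattern[i + 1]          # an escaped character is always literal
            i += 2
        else:
            atom = None if ch == "." else ch
            i += 1
        quant = ""
        if i < len(pattern) and pattern[i] in "*+?":
            quant = pattern[i]
            i += 1
        if quant == "+":
            tokens.append((atom, ""))      # a+ is rewritten as a a*
            tokens.append((atom, "*"))
        else:
            tokens.append((atom, quant))
    return tokens


def closure(states: Set[int], tokens: List[Token]) -> Set[int]:
    """Add every state reachable by skipping optional ('*' or '?') tokens."""
    result = set(states)
    stack = list(states)
    while stack:
        i = stack.pop()
        if i < len(tokens) and tokens[i][1] in ("*", "?") and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return result


def partial_match(pattern: str, text: str) -> str:
    """Classify text against a pattern that is implicitly anchored at the start.

    'match'     - some prefix of text matches (more input might extend the match)
    'need more' - text ran out while the pattern could still complete
    'fail'      - no continuation of text can ever match
    """
    tokens = parse(pattern)
    accept = len(tokens)                   # state index meaning "pattern complete"
    states = closure({0}, tokens)
    for ch in text:
        if accept in states:
            return "match"
        nxt: Set[int] = set()
        for i in states:                   # every i < accept here
            atom, quant = tokens[i]
            if atom is None or atom == ch:
                nxt.add(i if quant == "*" else i + 1)  # '*' may repeat; others advance
        states = closure(nxt, tokens)
        if not states:
            return "fail"
    return "match" if accept in states else "need more"


for text in ("foo", "foobar", "fuubar"):
    print(f"{text!r}: {partial_match('foobar', text)}")
# 'foo': need more
# 'foobar': match
# 'fuubar': fail
```

Tracking a set of states rather than backtracking is what makes the "need more" answer cheap: it is simply a non-empty set with no accepting state at the end of the buffer.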

tom


I'm not sure this is exactly your question, but:
A regular expression either matches or it doesn't, and it can match a variable amount of input, so with an ordinary match call you can't directly tell whether more input would complete a match.

However, if you believe a match may straddle what you have already read and what is still to come, you can use a smart buffering scheme to accomplish the same thing.

There are many ways to do this.

One way is to also match, via assertions, everything that cannot be part of the match you seek, up until you reach the start of a possible match (but not yet the full match). These pieces you simply throw away and clear from your buffer. When you get the match you seek, clear that data and everything before it from the buffer.

Example: /(<function.*?>)|([^<]*)/. The part you throw away and clear from the buffer is captured in group 2.
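A minimal Python sketch of this scheme using the example expression, assuming input arrives as text chunks. Two details are assumptions added here: `re.DOTALL`, so a tag may span line breaks, and a rule that drops a `<` that cannot begin `<function`, since otherwise group 2 matches the empty string and the buffer would stall on such a character. `TagScanner` is a hypothetical name.

```python
import re
from typing import List

# Group 1: a complete tag we want to keep. Group 2: a run of text containing no '<'.
# re.DOTALL is an added assumption so that a tag may span line breaks.
TOKEN = re.compile(r"(<function.*?>)|([^<]*)", re.DOTALL)
OPENER = "<function"


class TagScanner:
    """Buffers streamed text and extracts complete <function...> tags."""

    def __init__(self) -> None:
        self.buffer = ""

    def feed(self, chunk: str) -> List[str]:
        self.buffer += chunk
        found: List[str] = []
        while self.buffer:
            m = TOKEN.match(self.buffer)   # never None: group 2 can match ""
            if m.group(1):
                found.append(m.group(1))             # the match we seek
                self.buffer = self.buffer[m.end():]  # clear it and everything before it
            elif m.group(2):
                self.buffer = self.buffer[m.end():]  # text that cannot start a tag
            elif self.buffer.startswith(OPENER) or OPENER.startswith(self.buffer):
                break  # possibly an incomplete tag: wait for more input
            else:
                self.buffer = self.buffer[1:]        # a '<' that cannot begin a tag
        return found
```

For example, `feed("abc <func")` returns an empty list and leaves `"<func"` buffered; a following `feed("tion x> tail")` returns `["<function x>"]`.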

Another way, if you are matching fixed-length strings: when nothing in the buffer matches, you can safely throw away everything from the beginning of the buffer up to the end of the buffer minus the length of the string you are searching for.

Example: your buffer is 64k in size and you are searching for a string of length 10, which was not found in the buffer. You can safely clear (64k - 10) bytes, retaining the last 10 bytes (strictly, the last 9 suffice, since a 10-byte match lying entirely within the retained bytes would already have been found), then append (64k - 10) fresh bytes to the end of the buffer. Of course, you only need a buffer of 10 bytes, constantly removing/adding one character, but a larger buffer is more efficient, and you could use thresholds to decide when to reload more data.
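The fixed-length scheme can be sketched as a Python generator over string chunks; `find_in_stream` is a hypothetical helper. It keeps `len(target) - 1` characters between chunks rather than `len(target)`: both are safe, because an occurrence lying entirely inside the retained tail would already have been found, and the smaller tail also means no occurrence is reported twice.

```python
from typing import Iterable, Iterator


def find_in_stream(chunks: Iterable[str], target: str) -> Iterator[int]:
    """Yield the absolute offset of every occurrence of target in a chunked stream."""
    if not target:
        raise ValueError("target must be non-empty")
    keep = len(target) - 1   # longest tail that can begin a match spanning chunks
    buffer = ""
    base = 0                 # absolute stream offset of buffer[0]
    for chunk in chunks:
        buffer += chunk
        pos = buffer.find(target)
        while pos >= 0:
            yield base + pos
            pos = buffer.find(target, pos + 1)  # pos + 1 also finds overlapping hits
        # Everything before the tail can never be part of an unreported match.
        cut = max(len(buffer) - keep, 0)
        base += cut
        buffer = buffer[cut:]
```

For example, `list(find_in_stream(["abcfo", "obarxx", "foobar"], "foobar"))` gives `[3, 11]`: the first occurrence straddles the first two chunks and is still found.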

If you can create a buffer that easily contracts/expands, more buffering options are available.
