How to locate a sequence of values (specifically, bytes) within a larger collection in .NET
I need to parse the bytes from a file so that I only take the data after a certain sequence of bytes has been identified. For example, if the sequence is simply 0xFF (one byte), then I can use LINQ on the collection:
byte[] allBytes = new byte[] {0x00, 0xFF, 0x01};
var importantBytes = allBytes.SkipWhile(b => b != 0xFF);
// importantBytes = {0xFF, 0x01}
But is there an elegant way to detect a multi-byte sequence - e.g. 0xFF, 0xFF - especially one that backtracks in case it starts to get a false positive match?
I'm not aware of any built-in way; as per usual, you can always write your own extension method. Here's one off the top of my head (there may be more efficient ways to implement it):
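using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a static class; the class name is arbitrary.
public static class SequenceExtensions
{
    // Yields the elements of source that follow the first occurrence of sequence.
    public static IEnumerable<T> AfterSequence<T>(this IEnumerable<T> source,
        T[] sequence)
    {
        bool sequenceFound = false;
        // Sliding window holding the most recent sequence.Length elements.
        Queue<T> currentSequence = new Queue<T>(sequence.Length);
        foreach (T item in source)
        {
            if (sequenceFound)
            {
                yield return item;
            }
            else
            {
                currentSequence.Enqueue(item);
                if (currentSequence.Count < sequence.Length)
                    continue;
                if (currentSequence.Count > sequence.Length)
                    currentSequence.Dequeue();
                if (currentSequence.SequenceEqual(sequence))
                    sequenceFound = true;
            }
        }
    }
}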
I'll have to check that this is correct, but it should give you the basic idea: iterate through the elements, keep track of the most recent values retrieved (as many as the sequence is long), set a flag when they match the sequence, and once the flag is set, start returning each subsequent element. Note that the matched sequence itself is not returned, only what follows it.
Edit - I did run a test, and it does work correctly. Here's some test code:
static void Main(string[] args)
{
    byte[] data = new byte[]
    {
        0x01, 0x02, 0x03, 0x04, 0x05,
        0xFF, 0xFE, 0xFD, 0xFC, 0xFB, 0xFA
    };
    byte[] sequence = new byte[] { 0x02, 0x03, 0x04, 0x05 };
    foreach (byte b in data.AfterSequence(sequence))
    {
        Console.WriteLine(b);
    }
    Console.ReadLine();
}
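For what it's worth, if the data is already in a byte[] (rather than a lazy IEnumerable) and you are on a runtime that has Span<T> (.NET Core 2.1 or later, as far as I know), the span IndexOf extension can search for a multi-element pattern directly. This is a different technique from the extension method above, shown only as a minimal sketch; the SpanSearchDemo class and AfterPattern helper are names made up for illustration.

```csharp
using System;

public static class SpanSearchDemo
{
    // Hypothetical helper: returns the bytes that follow the first occurrence
    // of pattern in data, or an empty array if pattern never occurs.
    public static byte[] AfterPattern(byte[] data, byte[] pattern)
    {
        ReadOnlySpan<byte> haystack = data;
        ReadOnlySpan<byte> needle = pattern;

        // IndexOf returns the start index of the first match, or -1 if none.
        int index = haystack.IndexOf(needle);
        if (index < 0)
            return Array.Empty<byte>();

        // Skip past the matched pattern and copy the remainder.
        return haystack.Slice(index + needle.Length).ToArray();
    }
}
```

Like AfterSequence, this returns only what follows the match. Calling SpanSearchDemo.AfterPattern with the data and sequence arrays from the test code above should give the same bytes, but it needs the whole input in memory, so it doesn't help with a stream you only want to read once.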
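If you convert your bytes into a string, you can take advantage of the many searching functions built into that, even if the bytes you're working with aren't actually characters in the traditional sense. For this to work, the conversion has to use an encoding that maps every byte to exactly one character (ISO-8859-1 does); a multi-byte encoding such as UTF-8 would mangle arbitrary binary data.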
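A rough sketch of the string approach, under the assumption that ISO-8859-1 (Latin-1) is the encoding used, since it maps each byte value to exactly one char and so string indexes line up with byte offsets. The StringSearchDemo class and IndexOfBytes helper are hypothetical names, not anything built in.

```csharp
using System;
using System.Text;

public static class StringSearchDemo
{
    // Assumption: ISO-8859-1 maps each byte 0x00-0xFF to exactly one char,
    // so an index into the decoded string equals an offset into the bytes.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical helper: byte offset of the first occurrence of pattern
    // in data, or -1 if it does not occur.
    public static int IndexOfBytes(byte[] data, byte[] pattern)
    {
        string haystack = Latin1.GetString(data);
        string needle = Latin1.GetString(pattern);

        // Ordinal comparison matches char by char, without culture rules.
        return haystack.IndexOf(needle, StringComparison.Ordinal);
    }
}
```

The ordinal comparison matters here: a culture-sensitive IndexOf may ignore or combine certain characters, which is not what you want when the "characters" are really raw bytes. The cost is an extra copy of the data as a string.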
Just as a bit of theory: this is a regular language problem, so you may be able to use a regular expression engine to detect it. The first Google hit for "regular expression on stream" was:
http://codeguru.earthweb.com/columns/experts/article.php/c14689
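To make the "regular language" point a bit more concrete: a fixed byte pattern can be recognised by a small state machine that reads one byte at a time and never re-reads input. The sketch below uses the Knuth-Morris-Pratt failure table as that state machine. It is not taken from the linked article, and the AutomatonSearchDemo class and its method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class AutomatonSearchDemo
{
    // Builds the KMP failure table: fail[i] is the length of the longest
    // proper prefix of pattern[0..i] that is also a suffix of it.
    private static int[] BuildFailureTable(byte[] pattern)
    {
        int[] fail = new int[pattern.Length];
        int k = 0;
        for (int i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k])
                k = fail[k - 1];
            if (pattern[i] == pattern[k])
                k++;
            fail[i] = k;
        }
        return fail;
    }

    // Hypothetical helper: yields the bytes that follow the first match.
    // An empty pattern is treated as matching immediately.
    public static IEnumerable<byte> AfterPattern(IEnumerable<byte> source,
        byte[] pattern)
    {
        int[] fail = BuildFailureTable(pattern);
        int state = 0;                        // pattern bytes matched so far
        bool found = pattern.Length == 0;

        foreach (byte b in source)
        {
            if (found)
            {
                yield return b;
                continue;
            }

            // On a mismatch, fall back to the longest shorter partial match
            // instead of starting over; this is the "backtracking" step.
            while (state > 0 && b != pattern[state])
                state = fail[state - 1];
            if (b == pattern[state])
                state++;
            if (state == pattern.Length)
                found = true;
        }
    }
}
```

The fallback loop is what handles a false start such as looking for 1, 1, 2 in 1, 1, 1, 2: after the third 1 fails to match the 2, the matcher keeps the last two 1s as a partial match rather than discarding everything. Each input byte is consumed exactly once, so this works on a forward-only stream.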