Incremental Stream Parsing in C++
I am reading data from a (serial) port using a non-blocking read() function (in C/C++). This means the data arrives in chunks of undetermined (but reported) size, possibly zero, each time I "poll" the port. I then need to parse this "stream" for certain patterns (not XML).
My naive implementation concatenates the new string to the previous stream-string each time read() returns a non-zero buffer, and re-parses the whole string. When a pattern is matched, the relevant part is discarded leaving only the tail of the string for the next time.
Obviously there are much more efficient ways to do this, for example incremental parsing à la SAX, deque-like buffers, or similar string slices. Also, obviously, I am not the first to have to do this type of stream parsing.
Does anyone know of any library that already does this sort of thing? Preventing memory-overflow in case of missing pattern matches would also be a big plus.
Thanks, Adi
You can do some tricks depending on your pattern.
- If you are looking for a single character such as a newline, you only need to scan the new portion of the string.
- If you are looking for \r\n, you only need to scan the new portion plus the last character of the previous portion, since the \r may have arrived at the end of the previous chunk.
- If your pattern has a known ending, you only need to scan the new data for that ending; only once it appears is it worth parsing the whole buffer.
- If you have some kind of synchronizing character, like semicolon in C/C++, then you can parse or declare a parse error whenever you find one.
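As a rough illustration of the first two tricks, here is a minimal sketch that only rescans the new data plus a short overlap from the previous chunk. The class name DelimiterScanner, its feed() and overflows() methods, and the policy of dropping the buffer when it exceeds a size cap are all assumptions made for this example; they do not come from any existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: accumulates chunks and extracts messages that end
// with a fixed, non-empty delimiter such as "\r\n".
class DelimiterScanner {
public:
    DelimiterScanner(std::string delimiter, std::size_t max_buffered)
        : delim_(std::move(delimiter)), max_buffered_(max_buffered) {
        assert(!delim_.empty());  // assumption: the delimiter is never empty
    }

    // Append one chunk (possibly empty) and return every message completed
    // by it, without the delimiter.
    std::vector<std::string> feed(std::string_view chunk) {
        std::vector<std::string> messages;
        buffer_.append(chunk.data(), chunk.size());

        std::size_t start = 0;  // start of the first unconsumed message
        for (;;) {
            // Search resumes at scan_from_, not at the start of the buffer.
            const std::size_t pos = buffer_.find(delim_, scan_from_);
            if (pos == std::string::npos) break;
            messages.emplace_back(buffer_, start, pos - start);
            start = pos + delim_.size();
            scan_from_ = start;
        }
        buffer_.erase(0, start);  // keep only the unfinished tail

        // Next time, rescan the last (delimiter length - 1) bytes so that a
        // delimiter split across two chunks is still found.
        const std::size_t overlap = delim_.size() - 1;
        scan_from_ = buffer_.size() > overlap ? buffer_.size() - overlap : 0;

        // Assumed overflow policy: if no delimiter shows up within
        // max_buffered bytes, drop the tail and count the event.
        if (buffer_.size() > max_buffered_) {
            buffer_.clear();
            scan_from_ = 0;
            ++overflows_;
        }
        return messages;
    }

    std::size_t overflows() const { return overflows_; }

private:
    std::string delim_;
    std::size_t max_buffered_;
    std::string buffer_;
    std::size_t scan_from_ = 0;
    std::size_t overflows_ = 0;
};
```

You would call feed() with whatever each read() returned, for example scanner.feed(std::string_view(buf, n)), and handle the returned messages. Dropping the tail on overflow is only one possible policy; resynchronizing on the next delimiter or reporting an error may suit a given protocol better.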