Is using a finite state machine a good design for general text parsing?
I am reading a file that is filled with hex numbers. I have to identify a particular pattern, say "aaad" (without quotes) from it. Every time I see the pattern, I generate some data to some other file.
This would be a very common case in designing programs - parsing and looking for a particular pattern.
I have designed it as a Finite State Machine and structured structured it in C using switch-case
to change states. This was the first implementation that occured to me.
- DESIGN: Are t开发者_Python百科here some better designs possible?
- IMPLEMENTATION: Do you see some problems with using a switch case as I mentioned?
A hand-rolled FSM can work well for simple situations, but they tend to get unwieldy as the number of states and inputs grows.
There is probably no reason to change what you have already designed/implemented, but if you are interested in general-purpose text parsing techniques, you should probably look at things like regular expressions, Flex, Bison, and ANTLR.
For embarrassingly simple cases couple of if
's or switch
'es are sufficient.
For parsing a string on POSIX systems, man regex (3). For fully featured parsing of whole files (e.g. complex configs) use Lex/Flex and Yacc/Bison.
When writing in C++, look at Boost Regex for the simpler case and Boost Spirit for more complex one. Flex & Bison work with C++ too.
精彩评论