开发者

Alternatives to Stateful Parsing

I don't like stateful parsing. It seems to me there should be a better approach. Is there?

Let me illustrate by example. Let's say I'm parsing a text file (YAML in this case, but it could be plain-text or XML. I'm making a simple trivia game; my game will contain sets of questions, each with two or more answers. In YAML, I might structure my file like:

set:
  name: math questions
  question:
    text: 1 + 1 = ?
      answer: 3
      answer: 4
      best-answer: 2
  question:
    text: 2 * 3 = ?
      answer: 5
      best-answer: 6
set:
  name: chemistry questions
  question:
    text: the valence of a Chlorine free radical is?
      answer: 1
      answer: 0
      best-answer: -1
  question:
    text: Xeon is a noble gas
      best-answer: true
      answer: false

(I haven't used YAML in a while, apologies if it's pseudo-YAML.) When I'm parsing, if I read the current line and see "answer: ...," I have to know that I'm in the answer of a question of a set.

This tends to be very stateful code, like:

if (currentLine starts with "answer")
  currentQuestion.addAnswer(...)
else if (currentLine starts with "question")
  currentQuestion = new question
...

At any point in the parsing, we need a reference to the current object, which may be nested within several other objects.

Part of the problem might be that my main loop iterates over each line, line by line. An alternative approach开发者_开发知识库 might be to just read the line, and depending on what it is, read several more lines as necessary.

So again, my question: is there a stateless way to parse data? I have a feeling that an approach might exist that would be more clear and easier to read/understand/code than my usual stateful for-loops over all lines of text.


You're apparently looking not for a "stateless" parsing, but for a non-imperative, pure functional parsing. Of course there is always a state of a sort, but with a functional approach your state is entirely implicit.

Take a look at "Functional pearls: monadic parsing in Haskell" article, and check out various Parsec-like parsing combinators libraries, which exist for even so very imperative languages like Java and C++.


What you describe is a more or less state machine driven parsing approach: you iterate over lines of the file, and a state variable keeps track of where in the parse tree you are. You might find it easier and cleaner to use recursive descent parsing, in which much of the state is implicit, in the form of the program stack. As others point out, parsing is inherently stateful, but recursive descent lets you keep less state explicitly.


An alternative approach might be to just read the line, and depending on what it is, read several more lines as necessary.

You just described "given a certain state, do something." That is, a stateful approach.

So again, my question: is there a stateless way to parse data?

Parsing is inherently stateful. The meaning of the data depends on the context. The context is the state.

There's a reason that an introductory course in compilers starts with finite-state machines.


The very concept of parsing implies that some pieces are one type of token, others are another, and others aren't valid at all. How are you going to know that without maintaining some kind of state that says "ok, i'm parsing a foo right now...this is what i should have here"?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜