Alternatives to Stateful Parsing

2023-04-07 04:39 问答作者：

I don't like stateful parsing. It seems to me there should be a better approach. Is there?

Let me illustrate by example. Let's say I'm parsing a text file (YAML in this case, but it could be plain-text or XML. I'm making a simple trivia game; my game will contain sets of questions, each with two or more answers. In YAML, I might structure my file like:

set:
  name: math questions
  question:
    text: 1 + 1 = ?
      answer: 3
      answer: 4
      best-answer: 2
  question:
    text: 2 * 3 = ?
      answer: 5
      best-answer: 6
set:
  name: chemistry questions
  question:
    text: the valence of a Chlorine free radical is?
      answer: 1
      answer: 0
      best-answer: -1
  question:
    text: Xeon is a noble gas
      best-answer: true
      answer: false

(I haven't used YAML in a while, apologies if it's pseudo-YAML.) When I'm parsing, if I read the current line and see "answer: ...," I have to know that I'm in the answer of a question of a set.

This tends to be very stateful code, like:

if (currentLine starts with "answer")
  currentQuestion.addAnswer(...)
else if (currentLine starts with "question")
  currentQuestion = new question
...

At any point in the parsing, we need a reference to the current object, which may be nested within several other objects.

Part of the problem might be that my main loop iterates over each line, line by line. An alternative approach开发者_开发知识库 might be to just read the line, and depending on what it is, read several more lines as necessary.

So again, my question: is there a stateless way to parse data? I have a feeling that an approach might exist that would be more clear and easier to read/understand/code than my usual stateful for-loops over all lines of text.

You're apparently looking not for a "stateless" parsing, but for a non-imperative, pure functional parsing. Of course there is always a state of a sort, but with a functional approach your state is entirely implicit.

Take a look at "Functional pearls: monadic parsing in Haskell" article, and check out various Parsec-like parsing combinators libraries, which exist for even so very imperative languages like Java and C++.

What you describe is a more or less state machine driven parsing approach: you iterate over lines of the file, and a state variable keeps track of where in the parse tree you are. You might find it easier and cleaner to use recursive descent parsing, in which much of the state is implicit, in the form of the program stack. As others point out, parsing is inherently stateful, but recursive descent lets you keep less state explicitly.

An alternative approach might be to just read the line, and depending on what it is, read several more lines as necessary.

You just described "given a certain state, do something." That is, a stateful approach.

So again, my question: is there a stateless way to parse data?

Parsing is inherently stateful. The meaning of the data depends on the context. The context is the state.

There's a reason that an introductory course in compilers starts with finite-state machines.

The very concept of parsing implies that some pieces are one type of token, others are another, and others aren't valid at all. How are you going to know that without maintaining some kind of state that says "ok, i'm parsing a foo right now...this is what i should have here"?

继续阅读：algorithm language-agnostic state

Alternatives to Stateful Parsing

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？