I have to parse a complicated string format. Is implementing an automaton a sensible approach?

2023-01-18 22:47

I am currently struggling with a particularly obnoxious string format that I have to parse. The strings can contain substrings that denote a variable property that has to be resolved. Imagine something like "ThisExampleStringContainsA[VARIABLE_PROPERTY]". These properties can be arbitrarily nested, and they can have different meanings depending on context. If [VARIABLE_PROPERTY] is in fact not a valid name of a variable (which of course has to be decided at runtime), it just becomes a normal part of the entire string and remains unchanged and verbatim. Consequently, there are no invalid strings, as the number of opening square brackets does not need to match the number of closing brackets. For example, This]Is[A[Valid]]][ExampleToo! is a valid string. There are more rules, but this will give you an idea.

So, at the moment I am unsure how to approach this. My first tries have ended in an incredible mess of ifs and elses, and I noticed more and more that the solution should probably incorporate some sort of state concept. Now I am thinking about using an automaton to do this. However, I have encountered automata only as purely theoretical constructs. I have never come across an actual implementation. Furthermore, automata are traditionally used to validate a word, i.e. to determine whether it belongs to a formally defined language. Needless to say, it is difficult for me to come up with a formal definition of that language.

How would you approach this? Do you think actually implementing an automaton is a sane approach? How would you model this from an OO design point of view? The project is in C#, if that makes any difference. Would you suggest something entirely different?

/Edit: My description may have been a bit misleading, so here are some more details. The problem for me is to find the properties in the right order (from innermost to outermost). Once you have identified the next property to resolve, the actual substitution with its final value is relatively easy.

Let's take the example from above, and I'll give you a step-by-step walkthrough of what should happen. The full input string is: This]Is[A[Valid]]][ExampleToo! The first closing bracket and the last opening bracket are just normal characters, as they don't enclose anything. The same goes for all characters that are not between a matching bracket pair. That leaves us with the part [A[Valid]]]. The innermost property has to be resolved first, and that is [Valid]. The brackets just enclose the property-identifying string, so Valid is the name of the property we are about to resolve. Let's say this string does in fact identify a property and it gets replaced with its actual value, say Foo. The identifying string including the brackets gets replaced, so [Valid] becomes Foo. Now we have to look at [AFoo]]. Let's pretend AFoo does NOT identify a property; that leaves the substring unchanged (including the brackets). Finally, the second closing bracket after AFoo has no matching opening bracket and is therefore also just a character. After processing is complete, the entire string would read: This]Is[AFoo]][ExampleToo!

I hope this example makes things a bit clearer. Please keep in mind that I have simplified the string format here! This is just to give you an idea of the difficulties I am facing. I don't expect working code; I am looking for answers that give me ideas on how to approach the problem. Since this parsing has to be done for many thousands of strings, the solution must have reasonable performance.

How about plain old recursion? Seems like a good fit here.
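
To make the recursion idea a bit more concrete, here is a rough sketch for the simplified format described in the question. It is written in Python only to keep it short; the same structure should carry over to C# fairly directly. The names resolve and lookup are made up for illustration, and the sketch assumes two things the question does not state: a replaced value is not scanned again for brackets, and the context-dependent meanings are ignored.

```python
def resolve(text, lookup):
    """Resolve bracketed properties innermost-first.

    `lookup` is a hypothetical callback: it returns the value for a
    property name, or None if the name is not a known property.
    Unmatched brackets are kept as ordinary characters.
    """
    pos = 0  # current read position, shared by all recursion levels

    def parse(inside_brackets):
        nonlocal pos
        parts = []
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == "[":
                # Recurse: everything up to the matching "]" is resolved first.
                inner, closed = parse(True)
                if not closed:
                    # Input ended before a "]" appeared: the "[" is literal.
                    parts.append("[" + inner)
                else:
                    value = lookup(inner)
                    # Assumption: an unknown name stays verbatim, brackets included.
                    parts.append(value if value is not None else "[" + inner + "]")
            elif ch == "]" and inside_brackets:
                # Closes the bracket opened by the caller.
                return "".join(parts), True
            else:
                # Ordinary character, or a "]" with no opening bracket.
                parts.append(ch)
        return "".join(parts), False

    result, _ = parse(False)
    return result


variables = {"Valid": "Foo"}
print(resolve("This]Is[A[Valid]]][ExampleToo!", variables.get))
# prints: This]Is[AFoo]][ExampleToo!
```

Each character is visited once, so the running time grows linearly with the input length, which should be fine for many thousands of strings. The recursion depth equals the bracket nesting depth, so extremely deep nesting would call for an explicit stack instead.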
