Parsing Complex Text File with C#

I need to parse a text file that has a lot of levels and characters. I've been trying different ways to parse it but I haven't been able to get anything to work. I've included a sample of the text file I'm dealing with. Any suggestions on how I can parse this file?

I have denoted the parts of the file I need with TEXTINEED.

(bean name:
       'TEXTINEED
       context:
       (list '/text
             '/content/home/left-nav/text
             '/content/home/landing-page)
       type:
       '/text/types/text
       module:
       '/modules/TEXTINEED
       source:
       '|moretext|
       ((contents
          (list (list (bean type:
                             '/directory/TEXTINEED
                             ((directives
                                (bean ((chartSize (list 600 400))
                                        (showCorners (list #f))
                                        (showColHeader (list #f))
                                        (showRowHeader (list #f)))))))
                      (bean type:
                             '/directory/TEXTINEED
                             ((directives
                                (bean ((displayName (list "MTD"))
                                        (showCorners (list #f))
                                        (showColHeader (list #f))
                                        (showRowLabels (list #f))
                                        (hideDetailedLink (list #t))
                                        (showRowHeader (list #f))
                                        (chartSize (list 600 400)))))))
                      (bean type:
                             '/directory/TEXTINEED
                             ((directives
                                (bean ((displayName (list "QTD"))
                                        (showCorners (list #f))
                                        (showColHeader (list #f))
                                        (showRowLabels (list #f))
                                        (hideDetailedLink (list #t))
                                        (showRowHeader (list #f))
                                        (chartSize (list 600 400))))))))


It looks like you have stumbled upon a nice S-expression file, also known as Lisp code. It does look complex, but it's actually pretty easy to parse. In fact, if you want to learn a lot about Lisp you could follow these blog posts; a small part of that is writing a parser for files like this. But that's probably overkill for you. :)

Instead, you should use an already available S-expression parser. Here's a project that has a Lisp interpreter for .NET; you should be able to use either their code or their project to parse the file.

The Lispy thing to do would be to read the file as a Lisp program, so instead of 'parsing' it you would just execute it. Another option, then, is to write a small Lisp program that transforms the file into something a little more natural in C# (maybe XML?).

For reference, here's another post that talks about Lisp in C#.

EDIT

Here is a Scheme interpreter written in C (it's only about 1000 LOC). You are interested in the read procedure and the procedures associated with it. It uses a very simple forward-only parse of an S-expression into a tree of C structs; you should be able to adapt this to C# with no problem.
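To give an idea of what such a forward-only reader might look like in C#, here is a minimal sketch. It is not taken from the interpreter mentioned above; the names SNode and SReader are made up for illustration. It also makes simplifying assumptions about the file format: a leading single quote is simply dropped, "..." and |...| are both read as plain atoms with no escape handling, and anything else up to whitespace or a parenthesis (name:, #f, 600) is kept as raw text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: either an atom (Text is set) or a list (Items is set).
public sealed class SNode
{
    public string Text;          // non-null for atoms
    public List<SNode> Items;    // non-null for lists
    public bool IsList => Items != null;
}

public static class SReader
{
    // Reads every top-level expression found in the input.
    public static List<SNode> ReadAll(string src)
    {
        var result = new List<SNode>();
        int pos = 0;
        while (true)
        {
            SkipWhitespace(src, ref pos);
            if (pos >= src.Length) return result;
            result.Add(Read(src, ref pos));
        }
    }

    static void SkipWhitespace(string s, ref int pos)
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
    }

    // Reads one expression starting at pos and advances pos past it.
    static SNode Read(string s, ref int pos)
    {
        SkipWhitespace(s, ref pos);
        if (pos >= s.Length) throw new FormatException("Unexpected end of input");
        char c = s[pos];

        if (c == '(')
        {
            pos++;
            var list = new SNode { Items = new List<SNode>() };
            while (true)
            {
                SkipWhitespace(s, ref pos);
                if (pos >= s.Length) throw new FormatException("Missing ')'");
                if (s[pos] == ')') { pos++; return list; }
                list.Items.Add(Read(s, ref pos));
            }
        }
        if (c == ')') throw new FormatException("Unexpected ')' at " + pos);

        // Assumption: the quote marker carries no information we need, so skip it.
        if (c == '\'') { pos++; return Read(s, ref pos); }

        // "string" or |symbol with spaces|: read up to the matching delimiter (no escapes).
        if (c == '"' || c == '|')
        {
            int end = s.IndexOf(c, pos + 1);
            if (end < 0) throw new FormatException("Unterminated " + c);
            var text = s.Substring(pos + 1, end - pos - 1);
            pos = end + 1;
            return new SNode { Text = text };
        }

        // Plain atom: runs until whitespace or a parenthesis.
        int start = pos;
        while (pos < s.Length && !char.IsWhiteSpace(s[pos]) && s[pos] != '(' && s[pos] != ')') pos++;
        return new SNode { Text = s.Substring(start, pos - start) };
    }
}
```

With something like this, SReader.ReadAll(File.ReadAllText(path)) would return a tree you can walk to find the atom that follows name:, module: or type:. Note that the reader throws a FormatException on unbalanced parentheses, which can happen if the input is only an excerpt of a larger file.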


You might consider writing a state machine implementation which changes states according to the different tokens you encounter within the file. I have found state-based parsers to be quite easy to write and debug. The most difficult part would likely be defining the tokens you use.
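As a rough illustration of the idea, here is a small state-machine tokenizer sketched in C#. The token kinds and state names are my own guesses at what this file format needs (parentheses, the single quote, "..." strings, |...| symbols and bare symbols such as name: or #f); they are assumptions, not a definition of the format, and escape sequences are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public enum TokenKind { Open, Close, Quote, Symbol, String }

// Hypothetical token type for illustration.
public sealed class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
}

public static class Tokenizer
{
    enum State { Start, InSymbol, InString, InBarSymbol }

    public static List<Token> Tokenize(string src)
    {
        var tokens = new List<Token>();
        var buf = new StringBuilder();
        var state = State.Start;

        // One extra iteration (i == src.Length) lets each state react to end of input.
        for (int i = 0; i <= src.Length; i++)
        {
            bool atEnd = i == src.Length;
            char c = atEnd ? '\0' : src[i];

            switch (state)
            {
                case State.Start:
                    if (atEnd || char.IsWhiteSpace(c)) break;
                    if (c == '(') tokens.Add(new Token(TokenKind.Open, "("));
                    else if (c == ')') tokens.Add(new Token(TokenKind.Close, ")"));
                    else if (c == '\'') tokens.Add(new Token(TokenKind.Quote, "'"));
                    else if (c == '"') state = State.InString;
                    else if (c == '|') state = State.InBarSymbol;
                    else { buf.Append(c); state = State.InSymbol; }
                    break;

                case State.InSymbol:
                    if (atEnd || char.IsWhiteSpace(c) || c == '(' || c == ')')
                    {
                        tokens.Add(new Token(TokenKind.Symbol, buf.ToString()));
                        buf.Clear();
                        state = State.Start;
                        if (!atEnd) i--; // re-examine the delimiter in the Start state
                    }
                    else buf.Append(c);
                    break;

                case State.InString:
                    if (atEnd) throw new FormatException("Unterminated string");
                    if (c == '"')
                    {
                        tokens.Add(new Token(TokenKind.String, buf.ToString()));
                        buf.Clear();
                        state = State.Start;
                    }
                    else buf.Append(c);
                    break;

                case State.InBarSymbol:
                    if (atEnd) throw new FormatException("Unterminated |symbol|");
                    if (c == '|')
                    {
                        tokens.Add(new Token(TokenKind.Symbol, buf.ToString()));
                        buf.Clear();
                        state = State.Start;
                    }
                    else buf.Append(c);
                    break;
            }
        }
        return tokens;
    }
}
```

Once the file is a flat token list, picking out the values you need can be as simple as scanning for a Symbol token such as module: and taking the next Symbol after the Quote.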


Use a parser generator like ANTLR. It takes an EBNF-like description of the grammar and creates parser code in the language of your choice.


One approach is to start with a parsing helper class like the one described at http://www.blackbeltcoder.com/Articles/strings/a-text-parsing-helper-class, and then process the file character by character. This is what I've done for several classes.


I wrote an S-Expression parser for C# using OMeta#. It is available at https://github.com/databigbang/SExpression.NET

Looking at your S-Expression variant, you just need to change my definition of a string, which uses opening and closing double quotes, to use a single quote, and add a definition for elements that end with a colon (I assume those are dictionaries).
