Tips for writing a file parser in Java? [closed]

2022-12-18 00:41 Q&A author:

Closed. This question needs to be more focused. It is not currently accepting answers.

Closed 9 years ago.

EDIT: I'm mostly parsing "comma-separated values"; fuzzy brought that term to my attention.

Interpreting the blocks of CSV is the main question here.

I know how to read the file into something like a String[] and some of the basic features of String, but I don't think using methods like contains() and analyzing everything character by character will work.

What are some ways I can do this in a smarter way?

Example of a line:

-barfoob: boobs, foob, "foo bar"

There's a reason that everyone assumes you're talking about XML: inventing a proprietary text-based file format requires very strong justification in the face of the maturity and easy availability of XML parsers.

And your question indicates that you have very little prior knowledge about parsers (otherwise you'd be writing an ANTLR or JavaCC grammar instead of asking this question) - which is another strong argument against rolling your own, except as a learning experience.

Since the input is "formatted similarly to HTML", your data is probably best represented as a tree-like structure, and it is likely XML or something close to XML.

If that is the case, I propose that the smartest way to parse your file is to use an XML parser.

Here are some resources you may find helpful:

A chapter on XML parsing from Sun: http://java.sun.com/developer/Books/xmljava/ch03.pdf
An article that might help you get started quickly: http://onjava.com/pub/a/onjava/2002/06/26/xml.html

HTH

If the document is valid XML, then any of the other answers will work. If it's not, you'll have to lex.

You should look at ANTLR. Even if you want to write the parser yourself, ANTLR is a great alternative. Or at least look at YAML.

This and digging through Wikipedia for related articles will probably suffice.

I think the java.util.Scanner will help you. Have a look at http://java.sun.com/javase/6/docs/api/java/util/Scanner.html
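To make the Scanner suggestion concrete, here is a minimal sketch. It assumes every line looks like the question's example ("key: value, value, ..."); the class name ScannerSketch and the method read are made up for illustration. One Scanner walks the lines, and a second one splits the part after the colon on commas.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper, not part of any library.
class ScannerSketch {
    // Assumes each line has the form "key: v1, v2, ..." as in the question.
    static Map<String, List<String>> read(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (Scanner lines = new Scanner(text)) {
            while (lines.hasNextLine()) {
                String line = lines.nextLine().trim();
                int sep = line.indexOf(": ");
                if (sep < 0) {
                    continue; // no "key: " prefix, skip the line
                }
                String key = line.substring(0, sep);
                List<String> values = new ArrayList<>();
                try (Scanner fields = new Scanner(line.substring(sep + 2))) {
                    // A comma with optional surrounding whitespace separates values.
                    fields.useDelimiter("\\s*,\\s*");
                    while (fields.hasNext()) {
                        values.add(fields.next());
                    }
                }
                result.put(key, values);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(read("-barfoob: boobs, foob, \"foo bar\"\n"));
    }
}
```

Note that this keeps the double quotes on the last value and would break a quoted value apart if it contained a comma, so it is only a starting point. To read from a real file, new Scanner(new File(...)) can replace new Scanner(text).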

Depending on how complicated your "schema" is, a regular expression might be what you want. If there is a lot of nesting then it might be easiest to convert to XML or JSON and use a prebuilt parser.
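As a rough illustration of the regular expression route, the sketch below uses java.util.regex with two patterns that I am assuming from the single example line: one that separates the leading token from the rest, and one that matches either a double-quoted value or a bare value. RegexSketch and fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the patterns are guesses based on one sample line.
class RegexSketch {
    // Leading token, a colon, whitespace, then everything else.
    private static final Pattern LINE = Pattern.compile("^(\\S+):\\s+(.*)$");
    // Either a double-quoted value (group 1) or a run of non-comma characters (group 2).
    private static final Pattern FIELD = Pattern.compile("\\s*(?:\"([^\"]*)\"|([^,]+))");

    // Returns the leading token followed by each value, quotes removed.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return out; // line does not follow the assumed format
        }
        out.add(m.group(1));
        Matcher f = FIELD.matcher(m.group(2));
        while (f.find()) {
            out.add(f.group(1) != null ? f.group(1) : f.group(2).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("-barfoob: boobs, foob, \"foo bar\""));
    }
}
```

Unlike a plain split on commas, the quoted alternative lets a value such as "foo, bar" survive as one field. Escaped quotes inside a value and empty fields are not handled here.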

People are right about standard formats being best practice, but let's set that aside.

Assuming that the example you give is representative, the task is pretty trivial.

You show a line with an initial token, delimited by a colon-space, followed by a list of comma-separated values. Split at that first colon-space, then use split() on the part to the right. Handling the quotes is trivial, too.
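The approach just described could look roughly like this. It is a sketch under the assumption that the sample line is representative; SplitSketch, key and values are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for lines of the form "key: v1, v2, ...".
class SplitSketch {
    // Everything before the first colon-space.
    static String key(String line) {
        int sep = line.indexOf(": ");
        return sep < 0 ? line : line.substring(0, sep);
    }

    // Everything after the first colon-space, split on commas.
    static List<String> values(String line) {
        List<String> out = new ArrayList<>();
        int sep = line.indexOf(": ");
        if (sep < 0) {
            return out;
        }
        for (String raw : line.substring(sep + 2).split(",")) {
            String v = raw.trim();
            if (v.isEmpty()) {
                continue; // ignore empty pieces
            }
            // Strip one pair of surrounding double quotes, if present.
            if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                v = v.substring(1, v.length() - 1);
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "-barfoob: boobs, foob, \"foo bar\"";
        System.out.println(key(line) + " -> " + values(line));
    }
}
```

One caveat: split(",") cuts a quoted value in two if it contains a comma, so this only stays trivial as long as quoted values never do.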

After looking at your sample input, I fail to see any resemblance to HTML or XML:

-barfoob: boobs, foob, "foo bar"

If this is what you want to parse, I have an alternative suggestion: use the Java properties parser (java.util.Properties, which comes with standard Java), and then parse the remainder of each line using your own custom code. You will need to change your format somewhat for this to work, so it's up to you.

barfoob=boobs, foob, "foo bar"

Java properties will be able to return barfoob as the property name, and boobs, foob, "foo bar" as the property value. That's where you can use your custom code to split the property value into boobs, foob and foo bar.
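A small sketch of the properties idea, using java.util.Properties from the standard library. The class PropertiesSketch and its load method are just illustrative wrappers, and the input is a string here in place of a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical wrapper around java.util.Properties.
class PropertiesSketch {
    // Loads "name=value" lines from the given text.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader, rethrown for completeness.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = load("barfoob=boobs, foob, \"foo bar\"\n");
        // The raw value still needs your own code to split it into items.
        System.out.println(p.getProperty("barfoob"));
    }
}
```

Two things to keep in mind: Properties is backed by a hash table, so the order of the lines is not preserved, and a backslash in a value is treated as an escape character.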

I'd strongly advise you not to reinvent the wheel and to use an existing solution like Flatworm, Fixedformat4j or jFFP, all of which can parse positional or comma-separated values files (personally, I recommend Flatworm).

You may be able to use the Neko HTML parser to some degree. It depends on how it handles the non-standard HTML.

If the XML is valid, I personally prefer using http://www.xom.nu simply because it features a nice DOM model. As pointed out, though, there are parsers in J2SE.
