
Java Grammar syntax analyzer (ASCII to graph)

I am developing an assistant that helps DBAs type database commands. These commands have many parameters, so an assistant would help a lot with their job. For this assistant I need the grammar of the commands, but database vendors (Oracle, DB2) do not provide that information in any format; the only source is the documentation.

One example of a DB2 command is: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0001933.html

For this reason, I am trying to analyze the grammar diagram, or railroad diagram (http://en.wikipedia.org/wiki/Syntax_diagram), but I have not found anything in Java that could help me. I would like a reverse-engineering tool that takes the ASCII (textual) representation of the grammar and creates a graph in Java. With that graph, the assistant could propose options for the command currently being typed.
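
To make the idea concrete, here is a minimal sketch of the kind of graph I have in mind, assuming the diagram has already been converted (by hand or by some tool) into nodes and labelled edges. The class SyntaxGraph, its methods, the <name> placeholder and the command shape are all made up for illustration; this is not the real DB2 syntax.

```java
import java.util.*;

// Hypothetical sketch: a syntax diagram stored as a directed graph whose
// edges carry keywords, or the placeholder <name> for a user-supplied word.
class SyntaxGraph {
    // node id -> (edge label -> target node id)
    private final Map<Integer, Map<String, Integer>> edges = new HashMap<>();
    private final Set<Integer> accepting = new HashSet<>();

    void addEdge(int from, String label, int to) {
        edges.computeIfAbsent(from, k -> new TreeMap<>()).put(label, to);
    }

    void markAccepting(int node) {
        accepting.add(node);
    }

    // Follow the typed tokens starting at node 0 and return the labels that
    // may come next; an empty set means the input has left the diagram.
    Set<String> suggest(List<String> typed) {
        int node = 0;
        for (String token : typed) {
            Map<String, Integer> out =
                edges.getOrDefault(node, Collections.<String, Integer>emptyMap());
            Integer next = out.get(token.toUpperCase());
            if (next == null) {
                next = out.get("<name>"); // placeholder accepts any word
            }
            if (next == null) {
                return Collections.emptySet();
            }
            node = next;
        }
        Set<String> result = new TreeSet<>(
            edges.getOrDefault(node, Collections.<String, Integer>emptyMap()).keySet());
        if (accepting.contains(node)) {
            result.add("<end>"); // the command may stop here
        }
        return result;
    }

    // Made-up command shape, NOT the real DB2 syntax:
    //   BACKUP DATABASE <name> [ONLINE] TO <name>
    static SyntaxGraph demo() {
        SyntaxGraph g = new SyntaxGraph();
        g.addEdge(0, "BACKUP", 1);
        g.addEdge(1, "DATABASE", 2);
        g.addEdge(2, "<name>", 3);
        g.addEdge(3, "ONLINE", 4); // optional branch
        g.addEdge(3, "TO", 5);     // bypass of the optional branch
        g.addEdge(4, "TO", 5);
        g.addEdge(5, "<name>", 6);
        g.markAccepting(6);
        return g;
    }
}
```

With this graph, after the user types BACKUP DATABASE sample, suggest would offer ONLINE and TO. The part I am missing is the step that builds such a graph from the ASCII diagram.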

An example of the assistant: http://www.youtube.com/watch?v=5sBoUHJupvs


If you have information about how to analyze (not generate) grammar diagrams with Java, I would appreciate it.

The closest tool I've seen is the Grammar Recovery System by Ralf Lämmel. It depends on the railroad diagrams being accessible as text strings, which is generally not how they are found. You appear to be lucky in the DB2 case, so Ralf's work points in the right direction.

Considering that such diagrams are usually rendered as just a set of pixels (PLSQL's are like this in the PDF files provided for documentation), you have several sets of problems: recognizing graphical entities from pixels, assembling them into actual representations of the railroad diagrams, and then using those representations in your assistant.

I think this is a long, hard, impractical approach. If you got it to work, you'd discover the diagrams are slightly wrong in many places (read Ralf's paper or find out the hard way), and therefore unusable for a tool that is supposed to produce the "right" stuff to help your DBAs.

Of course, you are objecting to the other long, hard, "impractical" approach of reading the documentation and producing grammars that match, and then validating those grammars against the real world. Yes, this is a tough slog too, but it actually does produce useful results. You need to find vendors that have done this and will make it available to you.

ANTLR.org offers a variety of grammars. Have you checked there?

My company offers grammars and tools for processing them. We have done this for PLSQL and SQL2011 but not yet DB2.

Given a grammar, you now need to use it to provide "advice" to your users. Your users aren't going to type in a complete "program"; they want to generate fragments (e.g., SELECT statements). Now you need a parser that will process grammar fragments and at least say "legal" or "not". Most won't do that. Our DMS Software Reengineering Toolkit will do that.

To provide advice, you need to be able to walk the grammar (much as you considered for railroad diagrams) to compute "what is legal next". That's actually pretty hard (and in fact it is roughly equivalent to what an LR/GLR parser generator does when building tables). Our DMS engine does that during syntax error repair by traversing its GLR parse tables (since that work is already encoded in the tables!). That's not easy to do, as it is a peculiar variant of the GLR parsing algorithm. You might do better with an Earley parser, which keeps around all possible parses as a set of choices; you could simply inspect each one.
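
To illustrate the Earley suggestion, here is a minimal sketch of an Earley-style recognizer that reports which terminals may follow a typed prefix. It is not how DMS works, and it makes several simplifying assumptions: the input is already tokenized into token kinds (ID stands for any identifier), the grammar has no empty productions, and no parse trees are built. The class NextTokenAdvisor, its methods, and the tiny SELECT grammar are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of "what is legal next" using Earley item sets.
// Assumes a grammar without empty productions and pre-tokenized input.
class NextTokenAdvisor {
    private final Map<String, List<List<String>>> rules = new LinkedHashMap<>();

    void rule(String lhs, String... rhs) {
        rules.computeIfAbsent(lhs, k -> new ArrayList<>()).add(Arrays.asList(rhs));
    }

    // An Earley item: lhs -> rhs with a dot position and the input index
    // where recognition of this rule started.
    private static final class Item {
        final String lhs; final List<String> rhs; final int dot; final int origin;
        Item(String lhs, List<String> rhs, int dot, int origin) {
            this.lhs = lhs; this.rhs = rhs; this.dot = dot; this.origin = origin;
        }
        String next() { return dot < rhs.size() ? rhs.get(dot) : null; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Item)) return false;
            Item i = (Item) o;
            return dot == i.dot && origin == i.origin && lhs.equals(i.lhs) && rhs.equals(i.rhs);
        }
        @Override public int hashCode() { return Objects.hash(lhs, rhs, dot, origin); }
    }

    private static void add(Set<Item> set, Deque<Item> work, Item item) {
        if (set.add(item)) work.add(item);
    }

    // Returns the terminals that may follow the typed tokens, plus "<end>"
    // if the tokens already form a complete 'start'. Empty set: invalid prefix.
    Set<String> legalNext(String start, List<String> typed) {
        int n = typed.size();
        List<Set<Item>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) chart.add(new LinkedHashSet<>());
        for (List<String> rhs : rules.get(start)) chart.get(0).add(new Item(start, rhs, 0, 0));

        for (int i = 0; i <= n; i++) {
            Deque<Item> work = new ArrayDeque<>(chart.get(i));
            while (!work.isEmpty()) {
                Item it = work.poll();
                String sym = it.next();
                if (sym == null) {                       // completer
                    for (Item p : new ArrayList<>(chart.get(it.origin)))
                        if (it.lhs.equals(p.next()))
                            add(chart.get(i), work, new Item(p.lhs, p.rhs, p.dot + 1, p.origin));
                } else if (rules.containsKey(sym)) {     // predictor
                    for (List<String> rhs : rules.get(sym))
                        add(chart.get(i), work, new Item(sym, rhs, 0, i));
                } else if (i < n && sym.equals(typed.get(i))) { // scanner
                    chart.get(i + 1).add(new Item(it.lhs, it.rhs, it.dot + 1, it.origin));
                }
            }
        }

        Set<String> expected = new TreeSet<>();
        for (Item it : chart.get(n)) {
            String sym = it.next();
            if (sym != null && !rules.containsKey(sym)) expected.add(sym);
            if (sym == null && it.origin == 0 && it.lhs.equals(start)) expected.add("<end>");
        }
        return expected;
    }

    // Made-up fragment grammar for illustration only.
    static NextTokenAdvisor demo() {
        NextTokenAdvisor g = new NextTokenAdvisor();
        g.rule("stmt", "SELECT", "cols", "FROM", "ID");
        g.rule("stmt", "SELECT", "cols", "FROM", "ID", "WHERE", "ID", "=", "ID");
        g.rule("cols", "ID");
        g.rule("cols", "cols", ",", "ID");
        return g;
    }
}
```

For the prefix SELECT ID, the last item set contains items whose dot sits before "," and before FROM, so those two are the advice to offer. A production version would also need token classes, empty productions, and some ranking of the suggestions.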

But this looks like quite a lot of work, and I think you'll be surprised by the amount of machinery you need.

The best work in this area is Harmonia, which produces incremental editors for code. Our DMS engine's parser is based on earlier work done by this project, because we are interested in the incrementality aspect.

You can try using ANTLR (http://www.antlr.org/). It will not be able to understand an ASCII representation of the grammar, but it is powerful enough to do anything else you need, if you don't mind spending the time to learn the software.






