开发者

Parsing Java Source Code

I am asked to develop a software which should be able to create Flow chart/ Control Flow of the input Java source code. So I started researching on it and arrived at following solutions:

To create flow chart/control flow I have to recognize controlling statements and function calls made in the given source code Now I have two ways of recognizing:

  1. Parse the Source code by writing my own grammars (A complex solution I think). I am thinking to use Antlr for this.
  2. Read input source code files as text and search for the specific patterns (May become inefficient)

Am I right here? Or I am missing something very fundamental and simple? Which approach would take less time and do the work efficiently? Any other suggestions in this regard will be welcome too. Any other efficient approach would help because the input source code may span multiple files and can be fairly complex.

I am good in .NET languages but this is my first big project in Java. I have basic knowledge of Compiler Design so writing grammars should not be impossible for me.

Sorry开发者_高级运维 If I am being unclear. Please ask for any clarifications.


I'd go with Antlr and use an existing Java grammar: https://github.com/antlr/grammars-v4


All tools handling Java code usually decide first whether they want to process the language Java or Java byte code files. That is a strategic decision and depends on your use case. I could image both for flow chart generation. When you have decided that question. There are already several frameworks or libraries, which could help you on that. For byte code engineering there are: ASM, JavaAssist, Soot, and BCEL, which seems to be dead. For Java language parsing and analyzing, there are: Polyglot, the eclipse compiler, and javac. All of these include a complete compiler frontend for Java and are open source.

I would try to avoid writing my own parser for Java. I did that once. Java has a rather complex grammar, but which can be found elsewhere. The real work begins with name and type resolution. And you would need both, if you want to generate graphs which cover more than one method body.


Eclipse has a library for parsing the source code and creating Abstract Syntax Tree from it which would let you extract what you want.

See here for a tutorial http://www.vogella.de/articles/EclipseJDT/article.html

See here for api http://help.eclipse.org/indigo/topic/org.eclipse.jdt.doc.isv/reference/api/org/eclipse/jdt/core/dom/package-summary.html#package_description


Now I have two ways of recognizing:

You have many more ways than that. JavaCC ships with a Java 1.5 grammar already built. I'm sure other parser generators ditto. There is no reason for you to either have to write your own grammar or construct your own parser.

And specifically 'read[ing] input source code files as text and search for the specific patterns' isn't a viable choice at all, as it isn't parsing, and therefore cannot possibly recognize Java programs correctly.


Your input files are written in Java, and the software should be written in Java, but this is your first project in Java? First of all, I'd suggest learning the language with smaller projects. Also you need to learn how to use graphics in Java (there are various libraries). Then, you should focus on what you want to show on your graphs. Or is text sufficient?


The way I would do it is to analyse compiled code. This would allow you to read jars without source and avoid parsing the code yourself. I would use Objectwebs ASM to read the class files.


Smarter solution is to use Eclipse's java parser. Read more here: http://www.ibm.com/developerworks/opensource/library/os-ast/


Or even more easy: Use reflection. You should be able to compile the sources, load the classes with java classloader and analyse them from there. I think this is far more easy than any parsing.


Our DMS Software Reengineering Toolkit is general purpose program analysis and transformation machinery, with built in capability for parsing, building ASTs, constructing symbol tables, extracting control and data flow, transforming the ASTs, prettyprinting ASTs back to text, etc.

DMS is parameterized by an explicit language definition, and has a large set of preexisting definitions.

DMS's Java Front End already computes control and data flow graphs, so your problem would be reduced to exporting them.

EDIT 7/19/2014: Now handles Java 8.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜