Comparison of two Java classes

I have two Java classes that are very similar in semantics but differ in syntax. The differences are minor, such as:

Changes in variable names,

Changes in position of some statements (with no dependent lines in between),

Extra imports, etc.

I need to compare these two classes to prove that they are indeed semantically identical. The same needs to be done for a large number of Java file pairs.

My first idea, reading the two files and comparing them line by line with logic to handle the differences mentioned above, seems inefficient. Is there some other way to achieve this? Are there any helpful APIs out there?


Compile both classes without debug information and then decompile them back to source files. The decompiled files should be a lot more similar than the original source files.

You can improve this further by running some optimizations on the compiled files. For example, you can use ProGuard with just shrinking enabled to remove unused code.

Changes in position of some statements can be hard to detect though.
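
The compile step of this approach can be sketched with the standard javax.tools API. This is only a rough illustration under some assumptions: the class name StrippedCompile and its helper methods are made up for the example, and instead of decompiling it simply compares the bytes of the two class files, which is a cruder check than the one described above. It can only match when both sources declare the same class, field and method names, since those names remain in the class file even with -g:none.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper, not part of any library.
class StrippedCompile {

    // Compiles one source with -g:none (no line numbers, local variable
    // names or source file name) into a fresh temporary directory and
    // returns the bytes of the resulting class file.
    static byte[] compileWithoutDebugInfo(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("A JDK (not just a JRE) is required");
        }
        try {
            Path dir = Files.createTempDirectory("stripped");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));
            int status = compiler.run(null, null, null,
                    "-g:none", "-d", dir.toString(), file.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed for " + className);
            }
            return Files.readAllBytes(dir.resolve(className + ".class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when both sources compile to byte-identical class files.
    static boolean sameBytecode(String className, String sourceA, String sourceB) {
        return Arrays.equals(
                compileWithoutDebugInfo(className, sourceA),
                compileWithoutDebugInfo(className, sourceB));
    }
}
```

Renamed local variables, renamed parameters and unused imports should disappear at this stage. Reordered statements will still produce different bytecode, so a mismatch here does not prove that the two classes behave differently.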


If you want to examine the changes in the code try Araxis Merge or WinMerge.

But if you want logical differences, I am afraid you might have to do it manually.

I would advise using one of these tools to look for textual changes first, and then looking for logical differences by hand.


There are a lot of similarity checkers out there, but so far none of them is perfect for this. Each has its own advantages and disadvantages. The approaches generally fall into two categories: token-based or tree-based.

Token-based similarity checking is usually done with regular expressions, but other approaches are possible. In one of my projects at university, we developed one using an alignment strategy from the field of bioinformatics. The main disadvantage of this technique shows up when the two sources are not roughly the same size.
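
As a rough illustration of the token-based idea, here is a minimal sketch that splits a source into tokens with a regular expression, drops import lines, and replaces every non-keyword identifier with a placeholder numbered by first appearance. The class TokenNormalizer, its keyword list and the placeholder scheme are all assumptions made for this example; it is not the alignment-based tool mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TokenNormalizer {

    // Partial keyword list, enough for the sketch; extend as needed.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "public", "private", "protected", "static", "final", "class",
            "void", "int", "long", "double", "boolean", "char", "return",
            "if", "else", "for", "while", "new", "this", "null", "true",
            "false", "import", "package"));

    private static final String IDENTIFIER = "[A-Za-z_$][A-Za-z0-9_$]*";

    // A string literal, an identifier, a run of digits,
    // or any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile(
            "\"(?:\\\\.|[^\"\\\\])*\"|" + IDENTIFIER + "|\\d+|\\S");

    static List<String> normalize(String source) {
        // Drop import statements so that extra imports do not matter.
        String body = source.replaceAll("(?m)^\\s*import\\s+[^;]+;", "");
        Map<String, String> names = new HashMap<>();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(body);
        while (m.find()) {
            String t = m.group();
            if (t.matches(IDENTIFIER) && !KEYWORDS.contains(t)) {
                // Same name always maps to the same placeholder.
                String placeholder = names.get(t);
                if (placeholder == null) {
                    placeholder = "id" + names.size();
                    names.put(t, placeholder);
                }
                tokens.add(placeholder);
            } else {
                tokens.add(t);
            }
        }
        return tokens;
    }

    static boolean sameTokens(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

Two sources that differ only by consistent renaming and by imports produce the same token list. The sketch is deliberately crude: it does not handle comments or reordered statements, and because it renames every identifier, including library types and methods, it can report a match for code that calls different methods. A real tool would only rename locally declared names.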

Tree-based checking works more like a compiler, so with some compilation techniques it is possible (well, more or less) to check for this. The disadvantage of the tree-based approach is that the comparison is exponential in complexity.


Comparing line by line won't work. I think you may need to use a parser. I would suggest that you take a look at ANTLR. It should have a Java grammar to which you could attach actions that do the comparison.


As far as I know, there's no way to compare the semantics of two Java classes. Take for example the following two methods:

public String m1(String a, int b) { ... }

and

public String m2(String x, int y) { ... }

Apart from changes in variable and method names, their signatures are the same: same return type and same input types. However, this is no guarantee that the two methods are semantically equivalent. For example, m1 could return a string consisting of the first b characters of a, while m2 could return a string consisting of y repetitions of x. As you can see, although only the variable and method names change, the semantics of the two methods are totally different.
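
To make this concrete, here is one possible way to fill in the two bodies so that they behave as just described. The class name SignatureExample and the bodies themselves are only an illustration; the elided bodies in the snippets above could be anything.

```java
// Illustration only: one possible pair of bodies for m1 and m2.
class SignatureExample {

    // Returns the first b characters of a (assumes 0 <= b <= a.length()).
    public String m1(String a, int b) {
        return a.substring(0, b);
    }

    // Returns y repetitions of x.
    public String m2(String x, int y) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < y; i++) {
            sb.append(x);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SignatureExample e = new SignatureExample();
        System.out.println(e.m1("abc", 2)); // prints "ab"
        System.out.println(e.m2("abc", 2)); // prints "abcabc"
    }
}
```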

I don't see an easy way out for your problem. You can perhaps make some assumptions and try the following approach:

  • assume that the method names in the two classes are the same
  • write test cases (for example with JUnit) for all the methods in the first class
  • run the test cases on the second class
  • ensure that the second class does not have other (untested) methods (for example using reflection)
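
The last step of this list can be sketched with reflection. In the example below, First and Second are hypothetical stand-ins for a pair of classes under comparison, and MethodSetCheck is a made-up helper; the sketch only compares declared method signatures and says nothing about what the methods do.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
class MethodSetCheck {

    // Stand-ins for a pair of classes under comparison.
    static class First {
        public String m(String a, int b) { return a.substring(0, b); }
    }

    static class Second {
        public String m(String x, int y) { return x.substring(0, y); }
        public void extra() { }
    }

    // Builds a string such as "java.lang.String m(java.lang.String,int)".
    static String signature(Method m) {
        String params = Arrays.stream(m.getParameterTypes())
                .map(Class::getName)
                .collect(Collectors.joining(","));
        return m.getReturnType().getName() + " " + m.getName() + "(" + params + ")";
    }

    // Signatures of all methods declared directly in the class,
    // skipping compiler-generated (synthetic) ones.
    static Set<String> signatures(Class<?> c) {
        Set<String> result = new TreeSet<>();
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                result.add(signature(m));
            }
        }
        return result;
    }

    // Methods declared in the second class that have no counterpart in
    // the first, and so are not covered by tests written for the first.
    static Set<String> untestedMethods(Class<?> first, Class<?> second) {
        Set<String> extra = signatures(second);
        extra.removeAll(signatures(first));
        return extra;
    }

    public static void main(String[] args) {
        System.out.println(untestedMethods(First.class, Second.class)); // prints "[void extra()]"
    }
}
```

An empty result means every method declared in the second class also exists, with the same signature, in the first. Parameter names do not take part in the comparison, so renamed parameters do not matter.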

This approach gives you an idea about equivalent semantics, but it relies on strong assumptions.

As a final remark, let me add that specifying the semantics of programs is an interesting and open research topic. Some interesting developments in this area include research on Semantic Web Services. A widely adopted approach to giving machine-processable semantics to programs is that of specifying their IOPE: Input and Output types (as in the Java methods above), and their Preconditions and Effects. Preconditions are essentially logical conditions that must hold true for successfully invoking the program, and Effects are formal descriptions of the changes (in the state of the world) caused by the successful execution of the program. Even with IOPE there are a lot of problems ... which I skip in this short description.
