Visualizing document similarity points [closed]
We are currently working on a project on plagiarism detection between two text documents. We have to compare two submitted documents and present the comparison results. For that, I want to show the two documents side by side in a GUI and highlight the passages they have in common. I have used various algorithms, such as vector space and shingle cloud, to get a similarity score between the two documents, but they don't report which sections are similar, and I need to show the user where the similarity occurs in a graphical interface.
Thanks, Nuwan
Should it really be graphical? You're comparing text, so it seems like you'd want to stick with a textual interface. That said, you could create something pretty quickly with Swing. I'd probably start by printing out the shingles that the two documents have in common, along with some context. I also searched for an off-the-shelf diff engine you could use but came up short. Maybe you could shell out to, or otherwise incorporate, the Unix diff tool in your application?
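To make the "shared shingles with context" idea concrete, here is a rough sketch in Python. It is not the shingle cloud algorithm itself, just plain word k-shingles. The function names (`tokenize`, `shared_spans`, `mark`), the default shingle size `k=3`, and the `[[ ]]` markers are all my own assumptions for illustration. It maps every shingle that occurs in both documents back to character offsets and merges overlapping hits, so you end up with the spans you would highlight.

```python
import re

def tokenize(text):
    """Return (lowercased word, start offset, end offset) for each word in text."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]

def shared_spans(doc_a, doc_b, k=3):
    """Return two lists of (start, end) character spans, one per document,
    covering every word k-shingle that occurs in both documents."""
    tok_a, tok_b = tokenize(doc_a), tokenize(doc_b)

    def index_shingles(tokens):
        # Map each k-word shingle to the token positions where it starts.
        index = {}
        for i in range(len(tokens) - k + 1):
            key = tuple(word for word, _, _ in tokens[i:i + k])
            index.setdefault(key, []).append(i)
        return index

    idx_a, idx_b = index_shingles(tok_a), index_shingles(tok_b)
    common = idx_a.keys() & idx_b.keys()

    def spans(tokens, index):
        # Convert shared shingles to character ranges, then merge overlaps.
        ranges = sorted((tokens[i][1], tokens[i + k - 1][2])
                        for key in common for i in index[key])
        merged = []
        for start, end in ranges:
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        return merged

    return spans(tok_a, idx_a), spans(tok_b, idx_b)

def mark(text, spans):
    """Wrap each span in [[ ]] for a quick textual view of the matches."""
    out, pos = [], 0
    for start, end in spans:
        out.append(text[pos:start] + "[[" + text[start:end] + "]]")
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

For a textual interface you can just print `mark(doc_a, spans_a)` and `mark(doc_b, spans_b)` one above the other. If you do go with Swing, the same (start, end) offsets are what a text component's `Highlighter.addHighlight(p0, p1, painter)` expects, though you would need to port the span computation to Java or pass the offsets across, and check that the offsets agree for text outside the basic character range.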