Eliminating code duplication in a single file

Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3, and it works well for comparing separate files, but I am at a loss when it comes to a single file.

Thanks in advance.

Edit:

Thanks for all the great tools! I'll definitely check them out.

This project is an ASP.NET/C# project, but I work with a variety of languages, including Java, so I'm interested in which tools are best (for any language) for removing duplication.


Check out Atomiq. It finds duplicated code that is a prime candidate for extraction into one location.

http://www.getatomiq.com/


If you're using Eclipse, you can use the Copy/Paste Detector (CPD), available at https://olex.openlogic.com/packages/cpd


You don't say what language you are using, which is going to affect what tools you can use.

For Python there is CloneDigger. It also supports Java, but I have not tried that. It can find code duplication both within a single file and between files, and it presents the results as a diff-like HTML report.


See SD CloneDR, a tool for detecting copy-paste-edit code within a file and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.

CloneDR handles many languages, including Java (1.4, 1.5, 1.6) and C# up to C# 4.0. Sample clone detection reports, including one for C#, are available on the website.
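CloneDR parses code properly, so this is not how it works; it only sketches why copies with renamed identifiers and changed literals can still be caught. If every identifier is replaced by a placeholder `ID` and every literal by `LIT`, two fragments that differ only in names and constants produce the same token sequence. The regex tokenizer and the short `KEYWORDS` set are simplifying assumptions for C#/Java-like code, not a real lexer.

```python
import re

# Rough tokenizer for C-like code: string literals, numbers,
# identifiers/keywords, then any other single non-space character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')

# Hypothetical, deliberately incomplete keyword list (a real tool uses the grammar).
KEYWORDS = {"if", "else", "for", "while", "return", "new", "int", "string",
            "var", "public", "private", "static", "void", "class"}


def normalized_tokens(code: str) -> list[str]:
    """Replace identifiers with ID and literals with LIT so renamed copies compare equal."""
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok.startswith('"') or tok[0].isdigit():
            out.append("LIT")
        elif tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        else:
            out.append(tok)  # operators and punctuation are kept verbatim
    return out


def is_renamed_copy(a: str, b: str) -> bool:
    """True if the two fragments differ only in identifier names and literal values."""
    return normalized_tokens(a) == normalized_tokens(b)


if __name__ == "__main__":
    original = "int total = price * 2; if (total > limit) return total;"
    pasted = "int sum = cost * 3; if (sum > max) return sum;"
    print(is_renamed_copy(original, pasted))  # True
```

Since every identifier collapses to the same `ID`, `a = b` and `a = a` also compare equal; checking that names are renamed consistently would avoid that false match.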


ReSharper does this automagically: it suggests when it thinks code should be extracted into a method, and it will do the extraction for you.


Check out PMD. Once you have configured it (which is fairly simple), you can run its copy/paste detector (CPD) to find duplicate code.
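CPD works on tokens and reports duplicated runs that are at least a minimum number of tokens long. A common way to find repeated fixed-length token runs is a rolling (Rabin-Karp-style) hash, sketched below. This has not been checked against PMD's source, so treat it as an illustration of the general approach rather than CPD's implementation; the crude `\w+|\S` tokenizer, the constants, and the function name are assumptions made for the example.

```python
import re
from collections import defaultdict

BASE = 257
MOD = (1 << 61) - 1  # a large prime modulus keeps accidental collisions rare


def repeated_token_runs(code: str, min_tokens: int = 50) -> list[list[int]]:
    """Group token offsets at which the same run of `min_tokens` tokens starts."""
    tokens = re.findall(r"\w+|\S", code)  # crude tokenizer: words or single symbols
    if len(tokens) < min_tokens:
        return []
    ids: dict[str, int] = {}
    nums = [ids.setdefault(t, len(ids) + 1) for t in tokens]  # token -> integer id
    high = pow(BASE, min_tokens - 1, MOD)  # weight of the token leaving the window

    h = 0
    for n in nums[:min_tokens]:  # hash of the first window
        h = (h * BASE + n) % MOD
    buckets = defaultdict(list)
    buckets[h].append(0)
    for i in range(1, len(tokens) - min_tokens + 1):
        # Slide one token: remove nums[i - 1], append nums[i + min_tokens - 1]
        h = ((h - nums[i - 1] * high) * BASE + nums[i + min_tokens - 1]) % MOD
        buckets[h].append(i)

    groups = []
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        # Equal hashes are only candidates: confirm by comparing the real tokens
        by_run = defaultdict(list)
        for s in starts:
            by_run[tuple(tokens[s:s + min_tokens])].append(s)
        groups.extend(g for g in by_run.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    snippet = "a = b + c; x = y; a = b + c;"
    print(repeated_token_runs(snippet, min_tokens=6))  # [[0, 10]]
```

A duplicated region longer than `min_tokens` shows up as several overlapping groups; real tools typically merge those into one maximal match, which this sketch leaves out.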


Someone with some Office skills can do the following in about a minute:

  • use an ordinary formatter to unify the code style, preferably without line wrapping
  • paste the code text into Microsoft Excel as a single column
  • search and replace all double spaces with single ones, and make any other replacements needed
  • sort the column

At this point, duplicated lines will already be easy to spot, because sorting puts them next to each other. To go further:

  • add a formula in the 2nd column that compares each line with the one above it, and a counter in the 3rd
  • copy and paste the values again, sort, and look at the most repetitive lines
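The same normalize, sort, and count routine can also be scripted instead of done by hand. This Python sketch only collapses whitespace and counts identical lines; it assumes the file has already been run through a formatter as in the first step, and `most_repeated_lines` is just a name chosen for the example.

```python
import sys
from collections import Counter


def most_repeated_lines(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Count identical lines after collapsing whitespace; return the most repeated ones."""
    counts = Counter()
    for line in text.splitlines():
        normalized = " ".join(line.split())  # the "replace double spaces" step
        if normalized:  # blank lines would otherwise top the list
            counts[normalized] += 1
    # most_common() plays the role of sorting by the counter column
    return [(line, n) for line, n in counts.most_common(top) if n > 1]


if __name__ == "__main__":
    # Usage: python most_repeated_lines.py SomeFile.cs  (hypothetical names)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for line, n in most_repeated_lines(fh.read()):
                print(f"{n:4d}  {line}")
```

In C# or Java, lines such as `}` will dominate the output, so in practice it helps to skip very short lines before counting.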


There is an analysis tool called Simian, which I haven't tried yet. Supposedly it can be run on any kind of text and will point out duplicated sections. It can be used via a command-line interface.
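The following is not Simian's algorithm, only a text-only approach in the same spirit: with no parsing it works on any language, by looking for windows of several consecutive lines that occur more than once. In this minimal Python sketch, `min_lines` is an assumed parameter, whitespace is normalized before comparison, and blank lines are skipped.

```python
import sys
from collections import defaultdict


def duplicate_blocks(text: str, min_lines: int = 4) -> dict[tuple[str, ...], list[int]]:
    """Map each run of `min_lines` lines that occurs more than once to its 1-based start lines."""
    # Keep (original line number, whitespace-normalized text), skipping blank lines
    lines = [(no, " ".join(raw.split()))
             for no, raw in enumerate(text.splitlines(), start=1) if raw.strip()]
    seen = defaultdict(list)
    for i in range(len(lines) - min_lines + 1):
        window = tuple(t for _, t in lines[i:i + min_lines])
        seen[window].append(lines[i][0])
    return {w: starts for w, starts in seen.items() if len(starts) > 1}


if __name__ == "__main__":
    # Usage: python duplicate_blocks.py SomeFile.cs  (hypothetical names)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for window, starts in duplicate_blocks(fh.read()).items():
                print(f"{path}: {len(window)} lines repeated at lines {starts}")
```

A duplicated block longer than `min_lines` is reported as several overlapping windows; merging them into one block is left out to keep the sketch short.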


Another option, similar to those above but with a different toolchain: https://www.npmjs.com/package/jscpd
