
Eliminating code duplication in a single file

Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3 and it works well for comparing separate files, but I am at a loss for comparing single files.

Thanks in advance.


Thanks for all the great tools! I'll definitely check them out.

This project is an ASP.NET/C# project, but I work with a variety of languages including Java; I'm interested in what tools are best (for any language) to remove duplication.

Check out Atomiq. It finds duplicated code that is a prime candidate for extracting to one location.


If you're using Eclipse, you can use the Copy/Paste Detector (CPD): https://olex.openlogic.com/packages/cpd.

You don't say what language you are using, which is going to affect what tools you can use.

For Python there is CloneDigger. It also supports Java, but I have not tried that. It can find code duplication both within a single file and between files, and gives you the result as a diff-like report in HTML.

See SD CloneDR, a tool for detecting copy-paste-edit code within and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.

CloneDR handles many languages, including Java (1.4, 1.5, 1.6) and C# up to C# 4.0. You can see sample clone detection reports at the website, including one for C#.
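To make the idea of "near-miss copies with different identifiers and literals" concrete, here is a toy sketch in Python. It is not how CloneDR works; it is only a line-window approach I am assuming for illustration. The `KEYWORDS` set, the `normalize` and `find_duplicate_blocks` names, and the window size are all made up for this example, and the keyword list is far from complete.

```python
import re
from collections import defaultdict

# Hypothetical, incomplete keyword list; everything else is treated as an identifier.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "string", "var", "new", "void"}


def normalize(line: str) -> str:
    """Replace string literals, numbers, and identifiers with placeholders."""
    line = re.sub(r'"[^"]*"', "STR", line)            # simple string literals
    line = re.sub(r"\b\d+(\.\d+)?\b", "NUM", line)    # numeric literals

    def replace_word(match):
        word = match.group(0)
        if word in KEYWORDS or word in ("STR", "NUM"):
            return word
        return "ID"

    line = re.sub(r"[A-Za-z_]\w*", replace_word, line)
    return re.sub(r"\s+", "", line)                   # ignore formatting differences


def find_duplicate_blocks(source: str, window: int = 3):
    """Return lists of 1-based start lines whose next `window` non-blank
    lines are identical after normalization."""
    lines = [(number, normalize(text))
             for number, text in enumerate(source.splitlines(), start=1)]
    lines = [(number, text) for number, text in lines if text]

    seen = defaultdict(list)
    for k in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[k:k + window])
        seen[chunk].append(lines[k][0])

    return sorted(starts for starts in seen.values() if len(starts) > 1)
```

Calling `find_duplicate_blocks(open(path).read())` on a single file gives groups of line numbers where similar blocks start. Overlapping windows are reported separately, so a long clone shows up as several adjacent hits; a real tool would merge them.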

ReSharper does this automagically: it suggests when it thinks code should be extracted into a method, and will do the extraction for you.

Check out PMD. Once you have configured it (which is fairly simple), you can run its copy-paste detector to find duplicate code.

Anyone with some Office skills can run the following sequence in about a minute:

  • use an ordinary code formatter to unify the code style, preferably without line wrapping
  • paste the code into Microsoft Excel as a single column
  • search and replace all double spaces with single spaces, and make any other normalizing replacements
  • sort the column

At this point identical lines sit next to each other, so the duplicates are already easy to spot. To go further:

  • add a comparison formula in a 2nd column (does this row equal the one above?) and a counter in a 3rd
  • copy and paste the results as values, sort again, and look at the most repetitive lines
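If you would rather script it than use a spreadsheet, the same normalize-sort-count idea can be sketched in a few lines of Python. This is only a rough equivalent of the steps above; the function name `most_repeated_lines` and the `min_count` parameter are made up for the example, and it assumes the code has already been run through a formatter.

```python
import re
from collections import Counter


def most_repeated_lines(source: str, min_count: int = 2):
    """Return (line, count) pairs for normalized lines seen at least min_count times."""
    normalized = []
    for line in source.splitlines():
        # Collapse runs of whitespace, like the "double spaces to single" step.
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            normalized.append(line)

    counts = Counter(normalized)
    repeated = [(line, n) for line, n in counts.items() if n >= min_count]
    # Most repetitive lines first; ties are ordered alphabetically.
    repeated.sort(key=lambda item: (-item[1], item[0]))
    return repeated
```

Like the spreadsheet version, this only counts individual lines, so trivial lines such as a lone closing brace will float to the top and are best ignored.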

There is an analysis tool, called Simian, which I haven't yet tried. Supposedly it can be run on any kind of text and point out duplicated items. It can be used via a command line interface.

Another option similar to those above, but with a different tool chain: https://www.npmjs.com/package/jscpd






