How can I perform a diff that ignores all comments?

I have a large codebase that was forked from the original project and I'm trying to track down all the differences from the original. A lot of the file edits consist of commented-out debugging code and other miscellaneous comments. The GUI diff/merge tool Meld under Ubuntu can ignore comments, but only single-line comments.

Is there any other convenient way of finding only the non-comment diffs, either using a GUI tool or Linux command-line tools? In case it makes a difference, the code is a mixture of PHP and JavaScript, so I'm primarily interested in ignoring //, /* */ and #.


For a visual diff, you can try Meld or DiffMerge.

DiffMerge

Its rulesets and options provide for customized behavior.

GNU diffutils

From the command line, you can use the --ignore-matching-lines=RE option of diff (short form -I RE), for example:

diff -d -I '^#' -I '^ #' file1 file2

Note that the regex has to match every changed line in the hunk, in both files, for the hunk to be ignored; otherwise diff still shows the difference.

Use single quotes to protect the pattern from shell expansion, and backslash-escape regex-reserved characters (e.g. brackets).
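A minimal sketch of that all-or-nothing behaviour, assuming GNU diff and bash; the file names are hypothetical and the files are created in a temporary directory. The first pair differs only in a comment line, the second pair has a comment change sitting right next to a code change:

```shell
#!/usr/bin/env bash
# Shows that diff -I only hides a hunk when every changed line matches the regex.
set -u
tmp=$(mktemp -d)

# Case 1: the only change is a comment line.
printf 'a\n# old comment\nb\n' > "$tmp/orig1"
printf 'a\n# new comment\nb\n' > "$tmp/fork1"

# Case 2: the comment change is in the same hunk as a code change (b -> B).
printf 'a\n# old comment\nb\n' > "$tmp/orig2"
printf 'a\n# new comment\nB\n' > "$tmp/fork2"

# diff exits with status 1 when files differ, so "|| true" keeps the script going.
only_comments=$(diff -I '^#' "$tmp/orig1" "$tmp/fork1" || true)
mixed=$(diff -I '^#' "$tmp/orig2" "$tmp/fork2" || true)

echo "comment-only change: [${only_comments}]"  # empty: the hunk is ignored
echo "mixed change:"
echo "$mixed"                                     # whole hunk, comment lines included

rm -rf "$tmp"
```

In the second case the comment lines show up in the output even though they match '^#', because the hunk also contains "b" and "B".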

The diffutils manual says:

However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk (every insertion and every deletion) matches the regular expression.

In other words, for each non-ignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones. You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression, starting with the last one given.

This behavior is also well explained by armel here.


See also:

  • How to diff files ignoring comments (lines starting with #)?

Alternatively, check other diff apps, for example:

  • for macOS: Code compare and merge tools
  • for Windows: 3-way merge tools for Windows


You can filter both files through stripcmt first, which removes C and C++ comments. To remove # comments, sed 's/#.*//' will do (note that it also cuts off a # that appears inside a string literal).
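If stripcmt is not installed, a rough stand-in for the # and // part can be built from sed plus bash process substitution. This is a sketch: strip_line_comments is a hypothetical helper, it does not handle /* */ blocks, and like the sed command above it will also cut text such as "http://..." or a # inside a string, so treat its output as a hint rather than an exact diff:

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop #... and //... to end of line, trailing blanks,
# and lines that become empty. /* */ blocks are NOT handled here.
strip_line_comments() {
  sed -e 's://.*$::' \
      -e 's/#.*$//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' "$1"
}

tmp=$(mktemp -d)
printf 'x = 1; // debug\n# note\ny = 2;\n' > "$tmp/orig.php"
printf 'x = 1;\n// print(x)\ny = 2;\nz = 3; # new\n' > "$tmp/fork.php"

# Compare the stripped versions; only the new "z = 3;" line remains as a difference.
result=$(diff <(strip_line_comments "$tmp/orig.php") \
              <(strip_line_comments "$tmp/fork.php") || true)
echo "$result"
rm -rf "$tmp"
```

Because the comments are gone before diff runs, the comment-only edits disappear regardless of how they are grouped into hunks, at the cost of losing them as context.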

Of course you will lose some context by removing comments first, but on the other hand differences in comments will not cause any problems. I think I would do it like this (described for a single file; automate as required):

  1. If the latest version of the original code base is A and the latest version of the copied code base is B, call the versions with comments removed A' and B' (e.g. save them to temporary files while processing).
  2. Find some common origin version and strip comments from that into O' (alternatively just re-use B' for this).
  3. Perform a 3-way merge of O', A' and B' and save to C'. KDiff3 is an excellent tool for this.
  4. Now you have the code changes you want merged; however, C' has no comments. To get back to "normal" mode, do a new 3-way merge with A' as base and A and C' as the two sides. This picks up the changes between A' and C' (which are the code changes you want) into the normal, commented code base based on version A.

Drawing the version tree on paper before you start is highly recommended, to get a clear picture of which versions you want to work on. But don't be limited by what the tree shows: you can merge any version in any direction if you just figure out which versions to use.
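The four steps can also be scripted without a GUI. This sketch swaps KDiff3 for GNU diff3 -m (argument order MINE OLDER YOURS), uses a hypothetical sed-based strip_comments that only handles # and //, and uses tiny hypothetical files O, A and B named after the letters above. Note that the fork's own comment is dropped, since the procedure only carries code changes over:

```shell
#!/usr/bin/env bash
set -u
# Hypothetical comment stripper (# and // only), same idea as sed 's/#.*//'.
strip_comments() { sed -e 's://.*$::' -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"; }

tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Hypothetical versions: O = common origin, A = latest original, B = latest fork.
printf 'one;\ntwo;\nthree;\nfour;\nfive;\n' > O
printf '// header added upstream\none;\ntwo;\nthree;\nfour;\nfive;\n' > A
printf 'one;\ntwo;\nthree;\nfour;\nFIVE; // fixed in fork\n' > B

# Steps 1-2: comment-free copies A', B' and O' (written as A_, B_, O_).
strip_comments A > A_
strip_comments B > B_
strip_comments O > O_

# Step 3: 3-way merge with O' as base; diff3 exits non-zero on conflicts.
diff3 -m A_ O_ B_ > C_ || echo "conflicts left in C_" >&2

# Step 4: base A', sides A (with comments) and C' (merged code).
diff3 -m A A_ C_ > merged || echo "conflicts left in merged" >&2

result=$(cat merged)
echo "$result"
cd / && rm -rf "$tmp"
```

Any real conflicts are left in the output as diff3 conflict markers, which then need resolving by hand (or in KDiff3).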


diff <file1> <file2> | grep -v '^[<>] #'

Far from perfect, but it will give an idea of the differences.
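A slightly broader variant of the same filter, as a sketch: the extended regex is an assumption (not part of the answer) that also drops changed lines whose first non-blank characters are #, //, /* or *, i.e. typical PHP/JavaScript comment lines, while keeping the hunk headers. It is still line-based, so a code line that happens to begin with * would be dropped too:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'a();\n  // old note\nb();\n' > "$tmp/f1"
printf 'a();\n  # new note\n  /* added */\nc();\n' > "$tmp/f2"

# Drop "<"/">" lines that are comments after optional indentation.
filtered=$(diff "$tmp/f1" "$tmp/f2" | grep -v -E '^[<>][[:space:]]*(#|//|/\*|\*)')
echo "$filtered"
rm -rf "$tmp"
```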


See our Smart Differencer line of tools, which compare source files using the language structure rather than the layout as a guide. In particular, this means they ignore comments and whitespace when comparing code.

There is a SmartDifferencer for PHP.


GNU diff supports ignoring lines which match a regular expression:

diff --ignore-matching-lines='^#' file1 file2

and for folders:

diff -qr --ignore-matching-lines='^#' folder1/ folder2/

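This ignores lines that begin with #, as long as every changed line in a hunk matches. Add -b to ignore changes in the amount of whitespace and -B to ignore blank lines, if needed.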


I tried diff file1 file2 and diff -d -I '^#.*' file1 file2, and the result was the same in both cases: comments were included.

However, diff -u file1 file2 | grep -v '^ \|^.#\|^.$' gives what I need: real diffs only, with no comments and no empty lines. ;)


Try:

diff -I REGEXP -I REGEXP2 file1 file2

See: Regular expression at Wikipedia

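Below are examples of regular expressions that would cause diff to ignore lines holding a preprocessor directive or # comment, and the two standard comment types. Since -I tests each line separately, a multi-line /* */ comment is only ignored if each of its lines matches some pattern; the last pattern covers the usual " * ..." continuation lines and the closing " */".

For example:

'^[[:space:]]*#'
'^[[:space:]]*/\*'
'^[[:space:]]*//'
'^[[:space:]]*\*'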
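Combining those patterns for the PHP/JavaScript case in the question gives something like the sketch below, assuming GNU diff and bash. diff_ignoring_comments is a hypothetical helper name, and as with any -I pattern, a hunk is hidden only if every changed line in it is a comment line:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: diff two files or trees, ignoring comment-only hunks.
diff_ignoring_comments() {
  diff -r -u \
    -I '^[[:space:]]*#' \
    -I '^[[:space:]]*//' \
    -I '^[[:space:]]*/\*' \
    -I '^[[:space:]]*\*' \
    "$1" "$2"
}

tmp=$(mktemp -d)
mkdir "$tmp/orig" "$tmp/fork"
# x.php: the fork only adds a commented-out debug block.
printf '<?php\n$a = 1;\n$b = 2;\n' > "$tmp/orig/x.php"
printf '<?php\n/* debug:\n * var_dump($a);\n */\n$a = 1;\n$b = 2;\n' > "$tmp/fork/x.php"
# y.js: a real code change with a trailing comment.
printf 'var a = 1;\n' > "$tmp/orig/y.js"
printf 'var a = 2; // changed\n' > "$tmp/fork/y.js"

out=$(diff_ignoring_comments "$tmp/orig" "$tmp/fork" || true)
echo "$out"
rm -rf "$tmp"
```

The debug block in x.php is suppressed, while the change in y.js is still reported because "var a = 2; // changed" does not start with a comment marker.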