How can I find lines in one file but not the other using bash scripting?
Imagine file 1:
#include "first.h"
#include "second.h"
#include "third.h"
// more code here
...
Imagine file 2:
#include "fifth.h"
#include "second.h"
#include "eigth.h"
// more code here
...
I want to get the headers that are included in file 2 but not in file 1, and only those lines. So, when run, a diff of file 1 and file 2 should produce:
#include "fifth.h"
#include "eigth.h"
I know how to do it in Perl/Python/Ruby, but I'd like to accomplish this without using a different programming language.
This is a one-liner, but does not preserve the order:
comm -13 <(grep '#include' file1 | sort) <(grep '#include' file2 | sort)
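To see what "does not preserve the order" means in practice, here is a rough sketch that runs the same comm -13 idea on the sample files from the question. It uses intermediate sorted files instead of process substitution so it also runs in a plain POSIX shell; the scratch directory, the file names inside it, and the `result` variable are my own scaffolding, not part of the answer.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' '// more code here' > "$tmpdir/file1"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' '// more code here' > "$tmpdir/file2"

# comm needs both inputs sorted with the same collation, so pin the locale.
grep '#include' "$tmpdir/file1" | LC_ALL=C sort > "$tmpdir/sorted1"
grep '#include' "$tmpdir/file2" | LC_ALL=C sort > "$tmpdir/sorted2"

# -13 suppresses column 1 (only in file1) and column 3 (in both),
# leaving the lines that appear only in file2.
result=$(LC_ALL=C comm -13 "$tmpdir/sorted1" "$tmpdir/sorted2")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The two lines come out in sorted order ("eigth.h" before "fifth.h"), not in the order they appear in file 2.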
If you need to preserve the order:
awk '
!/#include/ {next}
FILENAME == ARGV[1] {include[$2]=1; next}
!($2 in include)
' file1 file2
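For what it's worth, here is a quick way to try the awk version against the sample files from the question. The scratch directory and the `result` variable are just my own scaffolding; the awk program itself is unchanged.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' '// more code here' > "$tmpdir/file1"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' '// more code here' > "$tmpdir/file2"

result=$(awk '
# Skip every line that is not an #include.
!/#include/ {next}
# While reading the first file, remember each header name (field 2).
FILENAME == ARGV[1] {include[$2]=1; next}
# In the second file, print lines whose header was not seen in the first.
!($2 in include)
' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Because awk walks file 2 top to bottom, the output keeps file 2's order: "fifth.h" first, then "eigth.h".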
If it's ok to use a temp file, try this:
grep include file1.h > /tmp/x && grep -f /tmp/x -v file2.h | grep include
This
- extracts all includes from file1.h and writes them to the file /tmp/x
- uses this file to get all lines from file2.h that are not contained in this list
- extracts all includes from the remainder of file2.h
It probably doesn't handle differences in whitespace correctly etc, though.
EDIT: to prevent false positives, use a different pattern for the last grep (thanks to jw013 for mentioning this):
grep include file1.h > /tmp/x && grep -f /tmp/x -v file2.h | grep "^#include"
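If the whitespace caveat matters for your files, one possible workaround is to normalize the #include lines before comparing them. The sketch below is only an assumption about what "the same include" should mean (same text after trimming and collapsing the spacing around "#" and "include"); `normalize_includes`, the scratch directory, and the sample lines are hypothetical names and data of mine.

```shell
# normalize_includes is a hypothetical helper: it prints each #include line
# of the given file with leading/trailing whitespace removed and the spacing
# around "#" and "include" collapsed to the canonical '#include <rest>'.
normalize_includes() {
    sed -n 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/#include \1/p' "$1"
}

# Made-up sample data with inconsistent spacing.
tmpdir=$(mktemp -d)
printf '%s\n' '#include "first.h"' '  #include   "second.h"' > "$tmpdir/file1.h"
printf '%s\n' '#include "fifth.h"' '#include "second.h"  ' '# include "first.h"' > "$tmpdir/file2.h"

# Compare the normalized lines as fixed strings (-F), whole line only (-x).
normalize_includes "$tmpdir/file1.h" > "$tmpdir/seen"
result=$(normalize_includes "$tmpdir/file2.h" | grep -vxFf "$tmpdir/seen")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Using -F here also sidesteps the fact that a plain grep -f treats each line of the pattern file as a regular expression, so the "." in a header name would otherwise match any character.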
This variant requires an fgrep with the -f option. GNU grep (i.e. any Linux system, and then some) should work fine.
# Find occurrences of '#include' in file1.h
fgrep '#include' file1.h |
# Remove any identical lines from file2.h
fgrep -vxf - file2.h |
# Result is all lines not present in file1.h. Out of those, extract #includes
fgrep '#include'
This does not require any sorting, nor any explicit temporary files. In theory, fgrep -f could use a temporary file behind the scenes, but I believe GNU fgrep doesn't.
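A small sketch of the same pipeline on the question's sample files, written with grep -F, which is the equivalent spelling of fgrep (newer GNU grep releases warn that fgrep is obsolescent). The scratch directory and the `result` variable are my own scaffolding, and reading the pattern list from standard input with "-f -" is assumed to work as it does in GNU grep.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' '// more code here' > "$tmpdir/file1.h"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' '// more code here' > "$tmpdir/file2.h"

result=$(
    # Collect the #include lines of file1.h ...
    grep -F '#include' "$tmpdir/file1.h" |
    # ... use them as fixed-string, whole-line patterns to drop from file2.h ...
    grep -F -vxf - "$tmpdir/file2.h" |
    # ... and keep only the #include lines among what is left.
    grep -F '#include'
)
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The output keeps the order of file2.h, and the shared "// more code here" line is filtered out by the last grep.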
If the goal need not be accomplished with Bash alone (i.e., use of external programs is acceptable), then use combine from moreutils:
combine file1 not file2 > lines_in_file1_not_in_file2
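# Prints the #include lines that occur exactly once across both files,
# so lines that are only in file1 show up as well (sorted order).
cat "$file1" "$file2" | grep '#include' | sort | uniq -u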
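Note that uniq -u on the two files concatenated gives every line that occurs exactly once overall, which includes lines that are only in file 1. One possible tweak, offered as a sketch and not as the original poster's method, is to feed file 1 in twice so none of its lines can ever be unique. It assumes file 2 does not repeat an #include itself, since such a repeated line would be dropped too. The scratch directory and variable names are my own.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
file1=$tmpdir/file1
file2=$tmpdir/file2
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' '// more code here' > "$file1"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' '// more code here' > "$file2"

# Every line of file1 now occurs at least twice, so uniq -u can only
# print lines that come from file2 alone.
result=$(cat "$file1" "$file1" "$file2" | grep '#include' | LC_ALL=C sort | uniq -u)
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

As with the comm approach, the result comes out sorted and not in file 2's original order.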