Bash script for outputting list of specific log files to a text file
I am trying to create a text file that contains a listing of all log files that contain a certain string in the first line. More specifically, SAS log files.
Currently I have a simple script that will search the entire system for "*.log" files and output the entire list to a text file.
Is there a way to only output the log files that contain a certain string?
Here is the current command:
find `pwd` -name "*.log" > sas_log_list.txt
Every SAS log file contains the same string on the very first line.
This string is: 1 The SAS System
So basically I want to search a file system for log files containing the string above, and output those file names to a text file.
Thanks in advance, Jason
The hardest part of this question is searching only within the first line. The most accurate one-liner (broken here for readability) I could come up with was:
find . -name '*.log' -type f -readable ! -size 0 \
-exec sed -n '1{/The SAS System/q0};q1' {} \; \
-print
Due to the obscure nature of sed syntax, some explanation is in order:

- The 1{...} block is evaluated for the first line only.
- The /regex/q0 command quits with exit code 0 (success) if the regex matched (consider /^regex$/ to match the entire line against that regex).
- If we didn't quit due to the previous match, the next command, q1, quits with exit code 1 (failure).

find uses that sed command as a predicate and only reaches -print if the predicate was true. However, there is a small snag: apparently if the file is of -size 0, sed exits 0 immediately without evaluating its script. For that reason we need the ! -size 0 argument to find.

As suggested by @Brandon Horsley, -type f will produce fewer errors, and while we're at it, let's verify that the file is -readable as well.
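To sanity-check the sed predicate on its own, a quick test along these lines should work (the sample file names here are made up; q with an exit code is a GNU sed extension):

printf '1    The SAS System\nsecond line\n' > sample_match.log
printf 'some other first line\n1    The SAS System\n' > sample_nomatch.log
sed -n '1{/The SAS System/q0};q1' sample_match.log;   echo "exit: $?"   # exit: 0
sed -n '1{/The SAS System/q0};q1' sample_nomatch.log; echo "exit: $?"   # exit: 1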
Unless I'm mistaken, you don't need the call to pwd. I think this will get you what you want: you can use the -l flag on grep to get the file names rather than the matching lines.
find . -name "*.log" -exec grep -l "The SAS System" {} \; > sas_log_list.txt
find `pwd` -name "*.log" -exec grep "The SAS System" {} \;
or
find `pwd` -name "*.log" | grep -i "the sas system"
bash 4
shopt -s globstar   # let ** match log files in all subdirectories (bash 4+)
shopt -s nullglob   # expand to nothing, not the literal pattern, if there are no matches
for logfile in **/*.log
do
    read firstline < "$logfile"   # read only the first line
    case "$firstline" in
        *"The SAS System"*) echo "$logfile" >> sas_log_list.txt ;;
    esac
done
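A slightly more defensive variant of the same idea (just a sketch, under the same bash 4 assumption) preserves leading whitespace and backslashes in the first line and starts with an empty output file:

shopt -s globstar nullglob
: > sas_log_list.txt                        # truncate the list before appending
for logfile in **/*.log
do
    IFS= read -r firstline < "$logfile"     # keep whitespace and backslashes intact
    [[ $firstline == *"The SAS System"* ]] && printf '%s\n' "$logfile" >> sas_log_list.txt
done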
I've attempted to make things a bit faster by reading only the first line of each file. This prints out the names of the files matching the pattern.
( IFS=$'\n' ; for f in $(find `pwd` -name "*log" -type f ) ; do
head -n 1 "$f" | grep -q "The SAS System" && echo "$f"
done )
UPDATE 1: Edited to handle path names containing white space, using one of the techniques offered by Charles Duffy. I couldn't use the find -exec ... + expression since {} can't appear more than once. Thanks ghostdog74 and Telemachus.
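For what it's worth, another common way to cope with arbitrary file names (including ones containing newlines) is to feed find -print0 into a while read loop; a rough sketch, assuming GNU find and bash:

while IFS= read -r -d '' f; do
    head -n 1 "$f" | grep -q "The SAS System" && printf '%s\n' "$f"
done < <(find . -name "*.log" -type f -print0)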
UPDATE 2: Added the full pathname and last modified time.
( IFS=$'\n' ; for f in $(find . -name "*log" -type f ) ; do
head -n 1 "$f" | grep -q "The SAS System" && echo $(readlink -f "$f") $(stat -c %y "$f")
done )