Arrange Log Entries into Dated Files

2022-12-09 13:52 Q&A author:

I'm trying to split a large log file, which contains several months of log entries, into separate log files by date. There are thousands of lines like the following:

Sep 4 11:45 kernel: Entry
Sep 5 08:44 syslog: Entry

I want to split it so that files such as logfile.20090904 and logfile.20090905 each contain the entries for that day.

I've written a script that reads each line and sends it to the appropriate file, but it runs pretty slowly (especially since I have to convert the month name to a number). I've also thought about running grep once for every day, which would require finding the first date in the file, but that seems slow as well.

Is there a more optimal solution? Maybe I'm missing a command line program that would work better.

Here is my current solution:

#!/bin/bash
FILE=$1   # the big log file to split
while IFS= read -r line; do
  dts="${line:0:6}"
  dt="$(date -d "$dts" +'%Y%m%d')"
  # Note that I could do some caching here of the date, assuming
  # that dates are together.
  printf '%s\n' "$line" >> "$FILE.$dt" 2> /dev/null
done < "$FILE"

@OP: try not to use bash's while read loop to iterate over a big file. It's well established that it is slow, and on top of that you are calling the external date command for every line you read. Here's a more efficient way, using only gawk:

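gawk 'BEGIN {
  # month name -> month number lookup table (Jan -> 1, ..., Dec -> 12)
  m = split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", mth, "|")
  for (i = 1; i <= m; i++) month_num[mth[i]] = i
}
{
  # the log lines carry no year, so 2009 is hard-coded here
  tt = "2009 " month_num[$1] " " $2 " 00 00 00"
  date = strftime("%Y%m%d", mktime(tt))
  # write to e.g. logfile.20090904; gawk truncates each file when it first
  # opens it and keeps it open, so later lines for that day are appended
  print $0 > (FILENAME "." date)
}
' logfile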

output

$ more logfile
Sep 4 11:45 kernel: Entry
Sep 5 08:44 syslog: Entry

$ ./shell.sh

$ ls -1 logfile.*
logfile.20090904
logfile.20090905

$ more logfile.20090904
Sep 4 11:45 kernel: Entry

$ more logfile.20090905
Sep 5 08:44 syslog: Entry
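For comparison, the same lookup-table idea can be sketched in pure bash, so that no external process is started per line. This is a minimal sketch, not part of the answer above: it assumes bash 4 or later (for associative arrays), takes the year as an explicit argument because the log lines do not contain one, and skips lines that do not start with a "Mon D " prefix. The function name `split_by_date` is hypothetical.

```shell
#!/bin/bash
# Sketch: split a syslog-style file by date without running date(1) per line.
# Assumptions: bash 4+; the year is passed in because the lines lack one;
# split_by_date is a hypothetical name.
split_by_date() {
    local file=$1 year=$2 line mon dd
    local re='^([A-Z][a-z]{2}) +([0-9]{1,2}) '
    # Month name -> two-digit month number, looked up in memory.
    local -A month_num=(
        [Jan]=01 [Feb]=02 [Mar]=03 [Apr]=04 [May]=05 [Jun]=06
        [Jul]=07 [Aug]=08 [Sep]=09 [Oct]=10 [Nov]=11 [Dec]=12
    )
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue          # skip lines without "Mon D "
        mon=${month_num[${BASH_REMATCH[1]}]}
        [[ -n $mon ]] || continue               # unknown month name
        # 10# forces base 10 so a day such as "08" is not read as octal.
        printf -v dd '%02d' "$((10#${BASH_REMATCH[2]}))"
        printf '%s\n' "$line" >> "$file.$year$mon$dd"
    done < "$file"
}

# Demo on the two sample lines from the question.
rm -f logfile logfile.*
printf '%s\n' 'Sep 4 11:45 kernel: Entry' 'Sep 5 08:44 syslog: Entry' > logfile
split_by_date logfile 2009
ls -1 logfile.*
```

It still opens the output file once per line, so it should be expected to run slower than the gawk version, which keeps its output files open.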

The quickest thing, given what you've already done, would be to simply name the files "Sep 4" and so on, then rename them all at the end. That way all you have to do per line is read a fixed number of characters, with no extra processing.
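A minimal sketch of that idea, assuming bash 4+ and GNU date (whose `-d "Sep 4"` resolves to the current year, as in the question's own script). The function name `split_then_rename` is hypothetical. The first pass only extracts the first two fields, and date runs once per distinct day in the second pass rather than once per line.

```shell
#!/bin/bash
# Sketch: name output files by the raw "Mon D" prefix, rename at the end.
# Assumes bash 4+ (associative array) and GNU date; split_then_rename is a
# hypothetical name, not taken from the answer.
split_then_rename() {
    local file=$1 line key
    local re='^([A-Z][a-z]{2}) +([0-9]{1,2}) '
    local -A seen=()
    # Pass 1: no date conversion, just append each line to "<file>.Mon D".
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue
        key="${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"   # e.g. "Sep 4"
        seen["$key"]=1
        printf '%s\n' "$line" >> "$file.$key"
    done < "$file"
    # Pass 2: one date call per distinct day; GNU date fills in the current year.
    for key in "${!seen[@]}"; do
        mv -- "$file.$key" "$file.$(date -d "$key" +%Y%m%d)"
    done
}

# Demo on the two sample lines from the question.
rm -f logfile logfile.*
printf '%s\n' 'Sep 4 11:45 kernel: Entry' 'Sep 5 08:44 syslog: Entry' > logfile
split_then_rename logfile
ls -1 logfile.*
```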

If for some reason you don't want to do that, but you know the dates are in order, you could cache the previous date in both forms (the raw "Sep 4" and the converted 20090904) and do a string comparison to find out whether you need to run date again or can just reuse the cached date.
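Here is the question's loop with that cache added, as a sketch under stated assumptions: bash with `[[ =~ ]]`, and GNU date, where `-d "Sep 4"` resolves to the current year. The name `split_cached` is hypothetical. If the dates are grouped, date runs once per day; if they are not, the output is still correct and date is simply called more often.

```shell
#!/bin/bash
# Sketch: the question's per-line loop, with the converted date cached.
# Assumes GNU date (-d "Sep 4" resolves to the current year); split_cached is
# a hypothetical name.
split_cached() {
    local file=$1 line key prev_key="" dt=""
    local re='^([A-Z][a-z]{2}) +([0-9]{1,2}) '
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue
        key="${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"   # raw form, e.g. "Sep 4"
        # Run date only when the raw date differs from the previous line's.
        if [[ $key != "$prev_key" ]]; then
            dt=$(date -d "$key" +%Y%m%d)              # converted form
            prev_key=$key
        fi
        printf '%s\n' "$line" >> "$file.$dt"
    done < "$file"
}

# Demo: the two Sep 4 entries share a single date call.
rm -f logfile logfile.*
printf '%s\n' 'Sep 4 11:45 kernel: Entry' 'Sep 4 12:00 cron: Entry' \
    'Sep 5 08:44 syslog: Entry' > logfile
split_cached logfile
ls -1 logfile.*
```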

Finally, if speed is still an issue, you could try Perl or Python instead of bash. You're not doing anything too crazy here, though (besides starting a subshell and a date process for every line, which we already figured out how to avoid), so I don't know how much it'll help.

A skeleton of a script:

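#!/bin/bash
# Usage: ./split.sh logfile [logfile ...]
BIG_FILE=big.txt

# remove $BIG_FILE when the script exits
trap 'rm -f "$BIG_FILE"' EXIT

cat "$@" > "$BIG_FILE" || { echo "cat failed"; exit 1; }

# sort the file by date in place: month name first, then day number;
# -s keeps lines from the same day in their original order
sort -s -k1,1M -k2,2n "$BIG_FILE" -o "$BIG_FILE" || { echo "sort failed"; exit 1; }

PREV_DATE_STR=""
while IFS= read -r line; do
   # extract the date part ("Sep 4") from the line: its first two fields
   read -r mon day _ <<< "$line"
   DATE_STR="$mon $day"

   # a new date: open a new file
   if [[ $DATE_STR != "$PREV_DATE_STR" ]]; then
       # close the file descriptor of the previous "dated" file
       exec 5>&-
       PREV_DATE_STR=$DATE_STR

       # open the "dated" file for writing (name format from the question;
       # GNU date fills in the current year)
       FILE_NAME="logfile.$(date -d "$DATE_STR" +%Y%m%d)"
       exec 5>"$FILE_NAME" || { echo "exec failed"; exit 1; }
   fi

   printf '%s\n' "$line" >&5 || { echo "print failed"; exit 1; }
done < "$BIG_FILE"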

This script runs its inner loop 365 or 366 times, once for each day of the year, instead of iterating over each line of the log file (each iteration is a single grep pass over the whole log):

#!/bin/bash
month=0
months=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
for eom in 31 29 31 30 31 30 31 31 30 31 30 31
do
    (( month++ ))
    echo "Month $month"
    if (( month == 2 ))    # see what day February ends on
    then
        eom=$(date -d "3/1 - 1 day" +%-d)
    fi
    for (( day=1; day<=eom; day++ ))
    do
        grep "^${months[$month - 1]} $day " dates.log > temp.out
        if [[ -s temp.out ]]
        then
            mv temp.out file.$(date -d $month/$day +"%Y%m%d")
        else
            rm temp.out
        fi
        # instead of creating a temp file and renaming or removing it,
        # you could go ahead and let grep create empty files and let find
        # delete them at the end, so instead of the grep and if/then/else
        # immediately above, do this:
        # grep --color=never "^${months[$month - 1]} $day " dates.log > file.$(date -d $month/$day +"%Y%m%d")
    done
done
# if you let grep create empty files, then do this:
# find -type f -name "file.2009*" -empty -delete
