
UNIX shell-scripting: Split a textfile by its entries

I'm trying to analyze an enormous text file (1.6GB), whose data lines look like this:

20090118025859 -2.400000 78.100000 1023.200000 0.000000
20090118025900 -2.500000 78.100000 1023.200000 0.000000
20090118025901 -2.400000 78.100000 1023.200000 0.000000

I don't even know how many lines there are, but I'm trying to split the file by date. The leftmost number is a timestamp (these lines, for example, are from January 18th, 2009). How can I split this file into pieces according to the date?

The number of entries per date differs, so using split with a constant number won't work. All I know would be to run grep '^20090118' file > data20090118.dat for each date, but there surely is a way to do all the dates at once, right?

Thanks in advance, Alex


Using awk:

awk '{print > ("data" substr($1,1,8) ".dat")}' myfile   # substr positions start at 1 in awk
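
If the input covers many dates, some awk implementations may run out of open file handles, because the one-liner keeps every output file open. A minimal sketch of a variant that closes the previous file whenever the date changes is below. It assumes the lines are grouped by date; the function name split_by_day is made up for illustration, and the output names follow the data<date>.dat pattern from above.

```shell
# split_by_day is a hypothetical helper name; it writes data<YYYYMMDD>.dat
# files into the current directory.
split_by_day() {
    awk '{
        day = substr($1, 1, 8)            # YYYYMMDD prefix of the timestamp
        if (day != prev) {                # the date changed
            if (prev != "") close(out)    # release the previous file handle
            out = "data" day ".dat"
            prev = day
        }
        print >> out                      # append, so a date that reappears is not truncated
    }' "$1"
}
```

Call it as split_by_day myfile. Because it appends, remove any data*.dat files left over from an earlier run first.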


In bash, this reads the file line by line and takes the date from the first 8 characters of each line:

while IFS= read -r line
do
    date=${line:0:8}                      # YYYYMMDD prefix of the timestamp
    printf '%s\n' "$line" >> "$date.dat"  # append the line to that day's file
done < log.dat


With the caveats that each day needs to have more than 1 record, and that the output file will have blank lines:

uniq --all-repeated=separate -w8 file | csplit -s - '/^$/' '{*}'

We really should have an option to uniq to output even unique records. Also, csplit should have an option to suppress the matched line.
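
Newer GNU coreutils releases (8.22 and later, as far as I can tell) appear to provide both: uniq --group prints every line, including unique ones, with an empty line between groups, and csplit --suppress-matched drops the line that matched the pattern. A sketch under that assumption follows; the function name split_groups is made up for illustration, and the input is assumed to contain no blank lines of its own.

```shell
# split_groups is a hypothetical helper name. Requires GNU uniq and csplit
# with --group and --suppress-matched support (assumed: coreutils >= 8.22).
split_groups() {
    # Group lines by their first 8 characters (the date), separated by
    # blank lines, then split at each blank line without keeping it.
    uniq --group -w8 "$1" | csplit -s --suppress-matched - '/^$/' '{*}'
}
```

The pieces are written as xx00, xx01, and so on in the current directory, one per date, in input order.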
