How to search for lines in a file between two timestamps using Perl?
In Perl I am trying to read a log file and print only the lines that have a timestamp between two specific times. The time format is hh:mm:ss, and it is always the third value on each log line. For example, I would be searching for lines that fall between 12:52:33 and 12:59:33.
I am new to Perl and have no idea which route to take to even begin to program this. I am pretty sure this would use some type of regex, but for the life of me I cannot even begin to fathom what that would be. Could someone please assist me with this?
Also, to make this more difficult I have to do this with the core Perl modules because my company will not allow me to use any other modules until they have been tested and verified there will be no ill effects on any of the systems the script may interact with.
In pseudocode, you'd do something like this:
- read in the file line by line:
- parse the timestamp for this line.
- if it's less than the start time, skip to the next line.
- if it's greater than the end time, skip to the next line.
- else: this is a line you want: print it out.
This may be too advanced for your needs, but the flip-flop operator ..
immediately comes to mind as something that would be useful here.
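As a rough illustration of how the flip-flop behaves, here is a minimal sketch that runs it over a few made-up log lines held in an array. The sample lines and the helper name lines_between are assumptions for demonstration only, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the timestamp is the third whitespace-separated field.
my @lines = (
    "host1 app 12:52:32 before the window",
    "host1 app 12:52:33 start marker",
    "host1 app 12:55:00 inside the window",
    "host1 app 12:59:33 end marker",
    "host1 app 12:59:34 after the window",
);

# Return the lines from the first match of $start through the next match of $end.
# (lines_between is an illustrative helper name.)
sub lines_between {
    my ($start, $end, @input) = @_;
    my @selected;
    for my $line (@input) {
        # In scalar context, .. is the flip-flop: false until the left test
        # succeeds, then true up to and including the line on which the
        # right test succeeds.
        push @selected, $line if $line =~ /\Q$start\E/ .. $line =~ /\Q$end\E/;
    }
    return @selected;
}

print "$_\n" for lines_between('12:52:33', '12:59:33', @lines);
```

Two caveats: the range only opens if the exact start string appears in the input, and each flip-flop keeps its state between calls to the enclosing subroutine, so a range left open by one call carries over into the next.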
For reading in a file from stdin, this is the conventional pattern:
while (my $line = <>)
{
# do stuff...
}
Parsing a line into fields can be done easily with split
(see perldoc -f split). You will probably need to split the line by tabs or spaces, depending on the format.
Once you've got the particular field (containing the timestamp), you can examine it using a customized regexp. Read about those at perldoc perlre.
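To make the split-then-check step concrete, here is a small sketch that pulls out the third whitespace-separated field and verifies that it looks like hh:mm:ss. The helper name extract_timestamp and the sample lines are made up for illustration; the field position is the one stated in the question.

```perl
use strict;
use warnings;

# Return the third whitespace-separated field if it looks like hh:mm:ss,
# otherwise return nothing (undef in scalar context).
# (extract_timestamp is an illustrative helper name.)
sub extract_timestamp {
    my ($line) = @_;
    # split ' ' (a single-space string) ignores leading whitespace,
    # whereas split /\s+/ would yield an empty first field for it.
    my @fields = split ' ', $line;
    return unless @fields >= 3;
    # Two digits each for hours (00-23), minutes and seconds (00-59).
    return $fields[2] if $fields[2] =~ /^(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$/;
    return;
}

for my $line ("host1 app 12:52:33 something happened", "malformed line") {
    my $ts = extract_timestamp($line);
    print defined $ts ? "timestamp: $ts\n" : "no timestamp found\n";
}
```

Validating the field first means a malformed line is skipped cleanly instead of triggering "uninitialized value" warnings later in the comparison.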
Here's something which might get you closer:
use strict;
use warnings;
use POSIX 'mktime';
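# POSIX::mktime needs at least six arguments (sec, min, hour, mday, mon, year),
# so pin every timestamp to the same arbitrary date (1 Jan 2000)
my $starttime = mktime(33, 52, 12, 1, 0, 100);
my $endtime = mktime(33, 59, 12, 1, 0, 100);
while (my $line = <>)
{
# split into fields using whitespace as the delimiter
my @fields = split(/\s+/, $line);
# the timestamp is the 3rd field
my $timestamp = $fields[2];
my ($hour, $min, $sec) = split(':', $timestamp);
my $time = mktime($sec, $min, $hour, 1, 0, 100);
# skip lines outside the window; a plain comparison is enough here
next if $time < $starttime or $time > $endtime;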
print $line;
}
If the start and end times are known, a Perl one-liner with a flip-flop operator is what you need:
perl -ne 'print if /12:52:33/../12:59:33/' logFile
If there is some underlying logic needed in order for you to determine the start and end times, then 'unroll' the one-liner to a formal script:
use strict;
use warnings;
open my $log, '<', 'logFile' or die "Cannot open logFile: $!";
my $startTime = get_start_time(); # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time(); # Sets $endTime in hh:mm:ss format
while ( <$log> ) {
print if /$startTime/../$endTime/;
}
As noted by Ether's comment, this will fail if the exact time is not present. If this is a possibility, one might implement the following logic instead:
use strict;
use warnings;
open my $log, '<', 'logFile' or die "Cannot open logFile: $!";
my $startTime = get_start_time(); # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time(); # Sets $endTime in hh:mm:ss format
while ( <$log> ) {
my $time = (split /,/, $_)[2]; # Assuming fields are comma-separated
# and timelog is 3rd field
last if $time gt $endTime; # Stop when stop time reached
print if $time ge $startTime;
}
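The string comparisons (ge, gt) above work because zero-padded hh:mm:ss values sort the same way as strings and as times. Here is a minimal sketch of that property, including a case where it breaks; the helper name in_window and the sample values are illustrative assumptions.

```perl
use strict;
use warnings;

# Return 1 if $time lies in [$start, $end] when compared as strings, else 0.
# Assumes every value is zero-padded hh:mm:ss on a 24-hour clock.
# (in_window is an illustrative helper name.)
sub in_window {
    my ($time, $start, $end) = @_;
    return ($time ge $start && $time le $end) ? 1 : 0;
}

print in_window('12:55:00', '12:52:33', '12:59:33'), "\n";  # inside the window
print in_window('13:00:00', '12:52:33', '12:59:33'), "\n";  # after the window
# An unpadded hour breaks the assumption: as a string, '9' sorts after '1',
# so 9:05:00 is rejected even though it lies between 08:00:00 and 12:59:33.
print in_window('9:05:00', '08:00:00', '12:59:33'), "\n";
```

If the log might contain unpadded or 12-hour times, converting each timestamp to seconds before comparing is the safer route.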
If each line in the file has the time stamp, then in 'sed' you could write:
sed -n '/12:52:33/,/12:59:33/p' logfile
This will echo the relevant lines.
There is a Perl program, s2p, that will convert 'sed' scripts to Perl.
The basic Perl structure is along the lines of:
my $atfirst = 0;
my $atend = 0;
while (<>)
{
last if $atend;
$atfirst = 1 if m/12:52:33/;
$atend = 1 if m/12:59:33/;
if ($atfirst)
{
print; # process the line as required
}
}
Note that as written, the code will process the first line that matches the end marker. If you don't want that, move the 'last' after the test.
If your log files are segregated by day, you could convert the timestamps to seconds and compare those. (If not, use the technique from my answer to a question you asked earlier.)
Say your log is
12:52:32 outside
12:52:43 strictly inside
12:59:33 end
12:59:34 outside
Then with
#! /usr/bin/perl
use warnings;
use strict;
my $LOGPATH = "/tmp/foo.log";
sub usage { "Usage: $0 start-time end-time\n" }
sub to_seconds {
my($h,$m,$s) = split /:/, $_[0];
$h * 60 * 60 +
$m * 60 +
$s;
}
die usage unless @ARGV == 2;
my($start,$end) = map to_seconds($_), @ARGV;
open my $log, "<", $LOGPATH or die "$0: open $LOGPATH: $!";
while (<$log>) {
if (/^(\d+:\d+:\d+)\s+/) {
my $time = to_seconds $1;
print if $time >= $start && $time <= $end;
}
else {
warn "$0: $LOGPATH:$.: no timestamp!\n";
}
}
you'd get the following output:
$ ./between 12:52:33 12:59:33
12:52:43 strictly inside
12:59:33 end