Concurrent appends to the same file using Perl
I need to upgrade a Perl CGI script in which users must complete 3 steps. After they finish each step, the script logs which step the user completed. Having a record of this is important so we can prove to a user that, for example, they only finished step one and didn't complete all three steps.
Right now, the script creates one log file for each instance of the CGI script. So if UserA does step 1, then UserB does steps 1, 2, and 3, and then UserA finishes steps 2 and 3, the order of the log files would be:
LogFile.UserA.Step1
LogFile.UserB.Step1
LogFile.UserB.Step2
LogFile.UserB.Step3
LogFile.UserA.Step2
LogFile.UserA.Step3
The log files are named with the current timestamp, a random number, and the process PID.
This works fine to prevent the same file from being written to more than once, but the directory quickly accumulates thousands of files (each containing just a few bytes). There is a process to rotate and compress these logs, but it has fallen to me to make the script log to just one file a day to reduce the number of log files being created.
Basically, the log file will have the current date in the file name, and anytime the CGI script needs to write to the log, it will append to the one log file for that day, regardless of the user or what step they are on.
Nothing will need to read the log file; the only thing that will happen to it is an append by the CGI script. The log rotation will run on log files that are 7 days or older.
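To make the plan concrete, the daily file would be built something like this (a rough sketch; the directory name and date format are only placeholders):

use POSIX qw(strftime);

# One file per day, e.g. logs/steps.2024-01-31.log (placeholder names).
my $dir  = "logs";
my $date = strftime("%Y-%m-%d", localtime);
my $log  = "$dir/steps.$date.log";

open my $fh, ">>", $log or die "open $log: $!";
print $fh "UserA: Step 1\n";    # each entry is one short line
close $fh;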
My question is, what is the best way to handle the concurrent appends to this log file? Do I need to lock it before appending? I found this page on Perl Monks that seems to indicate that "when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be overwritten."
I've learned that just because it can be done doesn't mean that I should, but in this case, what is the safest, best practice way to do this?
Summary:
- Concurrent appends to the same file
- Each append to the file is just one line, less than 50 characters
- Order does not matter
Thanks!
Yes, use flock.
An example program is below, beginning with typical front matter:
#! /usr/bin/perl
use warnings;
use strict;
use Fcntl qw/ :flock /;
Then we specify the path to the log and the number of clients that will run:
my $log = "/tmp/my.log";
my $clients = 10;
To log a message, open the file in append mode so all writes automatically go at the end. Then call flock to wait our turn for exclusive access to the log. Once it's our turn, write the message and close the handle, which automatically releases the lock.
sub log_step {
    my($msg) = @_;

    open my $fh, ">>", $log or die "$0 [$$]: open: $!";
    flock $fh, LOCK_EX      or die "$0 [$$]: flock: $!";
    print $fh "$msg\n"      or die "$0 [$$]: write: $!";
    close $fh               or warn "$0 [$$]: close: $!";
}
Now fork off $clients child processes to go through all three steps with random intervals between:
my %kids;
my $id = "A";
for (1 .. $clients) {
    my $pid = fork;
    die "$0: fork: $!" unless defined $pid;

    if ($pid) {
        ++$kids{$pid};
        print "$0: forked $pid\n";
    }
    else {
        my $user = "User" . $id;

        log_step "$user: Step 1";
        sleep rand 3;
        log_step "$user: Step 2";
        sleep rand 3;
        log_step "$user: Step 3";

        exit 0;
    }

    ++$id;
}
Don't forget to wait on all the children to exit:
print "$0: reaping children...\n";
while (keys %kids) {
my $pid = waitpid -1, 0;
last if $pid == -1;
warn "$0: unexpected kid $pid" unless $kids{$pid};
delete $kids{$pid};
}
warn "$0: still running: ", join(", " => keys %kids), "\n"
if keys %kids;
print "$0: done!\n", `cat $log`;
Sample output:
[...]
./prog.pl: reaping children...
./prog.pl: done!
UserA: Step 1
UserB: Step 1
UserC: Step 1
UserC: Step 2
UserC: Step 3
UserD: Step 1
UserE: Step 1
UserF: Step 1
UserG: Step 1
UserH: Step 1
UserI: Step 1
UserJ: Step 1
UserD: Step 2
UserD: Step 3
UserF: Step 2
UserG: Step 2
UserH: Step 2
UserI: Step 2
UserI: Step 3
UserB: Step 2
UserA: Step 2
UserA: Step 3
UserE: Step 2
UserF: Step 3
UserG: Step 3
UserJ: Step 2
UserJ: Step 3
UserE: Step 3
UserH: Step 3
UserB: Step 3
Keep in mind that the order will be different from run to run.
"when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be overwritten" may be true, but that doesn't mean your data can't come out mangled (one entry inside another). It's not very likely to happen for small amounts of data, but it might.
flock is a reliable and reasonably simple solution to that problem. I would advise you to simply use that.
I would urge you to look at Log::Log4perl.
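Not a full answer to the locking question, but a rough sketch of what Log::Log4perl's "easy" mode looks like, to show how little code it takes (the path is a placeholder):

use Log::Log4perl qw(:easy);

# Send everything at INFO and above to one append-mode file.
# ">>" in the file spec means "open for append"; the path is made up.
Log::Log4perl->easy_init({
    level => $INFO,
    file  => ">>/tmp/steps.log",
});

INFO("UserA: Step 1");
INFO("UserA: Step 2");

If I remember correctly, the file appender also has a syswrite option meant for the multiple-writers case, but check the Log::Log4perl::Appender::File docs before relying on it.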
You can try to play with file locking, but that will lead you to a world of hurt very quickly. The easier way is to have a small persistent process or a cron job scan your log file directory and append the events to the daily log file one at a time.
For extra safety, you can have your logging scripts create a new log file every so often (say, every 5 minutes), and make your daemon ignore files that are younger than five minutes.
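A rough sketch of the cron-job side, assuming the CGI instances keep dropping their small per-request files into a spool directory (every path here is made up):

#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Placeholder paths for illustration only.
my $spool = "/var/spool/cgi-steps";    # per-request fragment files land here
my $daily = "/var/log/steps." . strftime("%Y-%m-%d", localtime) . ".log";

opendir my $dh, $spool or die "opendir $spool: $!";
my @files = grep { -f "$spool/$_" } readdir $dh;
closedir $dh;

open my $out, ">>", $daily or die "open $daily: $!";
for my $name (@files) {
    my $path = "$spool/$name";
    next if -M $path < 5 / (24 * 60);    # skip fragments younger than ~5 minutes

    open my $in, "<", $path or next;
    print {$out} $_ while <$in>;         # copy the one-line entry to the daily log
    close $in;
    unlink $path;                        # fragment has been absorbed
}
close $out;

Only this one job ever writes to the daily file, so no locking is needed there.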
I think I would run a separate process, e.g. using Net::Daemon or similar, that handles the writing of log entries in a central manner. The CGI script instances would pass the log strings to this daemon through a socket.
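Net::Daemon would take care of the daemonizing details; just to show the shape of the idea with core modules instead (the port and path are made up):

#!/usr/bin/perl
# Toy log collector: each CGI instance connects, sends one line, disconnects.
# Core IO::Socket::INET is used here only for illustration; Net::Daemon or
# similar would handle daemonizing and restarts in real use.
use strict;
use warnings;
use IO::Socket::INET;

my $log = "/tmp/steps.log";            # placeholder path
my $srv = IO::Socket::INET->new(
    LocalAddr => "127.0.0.1",
    LocalPort => 9000,                 # placeholder port
    Listen    => 5,
    ReuseAddr => 1,
) or die "listen: $!";

open my $fh, ">>", $log or die "open $log: $!";
$fh->autoflush(1);                     # IO::Handle is loaded via IO::Socket

while (my $client = $srv->accept) {
    while (my $line = <$client>) {     # one short line per step
        print {$fh} $line;             # single writer, so no locking needed
    }
    close $client;
}

The CGI side is then just: open a socket to 127.0.0.1:9000, print the line, close.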
You have a few options, in increasing order of complexity:
1) Just time-and-datestamp every line. When you need to examine the coalesced file, interleave all the input files (see the sketch after this list).
2) Write a script that runs continuously, holds all the filehandles open, and uses select() to find files with new data, dumping it to the output in the order it was received. This method could become a resource hog, since it would constantly call select(), look for new files, open them, and call select() again.
3) Write a script that accepts TCP connections. If the loggers can ever have more log files open than a single process on your operating system can support at a time, you're back to solution number 1. Honestly, go for number 1.
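For option 1, the per-line stamp and the later merge might look roughly like this (the LogFile.* glob is just an example pattern):

use strict;
use warnings;
use POSIX qw(strftime);

# Writing side: prefix every entry with a timestamp that sorts lexically.
sub stamped_line {
    my ($msg) = @_;
    return strftime("%Y-%m-%d %H:%M:%S", localtime) . " $msg\n";
}

# Coalescing side: because the stamp sorts as text, a plain sort interleaves
# any number of per-process files into one ordered stream.
# (Shell equivalent: sort LogFile.* > combined.log)
my @lines;
for my $file (glob "LogFile.*") {
    open my $fh, "<", $file or next;
    push @lines, <$fh>;
    close $fh;
}
print sort @lines;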