Help merging Perl code routines together for file processing
I need some Perl help getting these two pieces of code to work together. Each one works on its own in testing, but I need help combining them, especially with the loop constructs; I'm not sure whether I should use foreach. The code is below.
Any best-practice advice would also be welcome, as I'm still learning the language. Thanks for your help.
Here's the process flow I am looking for:
- read a directory
- look for a particular file
- use the file name to strip out some key information to create a newly processed file
- process the input file
- create the newly processed file for each input file read (if I read in 10 files, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend using functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (@)
}
sub extract_file_info {
# takes the file name and returns an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and builds the output file
}
my @ifiles = get_input_files;
foreach my $ifile(@ifiles) {
my @attrs = extract_file_info($ifile);
process_file($ifile, @attrs);
}
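One possible way to fill in the first two stubs, as a rough sketch rather than a finished implementation. The pattern is the one from the question with the dot before "csv" escaped; the variable name $NAME_RE and the sample file name at the bottom are made up for illustration. File::Basename and File::Spec both ship with Perl.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Pattern from the question, with the dot before "csv" escaped.
# $NAME_RE is an illustrative name, not something from the original code.
my $NAME_RE = qr/^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/;

# Return the full paths of all files in $dir whose names match the pattern.
sub get_input_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    my @names = sort grep { /$NAME_RE/ } readdir $dh;
    closedir $dh;
    # readdir returns bare names, so prepend the directory before returning
    return map { File::Spec->catfile($dir, $_) } @names;
}

# Return the three captured filename attributes, or an empty list
# when the name does not match.
sub extract_file_info {
    my ($path) = @_;
    my @attrs = basename($path) =~ $NAME_RE;
    return @attrs;
}

# Example with a made-up file name:
my @attrs = extract_file_info('/backups/test/foo123.name.abc-foo_p7.20110101.csv');
print "@attrs\n";    # prints "123 abc 7"
```

Returning full paths from get_input_files means process_file can open each file directly, whatever the current working directory is.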
Hope it helps
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
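# - The dot before "csv" is now escaped so that it only matches a literal "."
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/) {
# - readdir returns bare file names, so prepend the directory to get a
# path that open can use regardless of the current directory
process_file("$target_dir$file", $1, $2, $3);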
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = @_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
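# - The last valid index is $#values; 0 .. scalar(@values) runs one past
# the end and appends a spurious heading/value pair (with warnings)
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for 0 .. $#values;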
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when they go out of
# scope at the end of the block/sub, so there's no real point in
# explicitly closing them unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.
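For the lower-memory option mentioned in the comments (print each record inside the loop and drop $data), note that the digest is computed over everything accumulated so far, so $data can't simply be deleted. A minimal sketch of one way around that, assuming the object interface of Digest::MD5 is acceptable: keep a running digest and feed it each record as it is written. The sub name convert_stream is made up for illustration, and the sketch assumes no data row has more fields than the header.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper (not part of the code above): reads CSV lines from
# $in_fh and writes one record per data row to $out_fh, without keeping
# the whole output in memory. Returns the number of records written.
sub convert_stream {
    my ($in_fh, $out_fh) = @_;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");

    my $header = <$in_fh>;
    return 0 unless defined $header;    # empty input: nothing to write
    chomp $header;
    my @heading = split /,/, $header;

    my $md5   = Digest::MD5->new;       # running digest of all records so far
    my $count = 0;
    while (my $line = <$in_fh>) {
        chomp $line;
        my @values = split /,/, $line;

        # hexdigest resets the object, so take the digest from a clone
        my $digest = $md5->clone->hexdigest;

        my $record = "__mykey__$sep1$digest$sep2";
        # Assumes a row never has more fields than the header
        $record .= "$heading[$_]$sep1$values[$_]$sep2" for 0 .. $#values;
        $record .= $eorec;

        print {$out_fh} $record;
        $md5->add($record);             # same bytes that md5_hex($data) would see
        $count++;
    }
    return $count;
}
```

Inside process_file this would be called as convert_stream($in_fh, $out_fh) after both files are open. Each key then matches what md5_hex($data) would have produced, while memory use stays constant regardless of input size.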