help merging perl code routines together for file processing

2023-02-11 06:06 问答作者：

I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.

Also, any best practices would be great too as I'm learning this language. Thanks for your help.

Here's the process flow I am looking for:

read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)

Part 1:

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    next if ($file =~ /^\.+$/);
    #Get filename attributes
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
      print "$1\n";
      print "$2\n";
      print "$3\n";
 开发者_StackOverflow   }
    print "$file\n";
}

Part 2:

use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
    my $digest = md5_hex($data);
    chomp;
    my (@values) = split /,/;
    my $extra = "__mykey__$sep1$digest$sep2" ;
    $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
    $data .= "$extra$eorec"; 
    print NEWFILE "$data";
}
#print $data;
close (NEWFILE);

You are using an old-style of Perl programming. I recommend you to use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:

use Modern::Perl;
# use... 

sub get_input_files {
  # return an array of files (@)
}

sub extract_file_info {
   # takes the file name and returs an array of values (filename attrs)
}

sub process_file {
   # reads the input file, takes the previous attribs and build the output file
}


my @ifiles = get_input_files;
foreach my $ifile(@ifiles) {
   my @attrs = extract_file_info($ifile);
   process_file($ifile, @attrs);
}

Hope it helps

I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:

#!/usr/bin/env perl

# - Never forget these!
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    # Parens on postfix "if" are optional; I prefer to omit them
    next if $file =~ /^\.+$/;
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        process_file($file, $1, $2, $3);
    }
    print "$file\n";
}

sub process_file {
    my ($orig_name, $foo_x, $name_x, $p_x) = @_;

    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";

    # - From your description of the task, it sounds like we actually want to
    #   read from the found file, not from <>, so opening it here to read
    # - Better to use lexical ("my") filehandle and three-arg form of open
    # - "or" has lower operator precedence than "||", so less chance of
    #   things being grouped in the wrong order (though either works here)
    # - Including $! in the error will tell why the file open failed
    open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";

    my $data  = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest   = md5_hex($data);
        my (@values) = split /,/;
        my $extra    = "__mykey__$sep1$digest$sep2";
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
          for (0 .. scalar(@values));
        # - Useless use of double quotes removed on next two lines
        $data .= $extra . $eorec;
        #print $out_fh $data;
    }
    # - Moved print to output file to here (where it will print the complete
    #   output all at once) rather than within the loop (where it will print
    #   all previous lines each time a new line is read in) to prevent
    #   duplicate output records.  This could also be achieved by printing
    #   $extra inside the loop.  Printing $data at the end will be slightly
    #   faster, but requires more memory; printing $extra within the loop and
    #   getting rid of $data entirely would require less memory, so that may
    #   be the better option if you find yourself needing to read huge input
    #   files.
    print $out_fh $data;

    # - $in_fh and $out_fh will be closed automatically when it goes out of
    #   scope at the end of the block/sub, so there's no real point to
    #   explicitly closing it unless you're going to check whether the close
    #   succeeded or failed (which can happen in odd cases usually involving
    #   full or failing disks when writing; I'm not aware of any way that
    #   closing a file open for reading can fail, so that's just being left
    #   implicit)
    close $out_fh or die "Failed to close file: $!";
}

Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.

继续阅读：fileparsing foreach perl while-loop

help merging perl code routines together for file processing

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？