Perl help: replacing commas and embedding values with control characters
I need some help with tweaking my perl script.
I've got an input file with comma separated values like so:
to_em,from_em,flags,updated,marks
xtr133823@xra.co.nz#hv,abc@def.com,16,2007-08-18 16:18:50,33
The first row holds the column names (to_em, from_em, flags, updated, marks),
and each following record holds the values for those columns:
to_em = xtr133823@xra.co.nz#hv
from_em = abc@def.com
flags = 16
updated = 2007-08-18 16:18:50
marks = 33
I am also creating a unique value (an MD5 hex digest), prefixed with "__pkey__".
Each column name starts with ^E. Each value starts with ^A, including the hex value. The record ends with ^D.
I want the final output file to look like this:
__pkey__^Ad41d8cd98f00b204e9800998ecf8427e^Eto_em^Axtr133823@xra.co.nz#hv^Efrom_em^Aabc@def.com^Eflags^A16^Eupdated^A2007-08-18 16:18:50^Emarks^A33^E^D
But it's coming out like this:
__pkey__^Ad41d8cd98f00b204e9800998ecf8427e^E^Ato_em^E^D__pkey__^A5c09354d0d3d34c96dbad8fa14ff175e^E^Axtr133823@xra.co.nz#hv^E^D
Here's my code:
use strict;
use Digest::MD5 qw(md5_hex);
my $data = '';
while (<>) {
my $digest = md5_hex($data);
chomp;
my ($val) = split /,/;
$data = $data. "__pkey__^A$digest^E^A$val^E^D";
}
print $data;
exit;
This seems to work:
use strict;
use Digest::MD5 qw(md5_hex);
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
#my ($sep1, $sep2, $eor) = (chr(1), chr(5), chr(4));
my ($sep1, $sep2, $eor) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__pkey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..$#values);
#$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values)-1);
#for my $i (0..$#values)
#{
# $extra .= "$heading[$i]$sep1$values[$i]$sep2";
#}
$data .= "$extra$eor";
}
print $data;
It reads the first line, chomps it, and splits it into fields stored in the array @heading.
It reads each subsequent line, chomps it, splits it into fields, computes the digest of the data accumulated so far (as your original code did), and then generates the output record.
At the end, it prints all the accumulated data.
If you want actual control characters instead of caret-letter notation, use the line with chr() instead of the one after it.
If you don't like the all-on-one-line loop, use the commented out one.
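To make the difference between the two separator lines concrete, here is a minimal sketch that builds the same record twice: once with caret-letter text and once with real control characters. The helper name build_record is made up for illustration and is not part of the script above; the digest is taken over an empty string, which is what the first record gets in the script.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper (not in the original script): build one record
# from a digest, the column names, the values, and the three separators.
sub build_record {
    my ($digest, $heading, $values, $sep1, $sep2, $eor) = @_;
    my $record = "__pkey__$sep1$digest$sep2";
    $record .= "$heading->[$_]$sep1$values->[$_]$sep2" for 0 .. $#{$values};
    return $record . $eor;
}

my @heading = qw(to_em from_em flags updated marks);
my @values  = split /,/, 'xtr133823@xra.co.nz#hv,abc@def.com,16,2007-08-18 16:18:50,33';

# Caret-letter text, as in the sample output in the question.
my $visible = build_record(md5_hex(''), \@heading, \@values, '^A', '^E', '^D');

# Real control characters: SOH (0x01), ENQ (0x05), EOT (0x04).
my $binary = build_record(md5_hex(''), \@heading, \@values, chr(1), chr(5), chr(4));

print "$visible\n";
```

The caret version is two bytes per separator and readable in a terminal; the chr() version is one byte per separator and is what a downstream parser expecting control characters would need.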
Something like this should get you the kind of output you showed:
use strict;
use Digest::MD5 qw(md5_hex);
my $data = '';
my $first_line = <>;
chomp($first_line);
my @columns = split(/,/, $first_line);
while (<>) {
chomp;
my (@vals) = split /,/;
my $record = "";
foreach my $column_num (0..$#columns) {
$record .= "^E$columns[$column_num]^A$vals[$column_num]";
}
my $digest = md5_hex($data);
# The trailing ^E before ^D matches the sample output in the question.
$data .= "__pkey__^A$digest$record^E^D";
}
print $data;
exit;
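One thing to be aware of: the scripts above pass the accumulated $data to md5_hex, so each key depends on everything written before it, and the first key is always the digest of the empty string. If the key is meant to identify the row itself (an assumption on my part, the question does not say), a sketch along these lines hashes each input line instead. The helper name record_for_line and the sample addresses are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: key each record on its own input line
# rather than on the output accumulated so far.
sub record_for_line {
    my ($line, @columns) = @_;
    my @vals   = split /,/, $line;
    my $record = '__pkey__^A' . md5_hex($line);
    $record .= "^E$columns[$_]^A$vals[$_]" for 0 .. $#columns;
    return $record . '^E^D';
}

my @columns = qw(to_em from_em flags updated marks);
my $first  = record_for_line('a@b.com,c@d.com,16,2007-08-18 16:18:50,33', @columns);
my $second = record_for_line('e@f.com,c@d.com,16,2007-08-18 16:18:50,33', @columns);

print "$first\n$second\n";
```

With this approach the same input line always yields the same key regardless of its position in the file, but two identical lines would also share a key, so it only works as a unique key if the rows themselves are unique.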