Perl Text Parsing - fixed delimited structure is changing
Perl experts: my attempt to solve this problem is turning into a lot of code, which in Perl suggests I'm approaching it incorrectly. Here is my problem:
I have a block of text (example below) which can have variable amount of whitespace between the column data. I was using a simple split, but the problem now is that the column "code" now contains spaces in the data (I only accounted for that in the last column). What seems to be constant (although I don't have access to, or control of the source structure) is that there is a minimum of 3 spaces between columns (maybe more, but never less).
So, I'd like to say my column delimiter token is "3 spaces" and then trim the data within each to have my actual columnar data.
COL0 COL1 COL2 COL3 COL4 COL5
- 4 0.2 1 416489 463455 554
1 0.9 1 E1
0 3 1.4 14 E97-TEST 1
- 1 97.5 396 PASS Good
I'm just trying to get the values into 6 variables.
NOTE: COL0 may not have a value. COL4 may contain spaces in its data. COL5 may contain no value, or data with spaces. All fixed formatting is done with spaces (no tabs or other special characters). To clarify: the columns are NOT consistently sized. One file might have COL4 13 characters wide, another might have COL4 21 characters wide. In other words, the format is not strict, as another SO member stated.
You'll need to figure out where the columns are. As a really quite disgusting hack, you can read the whole file in and then string-or the lines together:
my @file = <file>;
chomp @file;
my $t = "";
$t |= $_ foreach(@file);
$t will then contain space characters in columns only where there were always space characters in that column; other columns will contain binary junk. Now split it with a zero-width match at each point where a space is followed by a non-space, so that each piece is one column plus its trailing padding:
my @cols = split /(?<= )(?=[^ ])/, $t;
We actually want the widths of the columns to generate an unpack() format:
@cols = map length, @cols;
my $format = join '', map "A$_", @cols;
Now process the file:
foreach my $line (@file) {
    my @fields = unpack $format, $line;   # one element per column
    # your code here...
}
(This code has only been lightly tested.)
If you're dealing with strict columnar data like this, unpack is probably what you want:
#!perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
my $data = <<EOD;
COL0 COL1 COL2 COL3 COL4 COL5
- 4 0.2 1 416489 463455 554
1 0.9 1 E1
0 3 1.4 14 E97-TEST 1
- 1 97.5 396 PASS Good
EOD
my @lines = split '\n', $data;
for my $line ( @lines ) {
    my @values = unpack("a5 A7 A7 A7 A13 A*", $line);
    print Dumper \@values;
}
This appears to dump out your values into the @values array as you wish, but they'll have leading spaces that you'll have to trim off.
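As a rough sketch of that trimming step: the trim helper, the sample line and the shortened template below are all made up for illustration (the line is built with sprintf so the widths are known), so adjust them to your real layout.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace from one string.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Assumed sample line: a 5-wide left-aligned field, two 7-wide
# right-aligned fields, three spaces, then free text.
my $line = sprintf '%-5s%7s%7s   %s', '-', 4, '0.2', 'E97-TEST';

# 'a' keeps trailing spaces and 'A' strips them, but neither strips
# leading spaces, so every field is passed through trim().
my @values = map { trim($_) } unpack 'a5 A7 A7 A*', $line;

print join('|', @values), "\n";
```

Trimming every field after the unpack means it no longer matters whether a column is left- or right-aligned.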
I would use two passes: in the first, find those character columns that have a space in each line; then, split or unpack with those indices. Whitespace trimming is done afterwards.
Your example:
COL0 COL1 COL2 COL3 COL4 COL5
- 4 0.2 1 416489 463455 554
1 0.9 1 E1
0 3 1.4 14 E97-TEST 1
- 1 97.5 396 PASS Good
000011100001110000111000011100000000001110000000000
The 1s in the last line show which columns are all spaces.
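Here is a minimal sketch of the two passes, assuming the "at least three spaces between columns" rule from the question. The name column_format and the three-column sample are hypothetical, and the sample is generated with sprintf so the alignment is known. A field is only started after a run of three or more all-space positions, so a lone space that happens to line up inside the data does not split a column.

```perl
use strict;
use warnings;

# Build an unpack template from the character positions that are
# blank in every line. (column_format is a made-up name.)
sub column_format {
    my @lines = @_;

    my $width = 0;
    for my $line (@lines) {
        $width = length($line) if length($line) > $width;
    }

    # Pass 1: mark positions where every line has a space.
    # Short lines are padded, so missing characters count as spaces.
    my @blank = (1) x $width;
    for my $line (@lines) {
        my $padded = $line . ' ' x ($width - length($line));
        for my $i (0 .. $width - 1) {
            $blank[$i] = 0 if substr($padded, $i, 1) ne ' ';
        }
    }
    my $mask = join '', @blank;

    # Pass 2: a new field starts right after each run of 3+ blank positions.
    my @starts = (0);
    while ($mask =~ /1{3,}(?=0)/g) {
        push @starts, pos($mask);
    }

    my @widths = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
    return join ' ', (map { "A$_" } @widths), 'A*';
}

# Assumed sample data: three columns, the first and last may be empty.
my @sample = map { sprintf '%-6s%-15s%s', @$_ } (
    [ 'ID', 'CODE',       'NOTE'     ],
    [ '-',  '416489 463', '554'      ],
    [ '',   'E1',         ''         ],
    [ '0',  'E97-TEST',   '1'        ],
    [ '-',  'PASS',       'Good one' ],
);

my $format = column_format(@sample);    # for this sample: A6 A15 A*
for my $line (@sample) {
    my @fields = unpack $format, $line;
    print join('|', @fields), "\n";
}
```

The A template strips trailing spaces only, so right-aligned columns would still need their leading whitespace trimmed afterwards.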
I know CanSpice already answered (possibly with a much better solution), but you can set the input record separator using "$/". Do this in a local scope (probably a sub), since it is a global variable; otherwise you may see side effects. Example:
local $/ = " ";
$input = <DATAIN>; # assuming DATAIN is the filehandle
You can trim whitespace using a nice little regex. See Wikipedia for an example.
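A rough sketch of how the two ideas might fit together, assuming the three-space delimiter from the question. The sub name read_fields and the in-memory filehandle are only there for illustration. Note the limits: a longer run of spaces produces empty chunks, which are dropped here, so an empty column is lost, and a newline is not treated as a separator.

```perl
use strict;
use warnings;

# Hypothetical helper: read chunks separated by three spaces and trim them.
sub read_fields {
    my ($text) = @_;

    # An in-memory filehandle stands in for the real input file.
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

    local $/ = '   ';    # three spaces; restored when the sub returns

    my @fields;
    while (my $chunk = <$fh>) {
        chomp $chunk;                  # chomp removes the current value of $/
        $chunk =~ s/^\s+|\s+$//g;      # trim leftover whitespace and newline
        push @fields, $chunk if length $chunk;
    }
    close $fh;
    return @fields;
}

my @fields = read_fields("0      3   E97-TEST   1\n");
print join('|', @fields), "\n";
```

Because empty columns disappear, this only suits input where every column always has a value.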