Perl Text Parsing - fixed delimited structure is changing

Perl Experts - My attempt to solve my problem is turning into a lot of code, which in Perl usually means I'm approaching it incorrectly. Here is my problem:

I have a block of text (example below) which can have variable amount of whitespace between the column data. I was using a simple split, but the problem now is that the column "code" now contains spaces in the data (I only accounted for that in the last column). What seems to be constant (although I don't have access to, or control of the source structure) is that there is a minimum of 3 spaces between columns (maybe more, but never less).

So, I'd like to say my column delimiter token is "3 spaces" and then trim the data within each to have my actual columnar data.

COL0   COL1   COL2   COL3         COL4   COL5
   -      4    0.2      1       416489   463455 554
          1    0.9      1           E1   
   0      3    1.4     14   E97-TEST 1   
   -      1   97.5    396         PASS   Good

I'm just trying to get the values into 6 variables.

NOTE: COL0 may not have a value. COL4 may contain spaces in its data. COL5 may contain no value, or data with spaces. All fixed formatting is done with spaces (no tabs or other special characters). To clarify: the columns are NOT consistently sized. One file might have COL4 13 characters wide, another might have COL4 21 characters wide. In other words, the format is not strict, as another SO member put it.


You'll need to figure out where the columns are. As a really quite disgusting hack, you can read the whole file in and then bitwise string-OR the lines together:

my @file = <FILE>;   # FILE is an already-opened filehandle
chomp @file;

my $t = "";
$t |= $_ foreach(@file);

$t will then contain space characters only at positions where every line had a space; all other positions will contain binary junk. Now split it with a zero-width match at each boundary between a space and a following non-space (a bare lookahead for a non-space would split before every non-space character):

my @cols = split /(?<= )(?=[^ ])/, $t;

We actually want the widths of the columns to generate an unpack() format:

@cols = map length, @cols;
my $format = join '', map "A$_", @cols;

Now process the file:

foreach my $line (@file) {
  my @fields = unpack $format, $line;   # one element per column
  # your code here...
}

(This code has only been lightly tested.)
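For what it's worth, here is a minimal end-to-end sketch of the steps above. The three sample lines are made up for illustration (they are not from the question), and I have swapped the measured width of the last column for "A*" so that longer trailing data is not cut off; that substitution is my own assumption, not part of the recipe above.

```perl
use strict;
use warnings;

# Hypothetical sample lines, all 15 characters wide.
my @file = (
    'ID   NAME   QTY',
    ' 7   ab c    12',
    '42   xyz      3',
);

# OR the lines together: a position stays a space only if it is a
# space on every line.
my $t = '';
$t |= $_ foreach @file;

# Split at each space-to-non-space boundary and measure the pieces.
my @widths = map { length } split /(?<= )(?=[^ ])/, $t;
pop @widths;    # the last column takes the rest of the line instead

my $format = join ' ', (map { "A$_" } @widths), 'A*';

my @rows;
foreach my $line (@file) {
    my @fields = unpack $format, $line;
    s/^\s+// for @fields;    # "A" strips trailing spaces only
    push @rows, \@fields;
}

print join('|', @$_), "\n" for @rows;
```

Note that a position which is blank on every line inside a column (for example an embedded space that happens to line up) would be mistaken for a column gap, so this stays a hack.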


If you're dealing with strict columnar data like this, unpack is probably what you want:

#!perl

use strict;
use warnings;
use 5.010;

use Data::Dumper;

my $data = <<EOD;
COL0   COL1   COL2   COL3         COL4   COL5
   -      4    0.2      1       416489   463455 554
          1    0.9      1           E1   
   0      3    1.4     14   E97-TEST 1   
   -      1   97.5    396         PASS   Good
EOD

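my @lines = split /\n/, $data;
for my $line ( @lines ) {
    # "a5" keeps the first 5 characters as-is; each "A<n>" takes n characters
    # and strips trailing spaces; "A*" takes the rest of the line.
    my @values = unpack("a5 A7 A7 A7 A13 A*", $line);
    print Dumper \@values;
}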

This puts your values into the @values array as you wish, but they will still have leading spaces (the A format strips only trailing spaces), so you'll have to trim those off.
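A minimal sketch of that trimming step, assuming the same hard-coded widths as above and using one sample row from the question:

```perl
use strict;
use warnings;

# One data row copied from the question (COL5 is empty here).
my $line = '   0      3    1.4     14   E97-TEST 1   ';

my @values = unpack 'a5 A7 A7 A7 A13 A*', $line;

# Strip leading and trailing whitespace from every field in place.
s/^\s+|\s+$//g for @values;

print join('|', @values), "\n";
```

The embedded space in "E97-TEST 1" survives, because only whitespace at the ends of each field is removed.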


I would use two passes: in the first, find those character columns that have a space in each line; then, split or unpack with those indices. Whitespace trimming is done afterwards.

Your example:

COL0   COL1   COL2   COL3         COL4   COL5
   -      4    0.2      1       416489   463455 554
          1    0.9      1           E1   
   0      3    1.4     14   E97-TEST 1   
   -      1   97.5    396         PASS   Good

000011100001110000111000011100000000001110000000000

The 1s in the last line show which columns are all spaces.
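A rough sketch of the two passes, under a few assumptions of mine: positions past the end of a short line count as blank, only runs of three or more blank positions count as a column gap (the question says there are never fewer than three spaces between columns), and the last column takes the rest of the line.

```perl
use strict;
use warnings;

# Sample rows copied from the question.
my @lines = (
    'COL0   COL1   COL2   COL3         COL4   COL5',
    '   -      4    0.2      1       416489   463455 554',
    '          1    0.9      1           E1   ',
    '   0      3    1.4     14   E97-TEST 1   ',
    '   -      1   97.5    396         PASS   Good',
);

# Pass 1: build a mask with "1" where every line is blank at that position.
my $width = 0;
for (@lines) { $width = length if length > $width }

my $mask = '';
for my $i (0 .. $width - 1) {
    my $all_blank = 1;
    for my $line (@lines) {
        next if $i >= length $line;    # past the end counts as blank
        if (substr($line, $i, 1) ne ' ') { $all_blank = 0; last }
    }
    $mask .= $all_blank ? '1' : '0';
}

# Pass 2: a column starts right after each gap of three or more blanks.
my @starts = (0);
while ($mask =~ /1{3,}/g) {
    next if $-[0] == 0 or pos($mask) >= $width;    # ignore edge gaps
    push @starts, pos $mask;
}

# Turn the start offsets into an unpack format; the last column is open-ended.
my @fmt = map { 'A' . ($starts[$_ + 1] - $starts[$_]) } 0 .. $#starts - 1;
my $format = join ' ', @fmt, 'A*';

my @rows;
for my $line (@lines) {
    my @fields = unpack $format, $line;
    s/^\s+// for @fields;    # trim afterwards; "A" already drops trailing spaces
    push @rows, \@fields;
}

print join('|', @$_), "\n" for @rows;
```

Requiring three blanks in a row keeps a single embedded space, such as the one in "463455 554", from being treated as a column boundary.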


I know CanSpice already answered (possibly with a much better solution), but you can set the input record separator using "$/". Because it is a global variable, change it with local inside a limited scope (probably a sub), or you may see side effects elsewhere. For example:

local $/ = "   ";
$input = <DATAIN>; # assuming DATAIN is the filehandle

You can trim whitespace using a nice little regex. See Wikipedia for an example.
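Here is a small sketch of how that might look for a single row. The in-memory filehandle is just my stand-in for DATAIN, and the sample row is copied from the question. Be aware of one limitation: gaps wider than three spaces produce blank records, which this sketch skips, so a row with an empty column simply yields fewer fields and you cannot tell which column was missing.

```perl
use strict;
use warnings;

# One sample row from the question, read through an in-memory filehandle
# that stands in for DATAIN.
my $row = "   -      4    0.2      1       416489   463455 554\n";
open my $in, '<', \$row or die "Cannot open in-memory handle: $!";

my @fields;
{
    local $/ = "   ";                 # three spaces end each record
    while (defined(my $rec = <$in>)) {
        chomp $rec;                   # chomp removes the current $/ value
        $rec =~ s/^\s+|\s+$//g;       # trim leftover spaces and the newline
        push @fields, $rec if length $rec;
    }
}
close $in;

print join('|', @fields), "\n";
```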
