Perl Text Parsing - fixed delimited structure is changing

2023-02-05 15:23

Perl Experts - My attempt to solve my problem is turning into a lot of code, which in Perl usually means I'm approaching it incorrectly. Here is my problem:

I have a block of text (example below) which can have variable amount of whitespace between the column data. I was using a simple split, but the problem now is that the column "code" now contains spaces in the data (I only accounted for that in the last column). What seems to be constant (although I don't have access to, or control of the source structure) is that there is a minimum of 3 spaces between columns (maybe more, but never less).

So, I'd like to say my column delimiter token is "3 spaces" and then trim the data within each to have my actual columnar data.

COL0   COL1   COL2   COL3         COL4   COL5
   -      4    0.2      1       416489   463455 554
          1    0.9      1           E1   
   0      3    1.4     14   E97-TEST 1   
   -      1   97.5    396         PASS   Good

I'm just trying to get the values into 6 variables.

NOTE: COL0 may not have a value. COL4 may contain spaces in its data. COL5 may contain no value, or data with spaces. All fixed formatting is done with spaces (no tabs or other special characters). To clarify -- the columns are NOT consistently sized. One file might have COL4 13 characters wide, another might have it 21 characters wide. In other words, the layout is not strict, as another SO member put it.

You'll need to figure out where the columns are. As a really quite disgusting hack, you can read the whole file in and then string-or the lines together:

my @file = <$fh>;          # $fh: the already-opened input filehandle
chomp @file;

my $t = "";
$t |= $_ foreach @file;    # bitwise string OR, character by character

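$t will then contain a space only at positions where every line had a space (or had already ended); every other position will contain binary junk. A bare /(?=[^ ]+)/ would split before every single non-space character, so instead split with a zero-width match at each point where a gap of at least three spaces (the minimum you say separates columns) ends and a non-space begins:

my @cols = split /(?<= {3})(?=[^ ])/, $t;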

We actually want the widths of the columns to generate an unpack() format:

@cols = map length, @cols;
my $format = join '', map "A$_", @cols;

Now process the file:

foreach my $line (@file) {
  my @fields = unpack $format, $line;   # one element per column
  s/^\s+// for @fields;                 # "A" strips trailing spaces only
  # your code here...
}

(This code has only been lightly tested.)
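
A self-contained sketch of the approach above, run on the sample data from the question. It relies on the question's guarantee of at least three spaces between columns, so it only cuts after a run of three blanks. The helper name column_format is made up for illustration, and reading the last column with A* (so it takes the rest of the line) is an assumption of this sketch, not part of the answer.

```perl
use strict;
use warnings;

# Build an unpack() template from the character positions that are blank
# in every line. (column_format is a hypothetical helper name.)
sub column_format {
    my @lines = @_;
    my $mask  = '';
    $mask |= $_ for @lines;    # string OR: stays ' ' only where all lines are blank
    # Cut where a gap of three or more blanks ends and the next column starts.
    my @widths = map { length } split /(?<= {3})(?=[^ ])/, $mask;
    pop @widths;               # the last column is read with A* instead
    return join ' ', map("A$_", @widths), 'A*';
}

my @lines = (
    'COL0   COL1   COL2   COL3         COL4   COL5',
    '   -      4    0.2      1       416489   463455 554',
    '          1    0.9      1           E1   ',
    '   0      3    1.4     14   E97-TEST 1   ',
    '   -      1   97.5    396         PASS   Good',
);

my $format = column_format(@lines);
for my $line (@lines) {
    my @fields = unpack $format, $line;
    s/^\s+// for @fields;      # "A" drops trailing blanks; strip leading ones here
    print join('|', @fields), "\n";
}
```

For this sample the template comes out as "A7 A7 A7 A7 A13 A*". If the script enables the bitwise feature (for example through use v5.28), the string OR has to be written as |.= because |= is then always numeric.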

If you're dealing with strict columnar data like this, unpack is probably what you want:

#!perl

use strict;
use warnings;
use 5.010;

use Data::Dumper;

my $data = <<EOD;
COL0   COL1   COL2   COL3         COL4   COL5
   -      4    0.2      1       416489   463455 554
          1    0.9      1           E1   
   0      3    1.4     14   E97-TEST 1   
   -      1   97.5    396         PASS   Good
EOD

my @lines = split /\n/, $data;
for my $line ( @lines ) {
    my @values = unpack("a5 A7 A7 A7 A13 A*", $line);
    print Dumper \@values;
}

This appears to dump out your values into the @values array as you wish, but they'll have leading spaces that you'll have to trim off.
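
The trimming step could look like the following minimal sketch. It reuses the template from the code above on one line of the sample data; trim_all is a hypothetical helper name.

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace from every value passed in.
# (trim_all is a hypothetical helper name.)
sub trim_all {
    return map { my $v = $_; $v =~ s/^\s+|\s+$//g; $v } @_;
}

my $line   = '   0      3    1.4     14   E97-TEST 1   ';
my @values = trim_all( unpack 'a5 A7 A7 A7 A13 A*', $line );
print join('|', @values), "\n";
```

The empty COL5 comes back as an empty string, so there are still six values.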

I would use two passes: in the first, find those character columns that have a space in each line; then, split or unpack with those indices. Whitespace trimming is done afterwards.

Your example:

COL0   COL1   COL2   COL3         COL4   COL5
   -      4    0.2      1       416489   463455 554
          1    0.9      1           E1   
   0      3    1.4     14   E97-TEST 1   
   -      1   97.5    396         PASS   Good

000011100001110000111000011100000000001110000000000

The 1s in the last line show which columns are all spaces.
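
A rough sketch of the two passes, with hypothetical helper names blank_mask and split_by_mask. Two assumptions are mine: positions past the end of a shorter line count as blank (so lines with stripped trailing spaces still work), and only runs of three or more blanks count as separators, following the question's minimum of three spaces. Because of the first assumption the computed mask also flags the single blank inside "463455 554", which the hand-written mask above does not; the second assumption keeps that blank from splitting the column.

```perl
use strict;
use warnings;

# Pass 1: mark with "1" every character position that is blank in all lines.
# Positions past the end of a shorter line count as blank.
sub blank_mask {
    my @lines = @_;
    my $width = 0;
    for my $line (@lines) {
        $width = length($line) if length($line) > $width;
    }
    my $mask = '';
    for my $i (0 .. $width - 1) {
        my $used = grep { $i < length($_) && substr($_, $i, 1) ne ' ' } @lines;
        $mask .= $used ? '0' : '1';
    }
    return $mask;
}

# Pass 2: cut one line at every run of three or more "1"s, then trim.
sub split_by_mask {
    my ($line, $mask) = @_;
    my $padded = sprintf '%-*s', length($mask), $line;   # pad short lines
    my @fields;
    # A column is a stretch of "0"s that may contain one- or two-wide gaps.
    while ($mask =~ /0+(?:1{1,2}0+)*/g) {
        my $field = substr $padded, $-[0], $+[0] - $-[0];
        $field =~ s/^\s+|\s+$//g;
        push @fields, $field;
    }
    return @fields;
}

my @lines = (
    'COL0   COL1   COL2   COL3         COL4   COL5',
    '   -      4    0.2      1       416489   463455 554',
    '          1    0.9      1           E1   ',
    '   0      3    1.4     14   E97-TEST 1   ',
    '   -      1   97.5    396         PASS   Good',
);

my $mask = blank_mask(@lines);
print "$mask\n";
print join('|', split_by_mask($_, $mask)), "\n" for @lines;
```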

I know CanSpice already answered (possibly with a much better solution), but you can also set the input record separator using "$/". Do this in a local scope (probably a sub), because it is a global variable and changing it elsewhere can cause side effects. For example:

local $/ = "   ";
my $input = <DATAIN>; # assuming DATAIN is the filehandle

You can trim whitespace using a nice little regex. See Wikipedia for an example.
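
A small sketch of how that might fit together, using an in-memory filehandle and a hypothetical helper read_records. Keep in mind that with this separator a newline is not a record boundary and blank columns produce empty records, which the sketch simply drops, so on its own it cannot tell you which column a value came from.

```perl
use strict;
use warnings;

# Read records separated by three spaces from a filehandle.
# (read_records is a hypothetical helper name.)
sub read_records {
    my ($fh) = @_;
    local $/ = "   ";                # change is scoped to this sub only
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;                  # chomp removes the current $/, i.e. three spaces
        $rec =~ s/^\s+|\s+$//g;      # trim leftover blanks and the newline
        push @records, $rec if length $rec;
    }
    return @records;
}

my $text = "   -      4    0.2      1       416489   463455 554\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
print join('|', read_records($fh)), "\n";
```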
