In Perl, how can I correctly parse tab/space delimited files with quoted strings?

I need to parse tab/space-delimited files with many columns in Perl. Some of the values are long strings enclosed in double quotes, and these strings can contain any characters, including tabs and spaces.

When I parse them with the split function, it splits these strings as well. How can I make Perl understand that everything within the " " is a single column entry?

A simple example:

12  345546.67677   "Hello World!!!" -567.55656 0.5465767 "Hello_Again;   "


Use the Text::CSV module, which handles quoted fields, embedded delimiters, and escaped quotes for you. It lets you set the delimiter:

use Text::CSV;
my $csv = Text::CSV->new({sep_char => "\t"});
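To show how that object might be used once it is constructed, here is a minimal sketch. It assumes Text::CSV is installed from CPAN and that fields are separated by exactly one tab. The sample row is made up for illustration (it is not the asker's data) and has a tab embedded in one quoted field. The `binary => 1` option is an addition to the constructor call above.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 permits arbitrary bytes inside quoted fields.
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Hypothetical sample row: single tabs between fields,
# plus a tab inside the first quoted string.
my $line = qq{12\t345546.67677\t"Hello\tWorld!!!"\t-567.55656\t"Hello_Again;   "};

# parse() returns false on malformed input; fields() returns the columns
# of the last successfully parsed line, with the enclosing quotes removed.
$csv->parse($line)
    or die "Parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

Text::CSV treats every occurrence of sep_char as a field boundary, so a run of consecutive tabs would produce empty fields rather than being collapsed into one delimiter.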


Note that you say tab/space delimited. If delimiters are mixed and/or you have to treat consecutive spaces as one, using Text::ParseWords might be easier:

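#!/usr/bin/perl

use strict;
use warnings;

use Text::ParseWords qw( quotewords );
use YAML;

while ( my $line = <DATA> ) {
    # Drop the newline; a trailing delimiter can yield an extra undefined field.
    chomp $line;
    # Split on runs of whitespace; keep = 0 strips the enclosing quotes.
    print Dump [ quotewords('\s+', 0, $line) ];
}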

__DATA__
12  345546.67677   "Hello World!!!" -567.55656 0.5465767 "Hello_Again;   "

Output:

---
- 12
- 345546.67677
- Hello World!!!
- -567.55656
- 0.5465767
- 'Hello_Again;   '


Other possibilities are Regexp::Common::balanced and Text::Balanced.
