In Perl, how can I correctly parse tab/space delimited files with quoted strings?
I need to parse tab/space-delimited files with many columns in Perl. Some values are long strings enclosed in double quotes, and these strings can contain any characters, including tabs and spaces.
When I try to parse the lines with the split function, it splits these quoted strings as well. How can I make Perl treat a string inside " " as a single column entry?
A simple example:
12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "
Use the Text::CSV library, which handles quoted fields (including separators that appear inside the quotes) for you. It lets you set the delimiter:
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1 }) or die "Cannot use CSV: " . Text::CSV->error_diag();
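A fuller sketch of the Text::CSV approach, applied to the sample line from the question, follows. It assumes exactly one space between fields (as in that sample); for tab-separated files you would use sep_char => "\t" instead. Text::CSV is a CPAN module rather than part of core Perl, and the helper name split_quoted_line is hypothetical.

```perl
use strict;
use warnings;
use Text::CSV;

# Assumption: exactly one space between fields, as in the sample line.
# For tab-separated files use sep_char => "\t" instead.
my $csv = Text::CSV->new({ sep_char => ' ', binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Hypothetical helper: returns the fields of one line, with the
# surrounding double quotes removed from quoted fields.
sub split_quoted_line {
    my ($line) = @_;
    chomp $line;
    $csv->parse($line)
        or die "Parse error: " . $csv->error_diag() . "\n";
    return $csv->fields();
}

my $sample = qq{12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "\n};
print "[$_]\n" for split_quoted_line($sample);
```

Because every single space is a separator here, two spaces in a row produce an empty field between them; that is the situation the Text::ParseWords approach handles better.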
Note that you say tab/space delimited. If the delimiters are mixed, or a run of consecutive spaces has to count as a single separator, the core module Text::ParseWords might be easier:
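#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw( quotewords );
use YAML;

while ( my $line = <DATA> ) {
    chomp $line;    # otherwise the newline matches \s+ and an extra undefined field is returned
    # Split on runs of whitespace; keep = 0 removes the quotes around quoted fields
    print Dump [ quotewords('\s+', 0, $line) ];
}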
__DATA__
12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "
Output:
---
- 12
- 345546.67677
- Hello World!!!
- -567.55656
- 0.5465767
- 'Hello_Again; '
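The question mentions quoted strings that may themselves contain tabs, so here is a minimal sketch using parse_line, which Text::ParseWords also exports, on a tab-separated line whose quoted field contains a literal tab. The helper name fields_of and the sample data are made up for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw( parse_line );

# Hypothetical helper: split on one or more tabs/spaces, honouring "..." quoting.
sub fields_of {
    my ($line) = @_;
    chomp $line;                          # a trailing newline would add an undefined last field
    return parse_line('\s+', 0, $line);   # keep = 0 strips the surrounding quotes
}

# The quoted field contains a literal tab, which must not split it.
my $line   = qq{1\t"a\tb c"\t3.5\n};
my @fields = fields_of($line);
print scalar(@fields), " fields\n";       # prints "3 fields"
```

Be aware that Text::ParseWords also treats single quotes as quote characters, so an unquoted field containing a lone apostrophe (for example O'Brien) makes parse_line return an empty list.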
Other possibilities are Regexp::Common (Regexp::Common::delimited matches quoted strings, while Regexp::Common::balanced targets paired brackets) and Text::Balanced.
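Text::Balanced ships with Perl, so here is a minimal sketch of how its extract_delimited function could drive a tokenizer: whitespace separates fields, a field that starts with a double quote is extracted as one unit, and anything else runs to the next whitespace. The tokenize function is hypothetical, and backslash escapes inside the quotes are left untouched.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_delimited );

# Hypothetical tokenizer built on Text::Balanced.
sub tokenize {
    my ($text) = @_;
    chomp $text;
    my @fields;
    while (1) {
        $text =~ s/^\s+//;                  # skip the separator(s)
        last unless length $text;
        if ($text =~ /^"/) {
            # In list context: (quoted string including its quotes, remainder, prefix)
            my ($quoted, $rest) = extract_delimited($text, '"');
            die "Unterminated quote in: $text\n"
                unless defined $quoted && length $quoted;
            push @fields, substr($quoted, 1, -1);   # drop the outer quotes only
            $text = $rest;
        }
        else {
            $text =~ s/^(\S+)//;            # unquoted field runs to the next whitespace
            push @fields, $1;
        }
    }
    return @fields;
}

print join(' | ', tokenize(qq{12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "\n})), "\n";
```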