Splitting a String into Tokens and Storing the Delimiters in Perl
I have a string like this:
a b c d
I process my string like this:
chomp $line;
my @tokens = split /\s+/, $line;
my @new_tokens;
foreach my $token (@tokens) {
    push @new_tokens, some_complex_function( $token );
}
my $new_str = join ' ', @new_tokens;
I'd like to re-join the string with the original whitespace. Is there some way that I can store the whitespace from split and re-use it later? Or is this going to be a huge pain? It's mostly cosmetic, but I'd like to preserve the original spaces from the input string.
If you split with a regex with capturing parentheses, the split pattern will be included in the resulting list (see perldoc -f split):
use Data::Dumper;
my @list = split /(\s+)/, 'a b c d';
print Data::Dumper::Dumper(\@list);
$VAR1 = [
'a',
' ',
'b',
' ',
'c',
' ',
'd'
];
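To tie this back to the loop in the question, here is a minimal sketch of the full round trip. The body of some_complex_function is an assumption (it simply upper-cases its argument), and the sample input is made up for illustration. The whitespace pieces are passed through untouched, and the final join uses an empty string because the original delimiters are already in the list.

```perl
use strict;
use warnings;

# Stand-in for the question's some_complex_function (assumption: upper-casing only).
sub some_complex_function { return uc $_[0] }

# Hypothetical input with mixed runs of whitespace.
my $line = "a  b \t c   d";

# Capturing parentheses make split return the delimiters along with the tokens.
my @pieces = split /(\s+)/, $line;

# Transform only the non-whitespace pieces; keep the whitespace pieces as they are.
my @new_pieces = map { /^\s+$/ ? $_ : some_complex_function($_) } @pieces;

# Join on the empty string: the original whitespace is already part of the list.
my $new_str = join '', @new_pieces;

print "$new_str\n";
```

If the line starts with whitespace, split returns an empty leading field, so the function being applied should tolerate an empty string.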
Just split on word boundaries:
split /\b/, $line;
For your example, this will give:
('a',' ','b',' ','c',' ','d')
EDIT: As brian d foy pointed out, \b uses the wrong character classes: it matches between \w and \W characters, not between whitespace and non-whitespace. Following my original idea, I came up with look-around assertions instead. This looks far more complicated than Ether's answer, though:
split /(?:(?<=\S)(?=\s)|(?<=\s)(?=\S))/, $line;
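The difference between the two patterns only shows up when the input contains punctuation. The following sketch compares them on a made-up input (the string is an assumption, chosen to contain an apostrophe and a hyphen):

```perl
use strict;
use warnings;

# Hypothetical input containing punctuation inside the tokens.
my $line = "it's  a-b";

# \b splits wherever \w meets \W, so the apostrophe and the hyphen break tokens apart.
my @by_boundary = split /\b/, $line;

# The look-around version splits only where whitespace meets non-whitespace.
my @by_lookaround = split /(?:(?<=\S)(?=\s)|(?<=\s)(?=\S))/, $line;

# Print the pieces separated by '|' so the whitespace pieces stay visible.
print join('|', @by_boundary), "\n";
print join('|', @by_lookaround), "\n";
```

With the look-around pattern the tokens "it's" and "a-b" stay whole, and joining either list on the empty string gives back the original line.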
Why don't you simply do this?
my $new_str = uc( $line );
UPDATE: the original uc() was just shorthand for a "more complex function".
Well, in general you can also do this:
$line =~ s/(\S+)/more_complex_function($1)/ge;
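As a rough illustration of the substitution approach, the sketch below uses a placeholder more_complex_function that reverses each token (that body and the sample input are assumptions). It works on a copy so that the original line is left unchanged.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real transformation: reverse each token.
sub more_complex_function { return scalar reverse($_[0]) }

# Made-up input with mixed runs of whitespace.
my $line = "abc  de \t f";

# Copy $line, then substitute in the copy.
# /e evaluates the replacement as Perl code; /g applies it to every run of non-whitespace.
(my $new_str = $line) =~ s/(\S+)/more_complex_function($1)/ge;

print "$new_str\n";
```

Because only the \S+ runs are replaced, the whitespace between them is never touched, so nothing has to be stored and re-joined.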