How to truncate a string using a regular expression in Perl
I have the following strings in a file and want to truncate each one to no more than 6 characters. How can I do that using a regular expression in Perl?
The original file is:
$ cat shortstring.in
<value>1234@google.com</value>
<value>1235@google.com</value>
I want to get this file:
$ cat shortstring.out
<value>1234@g</value>
<value>1235@g</value>
Would something like this work?
s/<value>(\w\w\w\w\w\w)(.*)/$1/;
Here is a part of my code:
while (<$input_handle>) { # take one input line at a time
    chomp;
    if (/(\d+@google.com)/) {
        s/(<value>\w\w\w\w\w\w)(.*)</value>/$1/;
        print $output_handle "$_\n";
    } else {
        print $output_handle "$_\n";
    }
}
Use substr instead. Regex is not the only feature of Perl, and it's overkill for this :-)
$str = substr($str, 0, 6);
http://perldoc.perl.org/functions/substr.html
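To apply substr to the lines in the question, the text between the tags has to be located first. A minimal sketch, assuming each line holds at most one <value>...</value> pair; the helper name truncate_value is made up for illustration and is not part of the question's code:

```perl
use strict;
use warnings;

# Hypothetical helper: keep at most $max characters of the text inside
# <value>...</value>. Lines without such a pair are returned unchanged.
sub truncate_value {
    my ($line, $max) = @_;
    my $open  = '<value>';
    my $close = '</value>';

    my $start = index($line, $open);
    return $line if $start < 0;
    $start += length $open;

    my $end = index($line, $close, $start);
    return $line if $end < 0;

    my $text = substr($line, $start, $end - $start);
    # The 4-argument form of substr replaces the span in place.
    substr($line, $start, $end - $start, substr($text, 0, $max));
    return $line;
}

print truncate_value('<value>1234@google.com</value>', 6), "\n";  # <value>1234@g</value>
print truncate_value('<value>12</value>', 6), "\n";                # <value>12</value>
```

Values that are already shorter than the limit pass through untouched, because substr simply returns what is available.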
$ perl -pe 's/(<value>[^<]{1,6})[^<]*/$1/' shortstring.in
<value>1234@g</value>
<value>1235@g</value>
In the context of the snippet from your question, use
while (<$input_handle>) {
    s!(<value>)(.*?)(</value>)!$1 . substr($2,0,6) . $3!e
        if /(\d+\@google\.com)/;
    print $output_handle $_;
}
Or, to do it with a single pattern:
while (<$input_handle>) {
    s!(<value>)(\d+\@google\.com)(</value>)!$1 . substr($2,0,6) . $3!e;
    print $output_handle $_;
}
Using bangs as the delimiters on the substitution operator prevents Leaning Toothpick Syndrome in </value>.
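For comparison, here is a small sketch of the same substitution written both ways: with slash delimiters, where the slash in the closing tag has to be escaped, and with bang delimiters, where it does not. The sample line and variable names are only for illustration:

```perl
use strict;
use warnings;

my $with_slashes = '<value>1234@google.com</value>';
my $with_bangs   = $with_slashes;

# Slash delimiters: the / in </value> must be written as \/
$with_slashes =~ s/(<value>)(.*?)(<\/value>)/$1 . substr($2, 0, 6) . $3/e;

# Bang delimiters: the closing tag can be written as-is
$with_bangs =~ s!(<value>)(.*?)(</value>)!$1 . substr($2, 0, 6) . $3!e;

print "$with_slashes\n";  # <value>1234@g</value>
print "$with_bangs\n";    # <value>1234@g</value>
```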
NOTE: The usual warnings about “parsing” XML with regular expressions apply.
Demo program:
#! /usr/bin/perl
use warnings;
use strict;
my $input_handle = \*DATA;
open my $output_handle, ">&=", \*STDOUT or die "$0: open: $!";
while (<$input_handle>) {
    s!(<value>)(\d+\@google\.com)(</value>)!$1 . substr($2,0,6) . $3!e;
    print $output_handle $_;
}
__DATA__
<value>1234@google.com</value>
<value>1235@google.com</value>
<value>12@google.com</value>
Output:
$ ./prog.pl
<value>1234@g</value>
<value>1235@g</value>
<value>12@goo</value>
Try this:
s|(?<=<value>)(.*?)(?=</value>)|substr $1,0,6|e;
It looks like you want to truncate the text inside the tag, which could already be shorter than 6 characters, in which case:
s/(<value>[^<]{1,6})[^<]*/$1/
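A quick sketch of how that substitution treats long, short, and empty values. The sample lines are made up for illustration; only the first comes from the question:

```perl
use strict;
use warnings;

my @lines = (
    '<value>1234@google.com</value>',
    '<value>12</value>',
    '<value></value>',
);

my @out;
for my $line (@lines) {
    # Copy the line, then truncate the copy so @lines stays untouched.
    (my $short = $line) =~ s/(<value>[^<]{1,6})[^<]*/$1/;
    push @out, $short;
    print "$short\n";
}
# Prints:
# <value>1234@g</value>
# <value>12</value>
# <value></value>
```

The empty value is left alone because [^<]{1,6} requires at least one character, so the pattern does not match that line at all.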
s{<value>([^<]{1,6}).*}{<value>$1</value>};  # braces as delimiters, so the / in </value> needs no escaping