awk to perl conversion
I have a directory full of files containing records like:
FAKE ORGANIZATION
799 S FAKE AVE
Northern Blempglorff, RI 99xxx
01/26/2011
These items are being held for you at the location shown below each one.
IF YOU ASKED THAT MATERIAL BE MAILED TO YOU, PLEASE DISREGARD THIS NOTICE.
The Waltons. The complete DAXXXX12118198
Pickup at:CHUPACABRA LOCATION 02/02/2011
GRIMLY, WILFORD
29 FAKE LANE
S. BLEMPGLORFF RI 99XXX
I need to remove all entries containing the expression Pickup at:CHUPACABRA LOCATION.
The "record separator" issue: I can't touch the input file's formatting -- it must be retained as is. Each record is separated by roughly 40+ new lines.
Here's some awk ( this works ):
BEGIN {
RS="\n\n\n\n\n\n\n\n\n+"
FS="\n"
}
!/CHUPACABRA/{print $0}
My stab with perl:
perl -a -F\n -ne '$/ = "\n\n\n\n\n\n\n\n\n+";$\ = "\n";chomp;$regex="CHUPACABRA";print $_ if $_ !~ m/$regex/i;' data/lib51.000
Nothing is returned. I'm not sure how to specify 'field separator' in perl except at the commandline. Tried the a2p utility -- no dice. For the curious, here's what it produces:
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
#$FS = ' '; # set field separator
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$/ = "\n\n\n\n\n\n\n\n\n+";
$FS = "\n";
while (<>) {
chomp; # strip record separator
if (!/CHUPACABRA/) {
print $_;
}
}
This has to run on someone's Windows box; otherwise I'd stick with awk.
Thanks!
Bubnoff
EDIT ( SOLVED )
Thanks mob! Here's a ( working ) perl script version ( adjusted a2p output ):
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
#$FS = ' '; # set field separator
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$/ = "\n"x10;
$FS = "\n";
while (<>) {
chomp; # strip record separator
if (!/CHUPACABRA/) {
print $_;
}
}
Feel free to post improvements or CPAN goodies that make this more idiomatic and/or perl-ish. Thanks!
In Perl, the record separator is a literal string, not a regular expression. As the perlvar doc famously says:
Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)
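To see what "a string, not a regex" means in practice, here is a small sketch (the helper name read_records and the sample data are made up for illustration). It reads an in-memory string with $/ set to "\n\n+" and shows that the "+" is matched as an ordinary character rather than as "one or more":

```perl
use strict;
use warnings;

# Hypothetical helper: read $data using $separator as the input record separator.
sub read_records {
    my ($data, $separator) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    local $/ = $separator;    # taken literally, character for character
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Three newlines after "a", then two newlines and a literal "+" after "b".
my @records = read_records("a\n\n\nb\n\n+c", "\n\n+");
print scalar(@records), "\n";    # prints 2: only the literal "\n\n+" ends a record
```

The run of three newlines after "a" does not end a record, which is why a separator written awk-style with a trailing "+" never matches real data.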
Still, it looks like you can get away with $/="\n" x 10 or something like that:
perl -ne 'BEGIN { $/ = "\n" x 10; $\ = "\n"; $regex = "CHUPACABRA" }
chomp; print if /\S/ && !m/$regex/i;' data/lib51.000
Note the extra /\S/ && test, which skips the empty paragraphs produced by input that has 20 or more consecutive newlines.
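Since $/ cannot be a regex, another way to get awk's "9 or more newlines" behaviour while leaving the surviving records and their separators byte-for-byte intact is to slurp each file and split on a regex with a capturing group. This is only a sketch under assumptions: the sub name filter_records is made up, each file is assumed to fit in memory, and a dropped record takes the separator that follows it along with it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop every record matching
# $pattern, treating a run of 9 or more newlines as the record separator.
sub filter_records {
    my ($text, $pattern) = @_;

    # The capturing group makes split return the separators as well, so the
    # list alternates: record, separator, record, separator, ...
    my @parts = split /(\n{9,})/, $text, -1;

    my $out = '';
    for (my $i = 0; $i < @parts; $i += 2) {
        my $record    = $parts[$i];
        my $separator = $i + 1 < @parts ? $parts[$i + 1] : '';
        next if $record =~ $pattern;    # drop the record and its trailing separator
        $out .= $record . $separator;   # everything else is copied unchanged
    }
    return $out;
}

# Only act as a filter when file names are given on the command line.
if (@ARGV) {
    local $/;    # slurp mode: each read returns one whole file
    while (my $text = <>) {
        print filter_records($text, qr/CHUPACABRA/i);
    }
}
```

Saved under a name of your choosing (say filter.pl, a made-up name), it would be run as perl filter.pl data/lib51.000 > cleaned.txt, which also sidesteps the quoting problems of one-liners in the Windows command prompt.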
Also, have you considered just installing Cygwin and having awk available on your Windows machine?
There is no need for (much) conversion if you can download gawk for Windows.
Did you know that Perl comes with a program called a2p that does exactly what you described you want to do in your title?
And, if you have Perl on your machine, the documentation for this program is already there:
C> perldoc a2p
My own suggestion is to get the Llama book and learn Perl anyway. Despite what the Python people say, Perl is a great and flexible language. If you know shell, awk and grep, you'll understand many of the Perl constructs without any problems.