awk to perl conversion

I have a directory full of files containing records like:

FAKE ORGANIZATION
799 S FAKE AVE
Northern Blempglorff, RI 99xxx


                                                                      01/26/2011
     These items are being held for you at the location shown below each one.
     IF YOU ASKED THAT MATERIAL BE MAILED TO YOU, PLEASE DISREGARD THIS NOTICE.

     The Waltons. The complete  DAXXXX12118198
     Pickup at:CHUPACABRA LOCATION                                 02/02/2011







                                                  GRIMLY, WILFORD
                                                  29 FAKE LANE
                                                  S. BLEMPGLORFF RI  99XXX

I need to remove all entries with the expression Pickup at:CHUPACABRA LOCATION.

The "record separator" issue: I can't touch the input file's formatting -- it must be retained as is. Each record is separated by roughly 40+ newlines.

Here's some awk ( this works ):

BEGIN { 
    RS="\n\n\n\n\n\n\n\n\n+" 
    FS="\n"
}
!/CHUPACABRA/{print $0}

My stab with perl:

perl -a -F\n -ne '$/ = "\n\n\n\n\n\n\n\n\n+";$\ = "\n";chomp;$regex="CHUPACABRA";print $_ if $_ !~ m/$regex/i;' data/lib51.000

Nothing is returned. I'm not sure how to specify the 'field separator' in Perl except on the command line. I tried the a2p utility -- no dice. For the curious, here's what it produces:

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z
            # process any FOO=bar switches

#$FS = ' ';     # set field separator
$, = ' ';       # set output field separator
$\ = "\n";      # set output record separator

$/ = "\n\n\n\n\n\n\n\n\n+";
$FS = "\n";

while (<>) {
    chomp;  # strip record separator
    if (!/CHUPACABRA/) {
        print $_;
    }
}

This has to run under someone's Windows box otherwise I'd stick with awk.

Thanks!

Bubnoff

EDIT (SOLVED)

Thanks mob! Here's a ( working ) perl script version ( adjusted a2p output ):

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z
            # process any FOO=bar switches

#$FS = ' ';     # set field separator
$, = ' ';       # set output field separator
$\ = "\n";      # set output record separator

$/ = "\n"x10;
$FS = "\n";

while (<>) {
    chomp;  # strip record separator
    if (!/CHUPACABRA/) {
        print $_;
    }
}

Feel free to post improvements or CPAN goodies that make this more idiomatic and/or perl-ish. Thanks!


In Perl, the record separator is a literal string, not a regular expression. As the perlvar doc famously says:

Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)
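
If a pattern-based separator is really needed, one possible workaround (a sketch, not something taken from perlvar or from the one-liners in this thread) is to read the whole input into a string and split it on a regex. The version below borrows the nine-or-more-newlines pattern from the awk RS in the question and keeps each separator, so surviving records come out with their original spacing. filter_records is a hypothetical helper name and the sample records are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split text into records on runs of 9 or more newlines (the awk RS from the
# question), keeping each separator so surviving records are emitted unchanged.
# filter_records() is a hypothetical helper name, not a core Perl function.
sub filter_records {
    my ($text, $pattern) = @_;
    # The capturing group makes split() return the separators as well:
    # (record, separator, record, separator, ..., record)
    my @parts = split /(\n{9,})/, $text;
    my $out = '';
    while (@parts) {
        my $record = shift @parts;
        my $sep    = @parts ? shift @parts : '';
        next if $record =~ $pattern;    # drop the record and its separator
        $out .= $record . $sep;
    }
    return $out;
}

# Small in-memory demonstration with made-up records.
my $gap  = "\n" x 12;
my $data = "Item A\nPickup at:CHUPACABRA LOCATION" . $gap
         . "Item B\nPickup at:MAIN LOCATION" . $gap
         . "Item C\nPickup at:CHUPACABRA LOCATION\n";
print filter_records($data, qr/CHUPACABRA/);
```

For a real file, the string could come from slurping the handle, for example my $text = do { local $/; <$fh> };, at the cost of holding the whole file in memory.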

Still, it looks like you can get away with $/="\n" x 10 or something like that:

perl -a -F\n -ne '$/="\n"x10;$\="\n";chomp;$regex="CHUPACABRA";
       print if /\S/ && !m/$regex/i;' data/lib51.000

Note the extra /\S/ &&, which skips the empty records produced when the input has 20 or more consecutive newlines.
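
One detail of the one-liner above: $/ is assigned inside the -n loop body, so the very first read still uses the default separator (a single line). A script can set the separator before anything is read. The following is a minimal sketch of that idea, assuming the records arrive on a filehandle; keep_records is a hypothetical helper name and the sample data is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read records separated by a run of ten newlines from a filehandle and
# return those that contain non-blank text and do not match $pattern.
# keep_records() is a hypothetical helper name used for illustration.
sub keep_records {
    my ($fh, $pattern) = @_;
    local $/ = "\n" x 10;    # literal string separator, set BEFORE the first read
    my @kept;
    while (my $record = <$fh>) {
        chomp $record;                  # strips one trailing "\n" x 10, if present
        next unless $record =~ /\S/;    # skip empty chunks from longer newline runs
        next if $record =~ $pattern;
        push @kept, $record;
    }
    return @kept;
}

# Demonstration on an in-memory filehandle with made-up records.
my $data = "Item A\nPickup at:CHUPACABRA LOCATION" . ("\n" x 25)
         . "Item B\nPickup at:MAIN LOCATION" . ("\n" x 10);
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "$_\n" for keep_records($fh, qr/CHUPACABRA/i);
close $fh;
```

Saving this as a script file also sidesteps quoting trouble on Windows, since cmd.exe does not treat single quotes as quoting characters the way a Unix shell does.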

Also, have you considered just installing Cygwin and having awk available on your Windows machine?


There is no need for (much) conversion if you can download gawk for Windows.


Did you know that Perl comes with a program called a2p that does exactly what you describe in your title?

And, if you have Perl on your machine, the documentation for this program is already there:

C> perldoc a2p

My own suggestion is to get the Llama book and learn Perl anyway. Despite what the Python people say, Perl is a great and flexible language. If you know shell, awk and grep, you'll understand many of the Perl constructs without any problems.
