Problem with perl multiline matching
I'm trying to use a perl one-liner to update some code that spans multiple lines and am seeing some strange behavior. Here's a simple text file that shows the problem I'm see开发者_运维百科ing:
ABCD START
STOP EFGH
I expected the following to work but it doesn't end up replacing anything:
perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt
After doing some experimenting I found that the \s+
in the original regex will match the newline but not any of the whitespace on the 2nd line, and adding a second \s+
doesn't work either. So for now I'm doing the following workaround, which is to add an intermediate regex that only removes the newline:
perl -pi -e 's/START\s+/START/s' input.txt
This creates the following intermediate file:
ABCD START STOP EFGH
Then I can run the original regex (although the /s
is no longer needed):
perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt
This creates the final, desired file:
ABCD REPLACE EFGH
It seems like the intermediate step should not be necessary. Am I missing something?
You were close. You need either -00
or -0777
:
perl -0777 -pi -e 's/START\s+/START/' input.txt
perl -p
processes the file one line at a time. The regex you have is correct, but it is never matched against the multi-line string.
A simple strategy, assuming the file will fit in memory, is to read the whole thing (do this without -p
):
$/ = undef;
$file = <>;
$file =~ s/START\s+STOP/REPLACE/sg;
print $file;
Note, I have added the /g
modifier to specify global replacement.
As a shortcut for all that extra boilerplate, you can use your existing script with the -0777
option: perl -0777pi -e 's/START\s+STOP/REPLACE/sg'
. Adding /g
is still needed if you may need to make multiple replacements within the file.
A hiccup that you might run into, although not with this regex: if the regex were START.+STOP
, and a file contains multiple START/STOP pairs, greedy matching of .+
will eat everything from the first START to the last STOP. You can use non-greedy matching (match as little as possible) with .+?
.
If you want to use the ^
and $
anchors for line boundaries anywhere in the string, then you also need the /m
regex modifier.
A relatively simple one-liner (reading the file in memory):
perl -pi -e 'BEGIN{undef $/;} s/START\s+STOP/REPLACE/sg;' input.txt
Another alternative (not so simple), not reading the file in memory:
perl -ni -e '$a.=$_; \
if ( $a =~ s/START\s+STOP/REPLACE/s ) { print $a; $a=""; } \
END{$a && print $a}' input.txt
perl -MFile::Slurp -e '$content = read_file(shift); $content =~ s/START\s+STOP/REPLACE/s; print $content' input.txt
Here's a one-liner that doesn't read the entire file into memory at once:
perl -i -ne 'if (($x = $last . $_) =~ s/START\n\s*STOP/REPLACE/) \
{ print $x; $last = ""; } else { print $last; $last = $_; } \
print $last if eof ARGV' input.txt
精彩评论