Perl Regex To Condense Multiple Line Breaks

I can't seem to figure out the right syntax but I want a Perl regular expression to find where there are two or more line breaks in a row and condense them into just 2 line breaks.

Here is what I'm using today which doesn't seem to work:

$string =~ s/\n\n+/\n\n/g;

Please let me know what I'm doing wrong and the correct Perl regex I should be using.

Thanks in advance for your help!


If you're using Perl 5.10 or later, try this:

$string =~ s/(\R)(?:\h*\R)+/$1$1/g;

\R is the generic line-separator escape sequence, and \h matches any horizontal whitespace character (e.g. space and TAB). So this will convert any run of one or more blank lines (empty, or containing only horizontal whitespace) into a single empty line.

Most applications these days are liberal in what they'll recognize as a line separator; they'll even accept a mix of two or more styles of separator in the same document. On the other hand, some apps actively convert all line separators to one preferred style. But sometimes you do have to stick to one particular style; that's why I captured the first \R match and used it as the replacement, instead of arbitrarily using \n.
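
As a minimal sketch of that point, the snippet below runs the substitution on two made-up sample strings, one with Unix line breaks and one with Windows line breaks. The helper name condense_blank_lines and the sample data are assumptions for illustration only.

```perl
use strict;
use warnings;
use 5.010;    # \R and \h require Perl 5.10 or later

# Hypothetical helper wrapping the substitution from this answer.
sub condense_blank_lines {
    my ($text) = @_;
    $text =~ s/(\R)(?:\h*\R)+/$1$1/g;
    return $text;
}

# Unix-style input with a whitespace-only line in the run.
my $unix = condense_blank_lines("a\n\n \t\n\nb\n");

# Windows-style input; the captured \R is "\r\n", so the
# replacement keeps the CRLF style instead of forcing "\n".
my $dos = condense_blank_lines("a\r\n\r\n\r\n\r\nb\r\n");

# Make the separators visible when printing.
for my $result ($unix, $dos) {
    (my $visible = $result) =~ s/\r/\\r/g;
    $visible =~ s/\n/\\n/g;
    say $visible;
}
```

The first line printed should show a\n\nb\n and the second a\r\n\r\nb\r\n, so each input keeps its own separator style.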

Be aware that these special escape sequences aren't widely supported in other regex flavors. They work in recent versions of PHP, and \R seems to work in Ruby 2.0, though I can't find any doc that mentions it. Ruby 1.9.2 and 2.0 support a \h escape sequence, but it matches a hexadecimal digit ([0-9a-fA-F]), not horizontal whitespace. In most other flavors, \R and \h will either throw an exception or match a literal R and h respectively.


This does it:

#!/usr/bin/env perl
use strict;
use warnings;
my $string;
{
   local $/=undef;
   $string =<DATA>;
} 
print "Before:\n$string\n============";

$string=~s/\n{2,}/\n\n/g;
print "After:\n$string\n\nBye Bye!";

__DATA__
Line 1
Line 2






Line 9
Line 10

Line 12



Line 16


Line 19

Output:

Before:
Line 1
Line 2






Line 9
Line 10

Line 12



Line 16


Line 19
============After:
Line 1
Line 2

Line 9
Line 10

Line 12

Line 16

Line 19

Perl 5.10 and later also support the \R escape sequence, which matches any line break (\n, \r\n, and so on) for platform independence. Your regex would then be s/\R{2,}/\n\n/g;
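
A short sketch of what the \R version does with mixed line endings, using a made-up sample string. Note that it rewrites every run of two or more breaks as "\n\n", while a single line break keeps whatever style it had.

```perl
use strict;
use warnings;
use 5.010;    # \R requires Perl 5.10 or later

# Made-up sample mixing Windows (\r\n) and Unix (\n) line breaks.
my $mixed = "a\r\n\r\n\r\nb\n\n\n\nc\r\nd";

(my $condensed = $mixed) =~ s/\R{2,}/\n\n/g;

# Runs of two or more breaks become "\n\n"; the lone "\r\n"
# between c and d is not part of a run and is left untouched.
(my $visible = $condensed) =~ s/\r/\\r/g;
$visible =~ s/\n/\\n/g;
say $visible;    # prints a\n\nb\n\nc\r\nd
```

Unlike the \h*-based pattern in the earlier answer, this one does not treat lines containing only spaces or tabs as blank.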


Show a full example. What is $string?

$ perl -E'my $s = qq{a\n\n\nb}; say "[$s]"; $s =~ s/\n\n+/\n\n/g; say "[$s]"'
[a


b]
[a

b]


@btilly hit the nail on the head. I did a quick test case:

in:

a

b




c

with this code:

my $line = join '', <>;
$line =~ s{\n\n+}{\n\n}g;
print $line;

and it returned the expected result:

a

b

c

You can get the same result by changing the record separator (and avoiding the regex):

{
    # change the Record Separator from "\n" to "" (paragraph mode)
    # treats two or more consecutive empty lines as a single empty line (perldoc perlvar)
    # local limits the change to the global $/ to this block
    local $/ = "";
    print <>;
}
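
To try the record-separator approach without a file, here is a self-contained sketch that reads the sample from an in-memory filehandle instead of <>. The variable names and the sample text are assumptions for illustration.

```perl
use strict;
use warnings;

# Made-up sample: same shape as the test input above.
my $input = "a\n\nb\n\n\n\n\nc\n";

# Open a read-only filehandle on the string (stands in for <> here).
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my $output = do {
    local $/ = "";      # paragraph mode, restored when the block ends
    join '', <$fh>;     # each record is one paragraph ending in "\n\n"
};
close $fh;

print $output;
```

One difference from the regex worth checking against your data: as far as I know, paragraph mode also skips blank lines at the very start of the input, whereas s/\n\n+/\n\n/g would leave two line breaks there.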