How do I convert various user-inputted line break characters to <br> using Perl?

I have a <textarea> for user input, and, as they are invited to do, users liberally add line breaks in the browser and I save this data directly to the database.

Upon displaying this data back on a webpage, I need to convert the line breaks to <br> tags in a reliable way that takes into account \n, \r\n, and any other common line break sequences employed by client systems.

What is the best way to do this in Perl without doing regex substitutions every time? I am hoping, naturally, for yet another awesome CPAN module recommendation... :)


There's nothing wrong with using regexes here:

s/\r?\n/<br>/g;


Actually, if you have to deal with users on classic Mac OS (which ended lines with a bare \r), or if there still happens to be some weird computer that uses form feeds, you would probably have to use something like this:

$input =~ s/(\r\n|\n|\r|\f)/<br>/g;
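If you are on Perl 5.10 or later, the generic line break escape \R is another option worth considering. It matches \r\n as a single unit, plus \n, \r, \f, vertical tab, and the Unicode line and paragraph separators. Here is a minimal sketch; the sub name breaks_to_br and the sample string are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # \R is only available from Perl 5.10 onwards

# Hypothetical helper: returns a copy of the text with every
# line break sequence replaced by a single <br>.
sub breaks_to_br {
    my ($text) = @_;
    $text =~ s/\R/<br>/g;    # CRLF counts as one break, not two
    return $text;
}

# Sample input mixing CR, CRLF, LF and form feed.
my $input = "a\015b\015\012c\012d\014e";

print breaks_to_br($input), "\n";    # prints: a<br>b<br>c<br>d<br>e
```

Because \R is broader than the explicit alternation above, check that treating vertical tabs and Unicode separators as line breaks is what you want for your data.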


#!/usr/bin/perl

use strict; use warnings;

use Socket qw( :crlf );

my $text = "a${CR}b${CRLF}c${LF}";

$text =~ s/$LF|$CR$LF?/<br>/g;

print $text;

Following up on @daxim's comment, here is the modified version:

#!/usr/bin/perl

use strict; use warnings;
use charnames ':full';

my $text = "a\N{CR}b\N{CR}\N{LF}c\N{LF}";

$text =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;

print $text;

Following up on @Marcus's comment, here is a contrived example:

#!/usr/bin/perl

use strict; use warnings;
use charnames ':full';

my $t = (my $s = "a\012\015\012b\012\012\015\015c");
$s =~ s/\r?\n/<br>/g;

$t =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;

print "This is \$s: $s\nThis is \$t:$t\n";

This is a mishmash of carriage returns and line feeds (which, at some point in the past, I did encounter).

Here is the output of the script on Windows using ActiveState Perl:

C:\Temp> t | xxd
0000000: 5468 6973 2069 7320 2473 3a20 613c 6272  This is $s: a<br
0000010: 3e3c 6272 3e62 3c62 723e 3c62 723e 0d0d  ><br>b<br><br>..
0000020: 630d 0a54 6869 7320 6973 2024 743a 613c  c..This is $t:a<
0000030: 6272 3e3c 6272 3e62 3c62 723e 3c62 723e  br><br>b<br><br>
0000040: 3c62 723e 3c62 723e 630d 0a              <br><br>c..

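or, as text (the two carriage returns left over in $s send the cursor back to the start of the line, so the c overwrites the T):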

chis is $s: a<br><br>b<br><br>
This is $t:a<br><br>b<br><br><br><br>c

Admittedly, you are not likely to end up with this input. However, if you want to cater for any unexpected oddities that might indicate a line ending, you might want to use:

$s =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;

Also, for reference, CGI.pm canonicalizes line-endings this way:

# Define the CRLF sequence.  I can't use a simple "\r\n" because the meaning
# of "\n" is different on different OS's (sometimes it generates CRLF, sometimes LF
# and sometimes CR).  The most popular VMS web server
# doesn't accept CRLF -- instead it wants a LR.  EBCDIC machines don't
# use ASCII, so \015\012 means something different.  I find this all 
# really annoying.
$EBCDIC = "\t" ne "\011";
if ($OS eq 'VMS') {
  $CRLF = "\n";
} elsif ($EBCDIC) {
  $CRLF= "\r\n";
} else {
  $CRLF = "\015\012";
}


As a matter of general principle, storing the data as entered by the user and doing the EOL-to-<br> conversion each time it's displayed is the better (even Right™) way to do it, both for the sake of having access to the original version of the data and because you may decide at some point that you want to change your filtering algorithm.
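To make that concrete, here is a rough sketch of a display-time filter. The sub name render_user_text and the sample string are hypothetical, and the HTML-escaping step is my own assumption about what the page needs rather than something the question asked for:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical display-time filter: takes the text exactly as stored
# and returns an HTML fragment. The stored value itself is never modified.
sub render_user_text {
    my ($raw) = @_;

    # Assumption: escape HTML metacharacters first, so the <br> tags
    # added below are the only markup in the output.
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    (my $html = $raw) =~ s/([&<>"])/$entity{$1}/g;

    # Then turn each line break sequence into one <br>.
    $html =~ s/\r\n|\n|\r/<br>/g;

    return $html;
}

my $stored = "1 < 2\r\nsecond line\n";
print render_user_text($stored), "\n";    # prints: 1 &lt; 2<br>second line<br>
```

Since the database still holds the original text, changing the filter later only means editing this one sub.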

But, no, I personally would not use a regex in this case. I would use Parse::BBCode, which provides a whole lot of additional functionality (i.e., full BBCode support, or at least as much as you choose not to disable) in addition to providing line breaks without requiring users to explicitly enter markup for them.
