Trying to understand Perl split() output

2023-03-07 22:43

I have a few lines of text that I'm trying to use Perl's split function to convert into an array. The problem is that I'm getting some unusual extra characters in the output, specifically the following string "\cM" (without the quotes). This string appears where there were line breaks in the original text; however, (I believe) those line breaks were removed in the text that I'm trying to split. Does anybody know what's going on with this phenomenon? I posted an example below. Thanks.

Here's the original plain text that I'm trying to split. I'm loading it from a file, in case that matters:

10b2obo12b2o2b$6b3obob3o8bob3o2b$2bobo10bo3b2obo4bo2b$2o4b2o5bo3b4obo
3b2o2b$2bob2o2bo4b3obo5b4obob$8bo4bo13b3o$2bob2o2bo4b3obo5b4obob$2o4b
2o5bo3b4obo3b2o2b$2bobo10bo3b2obo4bo2b$6b3obob3o8bob3o2b$10b2obo12b2o!

Here is my Perl code that is supposed to do the splitting:

while(<$FH>) {
    chomp;
    $string .= $_;
    last if m/!$/;
}

@rows = split(qr/\$/, $string);
print;          # a dummy line to provide a breakpoint for the debugger

This is what the debugger outputs when it gets to the "print" line. The issue I'm trying to deal with appears in elements 3, 7, and 10:

DB<10> p $string
2o5bo3b4obo3b2o2b$2bobo10bo3b2obo4bo2b$6b3obob3o8bob3o2b$10b2obo12b2o!
DB<11> x @rows
0  '10b2obo12b2o2b'
1  '6b3obob3o8bob3o2b'
2  '2bobo10bo3b2obo4bo2b'
3  "2o4b2o5bo3b4obo\cM3b2o2b"
4  '2bob2o2bo4b3obo5b4obob'
5  '8bo4bo13b3o'
6  '2bob2o2bo4b3obo5b4obob'
7  "2o4b\cM2o5bo3b4obo3b2o2b"
8  '2bobo10bo3b2obo4bo2b'
9  '6b3obob3o8bob3o2b'
10  "10b2obo12b2o!\cM"

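You know, changing the input record separator ($/) would make this code a lot simpler. Note that any line breaks inside a record would still need to be removed afterwards, for example with tr/\r\n//d.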

$/ = '$';

my @rows = <$FH>;
chomp @rows;

print "@rows";
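
As a rough sketch of how the record-separator approach behaves on input that wraps across lines: the sample data and the in-memory filehandle below are assumptions made for illustration, not the asker's real file. With $/ set to '$', chomp removes the trailing "$" from each record, so the line breaks that fall inside a record are deleted in a separate step.

```perl
use strict;
use warnings;

# Hypothetical sample: two CR-LF lines standing in for the real file.
my $data = "10b2o\$6b3obo\r\n3b2o\$2bob!\r\n";
open my $FH, '<', \$data or die "Cannot open in-memory handle: $!";

my @rows;
{
    local $/ = '$';        # read up to each "$" instead of each newline
    @rows = <$FH>;
    chomp @rows;           # with $/ set to "$", chomp removes the trailing "$"
}
tr/\r\n//d for @rows;      # delete the line breaks left inside the records

print "$_\n" for @rows;
```

Using local keeps the change to $/ confined to the block, so later reads elsewhere in the program still split on newlines.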

The debugger is probably using \cM to represent Ctrl-M, which is also known as a carriage return (sometimes written \r or ^M). Text files from Windows use a CR-LF (carriage return, line feed) pair to mark the end of a line. If you read such a file on a Unix system, chomp strips the Unix EOL (a single line feed) but leaves the CR in place, so you end up with stray CRs in your data.
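
To make the chomp behaviour concrete, here is a small sketch. The sample line is made up, and it assumes $/ still has its default value of "\n".

```perl
use strict;
use warnings;

# A line as it would arrive from a CR-LF file read on a Unix system.
my $line = "3b4obo\r\n";

chomp(my $chomped = $line);              # removes only the trailing "\n"
(my $stripped = $line) =~ s/\r?\n\z//;   # removes "\r\n" or a bare "\n"

printf "chomped:  %d chars, ends in CR: %s\n",
    length $chomped, ($chomped =~ /\r\z/ ? 'yes' : 'no');
printf "stripped: %d chars, ends in CR: %s\n",
    length $stripped, ($stripped =~ /\r\z/ ? 'yes' : 'no');
```

The chomped copy keeps its carriage return, which is the character the debugger displays as \cM.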

For a file like yours, you can just strip all the trailing whitespace instead of using chomp:

while(defined(my $line = <$FH>)) {
    $line    =~ s/\s+$//;
    $string .= $line;
    last if($line =~ /!$/);
}
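
Putting that loop together with the split from the question gives something like the following sketch. The short sample string and the in-memory filehandle are stand-ins chosen for illustration; the real code would read from the file as before.

```perl
use strict;
use warnings;

# Hypothetical sample with CR-LF line endings, standing in for the real file.
my $data = "10b2obo\$6b3o\r\nbob\$2bo!\r\n";
open my $FH, '<', \$data or die "Cannot open in-memory handle: $!";

my $string = '';
while (defined(my $line = <$FH>)) {
    $line =~ s/\s+$//;     # strips "\r\n", "\n", and any trailing blanks
    $string .= $line;
    last if $line =~ /!$/;
}

my @rows = split /\$/, $string;
print "$_\n" for @rows;
```

Because \s matches both the carriage return and the line feed, the same code should work whether the file has Unix or Windows line endings.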

You don't say which OS you're on. Check out binmode and what it has to say about line endings, and note that the \cM characters appear exactly where the line endings fall in your input file:

http://perldoc.perl.org/functions/binmode.html
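
One option in this direction is to let an I/O layer do the translation. The sketch below assumes the input really is CR-LF terminated and uses a temporary file with made-up contents in place of the real one. It opens the file with the :crlf layer so that each "\r\n" is read as "\n", after which a plain chomp is enough.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small CR-LF file; ':raw' keeps the bytes exactly as given.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "2o4b\r\n2o5bo!\r\n";
close $out or die "Cannot close $path: $!";

# Reopen with the :crlf layer so that "\r\n" is read as "\n".
open my $in, '<:crlf', $path or die "Cannot open $path: $!";
chomp(my @lines = <$in>);
close $in;

print join('', @lines), "\n";
```

The same layer can also be applied to a handle that is already open by calling binmode($fh, ':crlf') before the first read.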
