Why do I get extra line breaks in the web page I download with Perl?

2023-01-19 20:46 Q&A author:

I'm writing a simple Perl script (on Windows) to download the response of a GET request to a URL to a file. Pretty straightforward. Except when it writes to the output file, I get extra line breaks. So like instead of:

<head>
  <title>title</title>
  <link .../>
</head>

I get

<head>

  <title>title</title>

  <link .../>

</head>

Here's the Perl script:

use LWP::Simple;

my $url = $ARGV[0];
my $content = get($url);

open(outputFile, '+>', $ARGV[1]);

print outputFile $content;

close(outputFile);

I suppose I could just get wget for Windows, but now this is bothering me. How do I get rid of those extra line breaks?!

There's no sane reason for the +> mode in your example code. Just saying.
LWP::Simple has a getstore method. If you're using LWP::Simple, why not use it?
By default, open is going to push the :crlf I/O layer when running on Win32, which turns \n into \r\n on output. But the data you're writing already has \r\n, so each line ending becomes \r\r\n and you end up with too many newlines. If you want data to be written verbatim, you should use binmode, or open the handle with :raw to begin with. LWP's getstore already does this correctly.
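To see the effect without downloading anything, here is a minimal sketch. It pushes the :crlf layer explicitly, so it should behave the same way on any platform, and compares that with :raw. The sample string and the helper name bytes_written_with are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the downloaded page: the body already uses CRLF line endings.
my $content = "<head>\r\n</head>\r\n";

# Hypothetical helper: write $content through the given PerlIO layer,
# then return the exact bytes that ended up on disk.
sub bytes_written_with {
    my ($layer) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open(my $out, ">$layer", $path) or die "Cannot write $path: $!";
    print {$out} $content;
    close($out) or die "Cannot close $path: $!";

    open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
    local $/;    # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $translated = bytes_written_with(':crlf');    # what a text-mode handle does on Windows
my $verbatim   = bytes_written_with(':raw');     # no newline translation

printf "crlf layer: %d bytes, raw layer: %d bytes\n",
    length($translated), length($verbatim);
```

The :crlf output is longer because every \n gains an extra \r in front of it, and many viewers show that stray \r as an additional line break.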

I'm guessing that $content already includes CRLF newlines and Perl's IO layer is doing LF -> CRLF conversion. (Internally, "\n" is a single character in Perl, usually LF). I'd add

binmode(outputFile);

after the open to disable that conversion and write the results of $content directly.
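Put together, the script from the question might look like the sketch below. The helper name save_verbatim is hypothetical, and the lexical filehandle, the plain '>' mode, and the error checks are additions that are not in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: write $content to $path byte for byte.
sub save_verbatim {
    my ($path, $content) = @_;
    open(my $out, '>', $path) or die "Cannot open $path: $!";
    binmode($out);    # drop the :crlf layer so "\r\n" is not rewritten as "\r\r\n"
    print {$out} $content;
    close($out) or die "Cannot close $path: $!";
    return;
}

# Usage: perl script.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    require LWP::Simple;
    my ($url, $path) = @ARGV;
    my $content = LWP::Simple::get($url);
    die "Could not fetch $url\n" unless defined $content;
    save_verbatim($path, $content);
}
```

One caveat: recent versions of LWP::Simple return decoded text from get, so a page containing characters outside Latin-1 may also need an explicit encoding layer on the handle. Calling getstore($url, $path) avoids the question entirely, since it writes the response body to the file itself.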

chomp($content) would be my guess, as it looks like there is already a set of \n's in it.

EDIT: Sorry, I just realized that chomp won't work unless you split the content into lines and then chomp each line, since chomp only removes the newline at the end of the input string. So my solution wouldn't help in this case. You could, however, split it on \n\n and then join. I do like the solution from another answer that uses a regex on the returned string. My minor modification is the following: it still separates lines, but it matches any run of \n's, \r's, or a combination of the two and puts a single \n in its place, so there is only going to be one newline per line (hopefully).

$content =~ s/[\n\r]+/\n/g;

EDITED the above again: I accidentally put a ! in there for some reason, not sure why.
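For comparison, here is a small sketch of what that substitution does to a CRLF string that contains an intentional blank line. The sample string is made up, and the second substitution (s/\r\n/\n/g) is an alternative that does not come from any of the answers.

```perl
use strict;
use warnings;

# Made-up sample: CRLF line endings, with one intentional blank line before </head>.
my $content = "<head>\r\n  <title>title</title>\r\n\r\n</head>\r\n";

# The substitution from the answer: any run of CR/LF characters becomes one "\n".
(my $collapsed = $content) =~ s/[\n\r]+/\n/g;

# Narrower alternative (an assumption, not from the answers): only convert CRLF
# pairs to "\n", which leaves intentional blank lines in place.
(my $normalized = $content) =~ s/\r\n/\n/g;

print "collapsed:\n$collapsed";
print "normalized:\n$normalized";
```

The first form also removes blank lines that were in the original page, which may or may not matter for HTML. Either way, a text-mode handle on Windows then turns each remaining \n back into a single \r\n when the string is printed.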
