In Perl, can I limit the length of a line as I read it in from a file (like fgets)?
I'm trying to write a piece of code that reads a file line by line and stores each line, up to a certain amount of input data. I want to guard against the end-user being evil and putting something like a gig of data on one line in addition to guarding against sucking in an abnormally large file. Doing $str = <FILE>
will still read in a whole line, and that could be very long and blow up my memory.
fgets lets me do this by letting me specify a number of bytes to read during each call, essentially letting me split one long line into chunks of my max length. Is there a similar way to do this in Perl? I saw something about sv_gets but am not sure how to use it (though I only did a cursory Google search).
The goal of this exercise is to avoid having to do additional parsing / buffering after reading data. fgets stops after N bytes or when a newline is reached.
EDIT I think I confused some. I want to read X lines, each with max length Y. I don't want to read more than Z bytes total, and I would prefer not to read all Z bytes at once. I guess I could just do that and split the lines, but I'm wondering if there's some other way. If that's the best way, then using the read function and doing a manual parse is my easiest bet.
Thanks.
Perl has no built-in fgets, but File::GetLineMaxLength implements it.
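Going from memory of the module's documentation (hedged; the constructor and getline method names are worth double-checking on CPAN), usage looks roughly like:

use File::GetLineMaxLength;

my $reader = File::GetLineMaxLength->new(\*STDIN);
my $line   = $reader->getline(1024);   # read a line of at most 1024 characters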
If you want to do it yourself, it's pretty straightforward with getc.
sub fgets {
    my ($fh, $limit) = @_;
    my $str;
    for (1 .. $limit) {
        my $char = getc $fh;
        last unless defined $char;
        $str .= $char;
        last if $char eq "\n";
    }
    return $str;
}
Concatenating each character to $str
is efficient as Perl will realloc opportunistically. If a Perl string has 16 bytes and you concatenate another character, Perl will reallocate it to 32 bytes (32 goes to 64, 64 to 128...) and remember the length. The next 15 concatenations require no memory reallocations or calls to strlen.
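For the question's X/Y/Z limits, a minimal usage sketch built on the fgets above (the filename and all three limits are made-up numbers):

my ($max_lines, $max_line_len, $max_total) = (100, 4096, 1_000_000);   # X, Y, Z

open my $fh, '<', 'input.txt' or die "open: $!";
my $total = 0;
my @lines;
while (@lines < $max_lines && $total < $max_total) {
    my $line = fgets($fh, $max_line_len);
    last unless defined $line;          # EOF
    # a $line with no trailing "\n" was cut off at $max_line_len;
    # the rest of that physical line arrives on the next call
    $total += length $line;
    push @lines, $line;
}
close $fh;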
sub heres_what_id_do($$) {
    my ($fh, $len) = @_;
    my $buf = '';
    for (my $i = 0; $i < $len; ++$i) {
        my $ch = getc $fh;
        last if !defined $ch || $ch eq "\n";
        $buf .= $ch;
    }
    return $buf;
}
Not very "Perlish" but who cares? :) The OS (and possibly Perl itself) will do all the necessary buffering underneath.
As an exercise, I've implemented a wrapper around C's fgets() function. It falls back to a Perl implementation for complicated filehandles (defined as "anything without a fileno") to cover tied handles and whatnot. File::fgets is on its way to CPAN now; you can pull a copy from the repository.
Some basic benchmarking shows it's over 10x faster than any of the implementations here. However, I cannot say it's bug-free or doesn't leak memory; my XS skills are not that great. But it's better tested than anything here.
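Assuming the module's documented calling convention of a filehandle plus a byte limit (hedged, from memory), usage looks like:

use File::fgets;

open my $fh, '<', 'input.txt' or die "open: $!";
while (defined(my $chunk = fgets($fh, 4096))) {
    # like C's fgets: an over-long line arrives as several successive
    # chunks, with only the last one ending in "\n"
}
close $fh;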
Use the read function (see "read" in perlfunc).
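For the OP's fallback of reading all Z bytes at once and splitting, a hedged sketch of that approach with read (filename and limits are made-up numbers):

my ($max_lines, $max_line_len, $max_total) = (100, 4096, 1_000_000);   # X, Y, Z

open my $fh, '<', 'input.txt' or die "open: $!";
read($fh, my $data, $max_total) // die "read: $!";   # at most Z bytes
close $fh;

my @lines = split /^/m, $data;                       # keeps line terminators
splice @lines, $max_lines if @lines > $max_lines;    # at most X lines
$_ = substr($_, 0, $max_line_len) for @lines;        # cap each line at Y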
You can implement fgets() yourself trivially. Here's one that works like C (the buffer is the first argument and is filled in place):

sub fgets {
    my ($n, $c) = ($_[1], '');
    $_[0] = '';
    while (defined($c) && $c ne "\n" && $n-- > 0) {
        $c = getc($_[2]);
        $_[0] .= $c if defined $c;
    }
    defined($c) && $_[0];
}
Here's one with PHP's semantics (pass the filehandle and a length; the line is returned, or false at EOF):

sub fgets {
    my ($n, $c, $x) = ($_[1], '', '');
    while (defined($c) && $c ne "\n" && $n-- > 0) {
        $c = getc($_[0]);
        $x .= $c if defined $c;
    }
    ($x ne '') && $x;
}
If you're trying to implement resource limits (i.e. trying to prevent an untrusted client from eating up all your memory) you really should not be doing it this way. Use ulimit to set up those resource limits before calling your script. A good sysadmin will set up resource limits anyway, but they like it when programmers make startup scripts that set reasonable limits.
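If you would rather set that limit from inside the Perl program than in a wrapper script, the CPAN module BSD::Resource wraps setrlimit(2). A rough sketch (the 64 MB figure is arbitrary):

use BSD::Resource;

# arbitrary 64 MB address-space cap; pick a figure that suits your workload
my $bytes = 64 * 1024 * 1024;
setrlimit(RLIMIT_AS, $bytes, $bytes)
    or die "setrlimit failed: $!";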
If you're trying to limit input before you proxy this data to another site (say, limiting SMTP input lines because you know remote sites might not support more than 511 characters), then just check the length of the line after <INPUT> with length().
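For the SMTP example, that check is a one-liner inside the read loop:

while (my $line = <INPUT>) {
    if (length($line) > 511) {
        # over the remote site's limit: reject, truncate, or error out
        next;
    }
    # ... forward $line to the remote site ...
}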