开发者

Can a Perl subroutine return data but keep processing?

Is there any way to have a subroutine send data back while still processing? For instance (this example used simply to illustrate) - a subroutine r开发者_运维技巧eads a file. While it is reading through the file, if some condition is met, then "return" that line and keep processing. I know there are those that will answer - why would you want to do that? and why don't you just ...?, but I really would like to know if this is possible.


A common way to implement this type of functionality is with a callback function:

{
    open my $log, '>', 'logfile' or die $!;
    sub log_line {print $log @_}
}

sub process_file {
    my ($filename, $callback) = @_;
    open my $file, '<', $filename or die $!;
    local $_;
    while (<$file>) {
        if (/some condition/) {
             $callback->($_)
        }
        # whatever other processing you need ....
    }
}

process_file 'myfile.txt', \&log_line;

or without even naming the callback:

process_file 'myfile.txt', sub {print STDERR @_};


Some languages offer this sort of feature using "generators" or "coroutines", but Perl does not. The generator page linked above has examples in Python, C#, and Ruby (among others).


The Coro module looks like it would be useful for this problem, though I have no idea how it works and no idea whether it does what it advertises.


The easiest way to do this in Perl is probably with an iterator-type solution. For example, here we have a subroutine which forms a closure over a filehandle:

open my $fh, '<', 'some_file.txt' or die $!;
my $iter = sub { 
    while( my $line = <$fh> ) { 
        return $line if $line =~ /foo/;
    }

    return;
}

The sub iterates over the lines until it finds one matching the pattern /foo/ and then returns it, or else returns nothing. (undef in scalar context.) Because the filehandle $fh is defined outsite the scope of the sub, it remains resident in memory between calls. Most importantly, its state, including the current seek position in the file, is retained. So each call to the subroutine resumes reading the file where it last left off.

To use the iterator:

while( defined( my $next_line = $iter->() ) ) { 
    # do something with each line here
}


If you really want do this you can by using threading. One option would be to fork a separate thread that reads the file and when it finds a certain line, place it in an array that is shared between threads. Then the other thread could take the lines, as they are found, and process them. Here is an example that reads a file, looks for an 'X' in a file's line, and does an action when it is found.

use strict;
use threads;
use threads::shared;

my @ary : shared;

my $thr = threads->create('file_reader');

while(1){
    my ($value);
    {
        lock(@ary);
        if ($#ary > -1){
            $value = shift(@ary);
            print "Found a line to process:  $value\n";
        }
        else{
            print "no more lines to process...\n";
        }            
    }

    sleep(1);
    #process $value
}


sub file_reader{

            #File input
    open(INPUT, "<test.txt");
    while(<INPUT>){
        my($line) = $_;
        chomp($line);

        print "reading $line\n";

        if ($line =~ /X/){
            print "pushing $line\n";
            lock(@ary);
            push @ary, $line;
        }
        sleep(4)
    }
    close(INPUT);
}

Try this code as the test.txt file:

line 1
line 2X
line 3
line 4X
line 5
line 6
line 7X
line 8
line 9
line 10
line 11
line 12X


If your language supports closures, you may be able to do something like this:

By the way, the function would not keep processing the file, it would run just when you call it, so it may be not what you need.

(This is a javascript like pseudo-code)

function fileReader (filename) {
    var  file = open(filename);

    return function () {
        while (s = file.read()) {
            if (condition) {
                return line;
            }
        }
        return null;
   }     
}

a = fileReader("myfile");
line1 = a();
line2 = a();
line3 = a();


What about a recursive sub? Re-opening existing filehandles do not reset the input line number, so it carries on from where it's left off.

Here is an example where the process_file subroutine prints out blank-line-separated "\n\n" paragraphs that contain foo.

sub process_file {

    my ($fileHandle) = @_;
    my $paragraph;

    while ( defined(my $line = <$fileHandle>) and not eof(<$fileHandle>) ) {

        $paragraph .= $line;
        last unless length($line);
    }

    print $paragraph if $paragraph =~ /foo/;
    goto &process_file unless eof($fileHandle);  
       # goto optimizes the tail recursion and prevents a stack overflow
       # redo unless eof($fileHandle); would also work
}

open my $fileHandle, '<', 'file.txt';
process_file($fileHandle);
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜