Perl -- do {read file} until regex found. Finds match but does not break out of the loop
I want to read a file and extract a few lines of information. I have written a do ... until loop to skip the file's lines until I reach the part I'm actually interested in, which contains the word V2000. I prefer to use a general regex rather than look for V2000 literally.
The match is found, but it doesn't break out of the do ... until loop, so I'm unable to extract the information that comes right after it.
Does anyone know why?
do {$line = <IN_SDF>;} until ($line =~ m/V\d+/);
And the rest of the code is:
my @aline = split ('', $line);
my $natoms = $aline[0];
my $out= shift;
do {
<IN_SDF>;
@aline = split ('', $_);
print OUT_3D $aline[3]."\t".$aline[0]."\t".$aline[1]."\t".$aline[2]."\n";
} until --$natoms == 0;
Are you assuming that a bare <IN_SDF> will load the next line from that filehandle into $_? That is incorrect. You only get that behavior in the condition of a while loop:
while (<IN_SDF>) is equivalent to while (defined($_ = <IN_SDF>))
If you mean $_ = <IN_SDF>, then say so.
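A minimal sketch of the difference, assuming an in-memory string in place of the SDF file (the sample data and the lexical handle $in are illustrative, not from the question):

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for IN_SDF.
my $data = "first\nsecond\nthird\n";
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";

# A bare readline used as a statement reads one line and throws it away;
# it does NOT assign to $_.
$_ = 'untouched';
<$in>;
my $after_bare = $_;          # still 'untouched'

# Assigning explicitly is what the loop body in the question needs.
$_ = <$in>;
chomp(my $explicit = $_);     # 'second'

# Only the condition of a while loop gets the implicit
# defined($_ = <$in>) treatment.
my @rest;
while (<$in>) {
    chomp;
    push @rest, $_;           # collects 'third'
}
close($in);

print "after bare <\$in>: $after_bare\n";
print "explicit assignment: $explicit\n";
print "while loop saw: @rest\n";
```

So inside the do { ... } until loop, the line would need to be $_ = <IN_SDF>; (or a named variable) before splitting it.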
For the first part of your question, this idiom:
while ($line = <IN_SDF>) {
last if $line =~ m/V\d+/;
}
is preferable to
do {
$line = <IN_SDF>
} until $line =~ m/V\d+/;
because the latter will go into an infinite loop when you run out of input (and $line becomes undefined).
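The same idea as a small helper: a sketch that returns the matching line, or undef when input runs out, so the caller can tell a missing V-line from a found one. The helper name skip_to_match and the sample input are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from $fh until one matches $re.
# Returns the matching line, or undef if EOF comes first, so the caller
# never spins on an undefined $line.
sub skip_to_match {
    my ($fh, $re) = @_;
    # defined() is written out for clarity; Perl adds it implicitly
    # to while (my $line = <$fh>) anyway.
    while (defined(my $line = <$fh>)) {
        return $line if $line =~ $re;
    }
    return;    # undef in scalar context
}

# Hypothetical SDF-like input read from an in-memory filehandle.
my $sdf = "header line\nprogram line\n\n  3  2  0  0  0  0  0  0  0  0999 V2000\n";
open(my $in, '<', \$sdf) or die "Cannot open in-memory file: $!";
my $counts = skip_to_match($in, qr/V\d+/);
close($in);

# Input with no V-line: the loop stops at EOF and undef comes back.
my $none = "no header here\n";
open(my $in2, '<', \$none) or die "Cannot open in-memory file: $!";
my $missing = skip_to_match($in2, qr/V\d+/);
close($in2);

print defined $counts  ? "found: $counts" : "not found\n";
print defined $missing ? "found: $missing" : "not found\n";
```

Checking the return value for undef is what replaces the endless spin of the do ... until version on a file that has no matching line.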
Let me get this straight.
- You want to scan input until you see a line with a 'V' followed by a number anywhere in the line.
- Then you want to break up the line into characters.
- And assign the first character in the line to $natoms, which is a single digit telling you how many lines to scan.
- And then you want to scan each of those lines and display the first 4 characters.
Is that correct?
As for your problem with breaking out of the loop: when I ran a version of that code, it worked fine for me, with or without strict.
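The restatement above hinges on split('', $line), which splits into single characters, so $aline[0] is only the first character of the line. A sketch comparing that with two alternatives, assuming the V-line is an MDL V2000 counts line whose atom count sits in the first three columns (the sample line is made up):

```perl
use strict;
use warnings;

# Hypothetical V2000 counts line: atom count in columns 1-3,
# bond count in columns 4-6.
my $line = " 12 11  0  0  0  0  0  0  0  0999 V2000\n";

# An empty pattern breaks the string into single characters.
my @chars = split('', $line);
my $first_char = $chars[0];                  # ' ' (a space), not 12

# split ' ' is the special awk-style split: it skips leading
# whitespace and splits on runs of whitespace.
my @fields = split(' ', $line);
my $natoms_ws = $fields[0];                  # 12

# V2000 fields are fixed-width, so substr also works, and keeps
# working if two 3-digit counts run together (e.g. "100120").
my $natoms_fixed = 0 + substr($line, 0, 3);  # 12

print "split '':  [$first_char]\n";
print "split ' ': $natoms_ws\n";
print "substr:    $natoms_fixed\n";
```

The same applies to the atom lines in the second loop: split(' ', $_) yields the x, y, z coordinates and then the element symbol as fields 0 to 3, which matches the print order in the question.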
I ran across this while trying to parse a broken, single-line 50 MB XML file. I wrote my own sub for it, although I don't know whether it works for the original poster:
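use strict;
use warnings;

# Read $hh in 512-byte chunks until $pattern matches the accumulated buffer.
# Return the matched text and leave the handle positioned just after it,
# or return undef once EOF is reached without a match.
sub ReadNext($$) {
    my ($hh, $pattern) = @_;
    my ($buffer, $chunk, $chunkSize) = ('', '', 512);
    # read() returns 0 at EOF and undef on error; both end the loop.
    while (read($hh, $chunk, $chunkSize)) {
        $buffer .= $chunk;
        if ($buffer =~ $pattern) {
            # $-[0] and $+[0] are the start and end offsets of the whole match.
            my ($matchStart, $matchEnd) = ($-[0], $+[0]);
            my $result = substr($buffer, $matchStart, $matchEnd - $matchStart);
            # Rewind the stream to where this match left off
            my $pos = tell($hh);
            seek($hh, $pos - (length($buffer) - $matchEnd), 0);
            return $result;
        }
    }
    return;
}

open(my $fh, '<', $ARGV[0]) or die("Could not open file: $!");
while (defined(my $chunk = ReadNext($fh, qr/<RECORD>.+?<\/RECORD>/))) {
    print $chunk, "\n";
}
close($fh);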
For me, this prints every RECORD element in the XML, each followed by a newline.
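On computing the match offsets: @- and @+ hold offsets for the whole match and for each capture group, so the flattened list (@-, @+) only starts with (start, end) when the pattern has no groups, while $-[0] and $+[0] always give the whole match. A quick check on a made-up string:

```perl
use strict;
use warnings;

my $text = 'xx<RECORD>a</RECORD>yy';

# No capture groups: @- and @+ have one element each,
# so (@-, @+) happens to be (start, end).
$text =~ /<RECORD>.+?<\/RECORD>/ or die 'no match';
my @no_groups = (@-, @+);            # (2, 20)

# One capture group: each array now has two elements, so the second
# value of (@-, @+) is the group's start, not the end of the match.
$text =~ /<RECORD>(.+?)<\/RECORD>/ or die 'no match';
my @with_group = (@-, @+);           # (2, 10, 20, 11)
my ($start, $end) = ($-[0], $+[0]);  # (2, 20): the whole match

print "no groups:  @no_groups\n";
print "with group: @with_group\n";
print "\$-[0], \$+[0]: $start $end\n";
```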