Help converting to subroutine
I have tried to convert my code into a series of subroutines to make it more modular. The conditional statements in the code below is what I can't incorporate into the subroutine.
next unless ( $sentblock =~ /\[sent. \d+ len. \d+\]: \[.+\]/ ); #1#
( $sentence, $sentencenumber ) = &sentence_sentnum_chptnum($sentblock); #SUBROUTINE
if ( $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ ) { #2#
$chapternumber = $1;
$chapternumber =~ tr/./_/;
}
next
unless ( $sentence =~ /\b\Q$search_key\E/i #3#
&& $sentence =~ /\b\Q$addkey0\E/i
&& $sentence =~ /\b\Q$addkey1\E/i );
next
if ( defined($exc0) #4#
&& length($exc0)
&& $sentence =~ /\b\Q$exc0\E\b/i );
next
if ( defined($exc1) #5#
&& length($exc1)
&& $sentence =~ /\b\Q$exc1\E\b/i );
The subroutine so far:
sub sentence_sentnum_chptnum {
my $subsentblock = shift;
my ( $subsentence, $subsentencenumber );
return unless ( $subsentblock =~ /\[sent. (\d+) len. \d+\]: \[(.+)\]/ ); #DIDN'T replace the need to put one in the main script
$subsentencenumber = $1;
$subsentence = $2;
$subsentence =~ s/, / /g;
return ( $subsentence, $subsentencenumber );
}
It works as is, but if I try putting the other conditional statements in: I get errors saying $sentence is uninitialized
later in the code. Example: If I try to include the check of $addkey
using the same condition, but just swapping next
for return
I get an error that $sentence 开发者_JS百科is uninitialized
in the line: if ( $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ ) {
And likewise if I put any of those conditions into the subroutine.
Main Question: How can I:
(1) get rid of next unless ( $sentblock =~ /\[sent. \d+ len. \d+\]: \[.+\]/ );
(it's in the subroutine too)
(2) Include: if ( $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ )
& all 3 next
statements
(3) Since it's included, also return $chapternumber
Without affecting my code?
General Best Practice Question: If I have variables defined at the top of my code (from an HTML form) is it better practice to localize them each time in every subroutine, or just not pass anything into the subroutine, and use the value assigned at the beginning of the code? (Ex. $search_key
, $addkey
and $exc
)?
Test Case I have made a test case, however it is pretty long, so I didn't include it. If you need one, it is very similar to: http://perlmonks.org/?node_id=912276 just find where the subroutine takes over and delete that part... It's right after foreach my $sentblock (@parsed)
.
Note: The test case does not include addkey
or exc
, and nothing will match the chapternumber (put '~ 5.5' in front of one sentence to include it)
I've tried checking the returned $sentence
in the main program. This eliminates the error, but there are no matches for the rest of the program (ie. The end result of the search engine is 0 results).
Thanks, let me know if anything is unclear.
How much do you want to break things down? It's hard to see what the "best" or "right" way to split things up is without more code.
In general, if you go through your code and add comments describing what each block of code does, you could just as readily replace each commented block with a sub that has a name that recaps the sentence:
# Is this a sentence block?
next unless ( $sent_block =~ /\[sent. \d+ len. \d+\]: \[.+\]/ );
#1#
my ( $sentence, $sentence_number ) = parse_sentence_block($sent_block);
# Get chapter info if present
if ( $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ ) { #2#
$chapter_number = $1;
$chapter_number =~ tr/./_/;
}
# Skip if key found
next
unless ( $sentence =~ /\b\Q$search_key\E/i #3#
&& $sentence =~ /\b\Q$addkey0\E/i
&& $sentence =~ /\b\Q$addkey1\E/i );
# skip if excrescence 0 (or whatever exc is short for)
next
if ( defined($exc0) #4#
&& length($exc0)
&& $sentence =~ /\b\Q$exc0\E\b/i );
# skip if excrescence 1.
next
if ( defined($exc1) #5#
&& length($exc1)
&& $sentence =~ /\b\Q$exc1\E\b/i );
Now take these comments and make them into subs:
next unless is_sentence_block( $sent_block );
my( $sentence, $sentence_number ) = parse_sentence_block($sent_block);
# Maybe update the chapter number
my $new_chapter_number = get_chapter_number( $sentence );
$chapter_number = $new_chapter_number if defined $new_chapter_number;
next unless have_all_keys( $sentence => $search_key, $add_key0, $add_key1 );
next if have_excrescence( $exc0 );
next if have_excrescence( $exc1 );
sub is_sentence_block {
my $block = shift;
return $sent_block =~ /\[sent. \d+ len. \d+\]: \[.+\]/ );
}
sub get_chapter_number {
my $sentence = shift;
return unless $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/;
return $1;
}
sub have_all_keys {
my $sentence = shift;
my @keys = @_;
for my $key ( @keys ) {
return unless $sentence =~ /\b\Q$key1\E/i;
}
return 1
}
sub have_excrescence {
my $sentence = shift;
my $exc = shift;
return 0 unless defined($exc);
return 0 unless length($exc)
return 0 unless $sentence =~ /\b\Q$exc\E\b/i );
return 1;
}
Try this approach (some of this code may look familiar to you ;-) ):
sub extractSentenceAndPositions {
my $sentenceBlock = shift;
my ($sentence, $sentenceNumber, $chapterNumber) = ("", "", "");
if ($sentenceBlock =~ /\[sent. (\d+) len. \d+\]: \[(.+)\]/) {
$sentenceNumber = $1;
$sentence = $2;
$sentence =~ s/, / /g;
if ($sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/) { #2#
$chapterNumber = $1;
$chapterNumber =~ tr/./_/;
}
# Turning the original 'next-unless' chain into a conditional
# which zeroes out the return values instead
if ( !( $sentence =~ /\b\Q$search_key\E/i #3#
&& $sentence =~ /\b\Q$addkey0\E/i
&& $sentence =~ /\b\Q$addkey1\E/i )
||
!( defined($exc0) #4#
&& length($exc0)
&& $sentence =~ /\b\Q$exc0\E\b/i )
||
!( defined($exc1) #5#
&& length($exc1)
&& $sentence =~ /\b\Q$exc1\E\b/i )
) {
($sentence, $sentenceNumber, $chapterNumber) = ("", "", "");
}
}
return ($sentence, $sentenceNumber, $chapterNumber);
}
Then, replace your first listing with...
($sentence, $sentenceNumber, $chapterNumber) = extractSentenceAndPositions($sentblock);
next if (!$sentence || !$sentenceNumber || !$chapterNumber);
Regarding your best practices question, I would say for this use case (cgi vars and the like), where those values are almost certainly not going to change, I'd refer to them directly. The basic concept I generally follow is to scrub them once at the beginning of the run (by which I mean sanitize away any SQL injections, XSS, XSRF, shell injections, or other such nastiness in the values) and from then on treat them as read-only globals. I've heard other opinions on the subject, but that's what I usually do.
As far as checking the returned $sentence
in the main program somehow destroying all the other matches, I'm not sure how that would happen unless there's something else going on. I've used this approach (next
or last
based on returned values) in numerous scripts, and there's nothing inherently destructive about it.
精彩评论