How can I create all subsets of size smaller than n in Perl?
I have a set of sets. I want to create all sets that take at most one element from each original set.
For exampl开发者_开发技巧e, if my original set of sets is ((x,y),(A),(1,2))
then the solutions are:
(x)
(y)
(A)
(1)
(2)
(x,A)
(x,1)
(x,2)
(y,A)
(y,A)
(y,1)
(y,2)
(A,1)
(A,2)
(x,A,1)
(x,A,2)
(y,A,1)
(y,A,2)
I use the following code I have written to recursively calculate this:
# gets an array of arrays (aoa)
# returns an array of arrays with all subsets where zero or one element is
# taken from each array, e.g. in = [[a,b],[5],[X,Y,Z]], out =
# [[],[a],[b],[5],[X],[Y],[Z],[a,5],[b,5],[a,X],[a,Y],...,[b,5,Y],[b,5,Z]]
# note the order of elelemnts in each arry is immaterial (an array is
# considered an unordered set)
sub sets_aoa_to_subsets_aoa {
my $aoa = shift // confess;
if ( scalar( @{$aoa} ) == 0 ) {
return [ [] ];
}
my $a = shift @{$aoa};
my $subsets_aoa = sets_aoa_to_subsets_aoa($aoa);
my @new_subsets = ();
foreach my $subset_a ( @{$subsets_aoa} ) {
# leave subset as-is
push @new_subsets, $subset_a;
# add one element from $a
foreach my $e ( @{$a} ) {
push @new_subsets, [ $e, @{$subset_a} ];
}
}
return \@new_subsets;
}
however, I would like to add a limit on the size of the subset. For example, if I set max_size=2
then the last four solutions will be ignored. I can't simply generate all solutions then filter those who are too large, since sometimes I have more then 100 sets each with 2-3 elements, and 2^100 is not a nice number to handle, especially when I only want subsets of size 5 or less.
As I suspected, a regex works for this.
Specific Solution
Here’s the specific solution to the question precisely as posed. There are 80 answers.
my %seen;
"xy=a=12" =~ m{
[^=]* (x|y)* [^=]*
=
[^=]* (a)* [^=]*
=
[^=]* (1|2)* [^=]*
(?{
my $size = grep { length } $1, $2, $3;
print "<$1> <$2> <$3>\n"
if $size >= 1 &&
$size <= 2 &&
! $seen{$1,$2,$3}++;
})
(*FAIL)
}x;
Run that piped to cat -n
and you’ll see your 80 answers.
Of course, you’ll want something that’s generalized and extensible so that you can apply it to your situation of a hundred sets. It always takes longer to craft a general solution than a specific one, so I’ll work on that generalization and get back to you as soon as it looks pretty.
General Solution
Here’s the general solution; it’s hardly my prettiest piece of work, but it does work:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
our($MIN_PICK, $MAX_PICK) = (1, 2);
our @List_of_Sets = (
[ qw[ x y ] ],
[ qw[ a ] ],
[ qw[ 1 2 ] ],
);
sub dequeue($$) {
my($leader, $body) = @_;
$body =~ s/^\s*\Q$leader\E ?//gm;
return $body;
}
################################
my $gunk = " (?&gunk) ";
my $alter_rx = join("\n\t(?&post)\n" => map {
" $gunk ( "
. join(" | " => map { quotemeta } @$_)
. " ) * $gunk "
} @List_of_Sets);
##print "ALTER_RX <\n$alter_rx\n>\n";
my $string = join(" = ", map { join(" ", @$_) } @List_of_Sets);
##print "STRING: $string\n";
my $numbers_list = join(", " => map { '$' . $_ } 1 .. @List_of_Sets);
my $numbers_bracket = join(" " => map { '<$' . $_ . '>' } 1 .. @List_of_Sets);
my $print_statement = dequeue "|QQ|" => <<"PRINT_STATEMENT";
|QQ|
|QQ| (?{
|QQ| no warnings qw(uninitialized);
|QQ| my \$size = grep { length } $numbers_list;
|QQ| print "$numbers_bracket\\n"
|QQ| if \$size >= $MIN_PICK &&
|QQ| \$size <= $MAX_PICK &&
|QQ| ! \$seen{$numbers_list}++;
|QQ| })
|QQ|
PRINT_STATEMENT
## print "PRINT $print_statement\n";
my $search_rx = do {
use re "eval";
my %seen;
qr{
^
$alter_rx
$
$print_statement
(*FAIL)
(?(DEFINE)
(?<post> = )
(?<gunk> [^=] * )
)
}x;
};
## print qq(SEARCH:\n"$string" =~ $search_rx\n);
# run, run, run!!
$string =~ $search_rx;
I am somewhat concerned with the number of possibilities you expect to pull out of this. It may be that you should put this process I’ve outlined above on the other end of a pipe so that you can read from it however much you want and then hang up the phone, so to speak, when you’ve had your fill.
I realize this is a rather unusual solution; my code often is. :)
I just figure you might as well make the exhaustively permutational nature of regex backtracking do the work for you.
Perhaps others will pull out Some::Abstruse::Module
to do the job for you. You’ll just have to weigh which you prefer.
EDIT: Improved legibility, handled duplicates and extra min/max criteria.
Also a recursive solution, but passing the subset-built-sofar along, so you can stop as soon as you reach the maximum size.
#!/opt/perl/bin/perl
use strict;
use warnings;
use 5.010;
sub subsets
{
my ($sets, $maxSize, $subset) = @_;
$subset //= [ ];
# If we already have $maxSize elements, we're done
return ($subset) if @$subset == $maxSize;
# If we have no sets left to pick from, we're done
return ($subset) if !@$sets;
# Consider the next set
my @remainingSets = @$sets;
my $nextSet = shift(@remainingSets);
# We can choose either 0 or 1 element from this set, continue with the rest
return (subsets(\@remainingSets, $maxSize, $subset),
map { subsets(\@remainingSets, $maxSize, [@$subset, $_]) }
@$nextSet);
}
my $sets = [ [qw(x y)], [qw(A)], [qw(1 2)] ];
my @subsets = subsets($sets, 2);
foreach my $subset (@subsets) {
say '(', join(', ', @$subset), ')';
}
you could create a "state variable" which would track the number of calls to sets_aoa_to_subsets_aoa and then check for that in your treminal condition:
{
my $count=0;
sub sets_aoa_to_subsets_aoa {
$count++;
my ($aoa,$number_of_calls) = @_ // confess;
if ( (scalar( @{$aoa} ) == 0) or ($count == $number_or_calls)) {
return [ [] ];
}
......
}
}
foreach my $e ( @{$a} ) {
push @new_subsets, [ $e, @{$subset_a} ];
}
simply pass down a $items_wanted
paramter and skip the highlighted bit of code if @{$subset_a} > $items_wanted
. Since the lines above already generate all of the combinations that don't add additional items, this will work without any further changes.
精彩评论