Why are references compacted inside Perl lists?
Putting a precompiled regex inside two different hashes referenced in a list:
my @list = ();
my $regex = qr/ABC/;
push @list, { 'one' => $regex };
push @list, { 'two' => $regex };
use Data::Dumper;
print Dumper(\@list);
I'd expect:
$VAR1 = [
          {
            'one' => qr/(?-xism:ABC)/
          },
          {
            'two' => qr/(?-xism:ABC)/
          }
        ];
But instead we get a circular reference:
$VAR1 = [
          {
            'one' => qr/(?-xism:ABC)/
          },
          {
            'two' => $VAR1->[0]{'one'}
          }
        ];
This happens with arbitrarily deeply nested hash references and with shallow copies of $regex.
I'm assuming the basic reason is that precompiled regexes are actually references, and references inside the same list structure are compacted as an optimization (\$scalar behaves the same way). I don't entirely see the utility of doing this (presumably a reference to a reference has the same memory footprint), but maybe there's a reason based on the internal representation.
Is this the correct behavior? Can I stop it from happening? Aside from probably making GC more difficult, these circular structures create pretty serious headaches. For example, iterating over a list of queries that may sometimes contain the same regular expression will crash the MongoDB driver with a nasty segfault (see https://rt.cpan.org/Public/Bug/Display.html?id=58500)
This is the expected behavior.
Your reference isn't really circular; you have two separate items that point to the same thing. Data::Dumper prints a human-readable, Perl-parsable representation of your data structures in memory, and what its output really means is that both $list[0]->{one} and $list[1]->{two} point to the same thing.
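One way to check this without Data::Dumper is to compare addresses. The following is a minimal sketch using refaddr from Scalar::Util; the variable names $same_target and $same_hash are only for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $regex = qr/ABC/;
my @list  = ( { one => $regex }, { two => $regex } );

# Both hash values are copies of the same reference, so they share one address.
my $same_target = refaddr( $list[0]{one} ) == refaddr( $list[1]{two} );

# The two hashes themselves are distinct containers.
my $same_hash = refaddr( $list[0] ) == refaddr( $list[1] );

print "same regex object: ", ( $same_target ? "yes" : "no" ), "\n";
print "same hash:         ", ( $same_hash   ? "yes" : "no" ), "\n";
```

Two slots sharing one target is ordinary aliasing. A cycle would require following references from some value and arriving back at that same value, which never happens here.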
Perl uses reference-counting garbage collection, and while it can get into trouble with circular data structures, this data structure presents no particular problem.
Nothing funny is happening here.
- You stored the same reference twice in the same data structure.
- Then you asked Data::Dumper to print a representation of that structure.
- Data::Dumper wants to roundtrip the data you give it as faithfully as possible, which means that it needs to output Perl code that will generate a data structure that contains the same reference at $list[0]{one} as it does at $list[1]{two}.
- It does this by outputting a data structure where one member contains a reference to another member of the same structure.
- But it's not actually a circular reference.
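The roundtrip point can be tried out directly. This is a sketch, assuming you set $Data::Dumper::Purity so that the dumped code contains the extra assignment statements needed to rebuild the shared reference when evaluated; the name $shared_again is illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(refaddr);

my $regex = qr/ABC/;
my @list  = ( { one => $regex }, { two => $regex } );

# Purity makes Dumper emit extra assignment statements so that the
# cross-reference is rebuilt when the dumped code is evaluated.
$Data::Dumper::Purity = 1;
my $dumped = Dumper( \@list );

my $VAR1;    # Dumper's output assigns to $VAR1
eval $dumped;
die $@ if $@;

# In the restored copy the two slots again hold the same reference.
my $shared_again = refaddr( $VAR1->[0]{one} ) == refaddr( $VAR1->[1]{two} );
print "restored copy shares the regex: ", ( $shared_again ? "yes" : "no" ), "\n";
```

Had Dumper printed the regex twice, the restored structure would hold two independent regex objects, which is not what was dumped.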
"I'm assuming the basic reason is that precompiled regexes are actually references, and references inside the same list structure are compacted as an optimization (\$scalar behaves the same way). I don't entirely see the utility of doing this (presumably a reference to a reference has the same memory footprint), but maybe there's a reason based on the internal representation."
The reason is that a reference could point to a data structure that, somewhere in its nesting, contains a reference back to the top level (a loop). If Data::Dumper descended into such a structure, it would recurse forever. It avoids this by never recursing into a reference it has already seen; instead it prints that it has already seen it and points you to the location where it was first printed.
In this case there's no loop, but Data::Dumper has no way of knowing that before it recurses into the structure, at which point it's too late.
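The "never recurse into a reference already seen" idea can be sketched in a few lines. The walk sub below is a hypothetical helper written for illustration, not Data::Dumper's actual code; it records the path at which each reference was first met and prints a back-reference on any later encounter.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical helper, not Data::Dumper's real code: list every value in a
# structure, emitting a back-reference instead of descending twice.
sub walk {
    my ( $value, $path, $seen, $out ) = @_;

    if ( !ref $value ) {
        push @$out, "$path = " . ( defined $value ? $value : 'undef' );
        return $out;
    }

    my $addr = refaddr($value);
    if ( exists $seen->{$addr} ) {
        # Already visited: report where, and stop. The check runs before the
        # type is examined, so harmless sharing and true cycles look alike.
        push @$out, "$path = (same as $seen->{$addr})";
        return $out;
    }
    $seen->{$addr} = $path;

    my $type = reftype($value);
    if ( $type eq 'HASH' ) {
        walk( $value->{$_}, $path . "{$_}", $seen, $out ) for sort keys %$value;
    }
    elsif ( $type eq 'ARRAY' ) {
        walk( $value->[$_], $path . "[$_]", $seen, $out ) for 0 .. $#$value;
    }
    else {
        push @$out, "$path = $value";    # leaf reference such as qr//
    }
    return $out;
}

my $regex = qr/ABC/;
my @list  = ( { one => $regex }, { two => $regex } );
push @list, \@list;    # a genuine cycle, to show the walk still terminates

my $lines = walk( \@list, '$VAR1', {}, [] );
print "$_\n" for @$lines;
```

The shared regex and the real cycle are both reported with the same "same as" notation, because the walker cannot tell them apart at the moment it meets the repeated address.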
For a leaf value like this regex it's probably not essential to do so, but it likely happens because Data::Dumper checks whether it has already seen the reference before checking its type. It also has the benefit of showing that this is a reference to the same data, not a copy of it, which is useful information that would be lost if it just printed the value.
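If the cross-reference notation is unwanted in the printed output, Data::Dumper documents a Deepcopy setting that writes shared values out in full and keeps cross-references only where they are needed to break a real cycle. A small sketch, assuming the same @list as in the question (the names $default_dump and $deep_dump are illustrative):

```perl
use strict;
use warnings;
use Data::Dumper;

my $regex = qr/ABC/;
my @list  = ( { one => $regex }, { two => $regex } );

# Default behaviour: the second occurrence is printed as $VAR1->[0]{'one'}.
my $default_dump = Dumper( \@list );

# Deepcopy asks Dumper to write shared values out in full and to keep the
# $VAR1->... notation only where it is needed to break a real cycle.
$Data::Dumper::Deepcopy = 1;
my $deep_dump = Dumper( \@list );

print $deep_dump;
```

This changes only the dump. The data in memory is untouched, and both hash values still refer to the same regex object.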