Can I use Perl's unpack to break up a string into vars?
I have an image file name that consists of four parts:
$Directory (the directory where the image exists)
$Name (for an art site, this is the painting's name reference #)
$File (the image's file name minus extension)
$Extension (the image's extension)
$example 100020003000.png
Which I desire to be broken down accordingly:
$dir=1000 $name=2000 $file=3000 $ext=.png
I was wondering whether substr is the best option for breaking up the incoming $example so I can do things with the four variables, such as validation/error checking or grabbing the verbose name from its $Name assignment. I found this post: "is unpack faster than substr?" So, here is my beginner's "stone tool" approach:
my $example = "100020003000.png";
my $dir = substr($example, 0, 4);   # characters 0-3
my $name = substr($example, 4, 4);  # characters 4-7
my $file = substr($example, 8, 4);  # characters 8-11
my $ext = substr($example, 13, 3);  # skips the "." at offset 12; will add the "." later
So, can I use unpack, or maybe even another approach that would be more efficient?
I would also like to avoid loading any modules unless doing so would use fewer resources for some reason. Modules are great tools and I luv 'em, but I don't think they are necessary here.
I realize I should probably push the vars into an array/hash, but I am really a beginner here and I would need further instruction on how to do that and how to pull them back out.
Thanks to everyone at stackoverflow.com!
Absolutely:
my $example = "100020003000.png";
my ($dir, $name, $file, $ext) = unpack 'A4' x 4, $example;
print "$dir\t$name\t$file\t$ext\n";
Output:
1000 2000 3000 .png
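The question also asks how to put the parts into a hash and get them back out. Here is a minimal sketch of one way to do that with a hash slice. The sub name parse_image_name and the key names are only illustrative, and the last field uses 'A*' instead of a fourth 'A4' (an assumption on my part) so that a longer extension is not cut off:

```perl
use strict;
use warnings;

# Split a fixed-width image name into named parts.
# Assumes three 4-character fields followed by the extension.
sub parse_image_name {
    my ($example) = @_;
    my %part;
    # Hash slice: assign the unpacked fields to four keys in one go.
    @part{qw(dir name file ext)} = unpack 'A4 A4 A4 A*', $example;
    return %part;
}

my %part = parse_image_name('100020003000.png');

# Pull the values back out by key.
print "$_ => $part{$_}\n" for qw(dir name file ext);
```

Each value is then available as $part{dir}, $part{name}, and so on, without needing four separate variables.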
I'd just use a regex for that:
my ($dir, $name, $file, $ext) = $path =~ m:(.*)/(.*)/(.*)\.(.*):;
Or, to match your specific example:
my ($dir, $name, $file, $ext) = $example =~ m:^(\d{4})(\d{4})(\d{4})\.(.{3})$:;
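Since the question mentions validation, it may help to know that a failed match in list context returns an empty list, so the same regex can double as a format check. A rough sketch, where the sub name split_image_name and the second test string are made up for illustration:

```perl
use strict;
use warnings;

# Returns the four captured parts, or an empty list when the
# name does not fit the 4+4+4 digits plus 3-character extension layout.
sub split_image_name {
    my ($example) = @_;
    return $example =~ /^(\d{4})(\d{4})(\d{4})\.(.{3})$/;
}

for my $candidate ('100020003000.png', '1000-bad-name.png') {
    # The list assignment is false when the match produced nothing.
    if ( my ($dir, $name, $file, $ext) = split_image_name($candidate) ) {
        print "$candidate: dir=$dir name=$name file=$file ext=$ext\n";
    }
    else {
        print "$candidate: does not match the expected layout\n";
    }
}
```

Note that this pattern captures the extension without the leading dot.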
Using unpack is good, but since the elements are all the same width, the regex is very simple as well:
my $example = "100020003000.png";
my ($dir, $name, $file, $ext) = $example =~ /(.{4})/g;
It isn't unpack, but since you have groups of 4 characters, you could use a limited split, with a capture:
my ($dir, $name, $file, $ext) = grep length, split /(....)/, $filename, 4;
This is pretty obfuscated, so I probably wouldn't use it, but the capture in a split is an often overlooked ability.
So, here's an explanation of what this code does:
Step 1. split with capturing parentheses adds the values captured by the pattern to its output stream. The stream contains a mix of fields and delimiters.
qw( a 1 b 2 c 3 ) == split /(\d)/, 'a1b2c3';
Step 2. split with 3 args limits how many times the string is split.
qw( a b2c3 ) == split /\d/, 'a1b2c3', 2;
Step 3. Now, when we use a delimiter pattern that matches pretty much anything, like /(....)/, we get a bunch of empty (0 length) strings. I've marked delimiters with D characters, and fields with F:
( '', 'a', '', '1', '', 'b', '', '2' ) == split /(.)/, 'a1b2';
  F    D   F    D   F    D   F    D
Step 4. So if we limit the number of fields to 3 we get:
( '', 'a', '', '1', 'b2' ) == split /(.)/, 'a1b2', 3;
  F    D   F    D    F
Step 5. Putting it all together we can do this (I used a .jpeg extension so that the extension would be longer than 4 characters):
( '', 1000, '', 2000, '', 3000, '.jpeg' ) == split /(....)/, '100020003000.jpeg', 4;
  F    D    F    D    F    D       F
Step 6. Step 5 is almost perfect; all we need to do is strip out the empty strings and we're good:
( 1000, 2000, 3000, '.jpeg' ) == grep length, split /(....)/, '100020003000.jpeg', 4;
This code works, and it is interesting. But it's not any more compact than any of the other solutions. I haven't benchmarked, but I'd be very surprised if it wins any speed or memory efficiency prizes.
But the real issue is that it is too tricky to be good for real code. Using split to capture delimiters (and maybe one final field), while throwing out the field data, is just too weird. It's also fragile: if one field changes length the code is broken and has to be rewritten.
So, don't actually do this.
At least it provided an opportunity to explore some lesser known features of split.
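For anyone who wants to check the six steps rather than take them on faith, here is a small sketch that runs each split and prints the result with every element quoted, so the empty strings stay visible. The helper names show and split_fixed are made up for this illustration:

```perl
use strict;
use warnings;

# Quote every element so that empty strings stay visible in the output.
sub show { return join ', ', map { "'$_'" } @_ }

# The capture-in-split trick from the steps, wrapped in a sub.
sub split_fixed {
    my ($filename) = @_;
    return grep { length } split /(....)/, $filename, 4;
}

my @steps = (
    [ 'Step 1', split( /(\d)/, 'a1b2c3' ) ],                  # captured delimiters are returned
    [ 'Step 2', split( /\d/, 'a1b2c3', 2 ) ],                 # limit of 2 fields
    [ 'Step 3', split( /(.)/, 'a1b2' ) ],                     # every character is a delimiter
    [ 'Step 4', split( /(.)/, 'a1b2', 3 ) ],                  # same, limited to 3 fields
    [ 'Step 5', split( /(....)/, '100020003000.jpeg', 4 ) ],  # 4-character delimiters
    [ 'Step 6', split_fixed('100020003000.jpeg') ],           # empty fields removed
);

for my $step (@steps) {
    my ($label, @result) = @$step;
    print "$label: ", show(@result), "\n";
}
```

The limit counts fields, not captured delimiters, which is why Step 5 yields seven elements with a limit of 4.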
Both substr and unpack bias your thinking toward fixed layouts, while regex solutions are more oriented toward flexible layouts with delimiters.
The example you gave appeared to be fixed layout, but directories are usually separated from file names by a delimiter (e.g. slash for POSIX-style file systems, backslash for MS-DOS, etc.). So you might actually have a case for both: a regex solution to split directory and file name apart (or even directory/name/extension), and then a fixed-length approach for the name part by itself.
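A rough sketch of that two-stage idea follows. The path layout used here (a directory, a slash, then two 4-character fields and an extension) is purely an assumption for illustration, as is the sub name parse_image_path:

```perl
use strict;
use warnings;

# Hypothetical layout: "<directory>/<4-char name><4-char file>.<extension>"
# Returns a key/value list, or an empty list if the path does not fit.
sub parse_image_path {
    my ($path) = @_;

    # Stage 1: delimiter-based split into directory, base name, extension.
    my ($dir, $base, $ext) = $path =~ m{^(.*)/([^/]+)\.([^./]+)$}
        or return;

    # Stage 2: fixed-width split of the base name.
    return unless length($base) == 8;
    my ($name, $file) = unpack 'A4 A4', $base;

    return ( dir => $dir, name => $name, file => $file, ext => $ext );
}

my %part = parse_image_path('1000/20003000.png');
print "$_ => $part{$_}\n" for qw(dir name file ext);
```

If the directory depth or the extension length changes, only the regex stage needs to care; the unpack stage breaks only if the fixed-width part of the name changes.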