Perl RegEx and sale prices
I borrowed the script used by SteamCalculator.com and wanted to modify it slightly to grab not just the price of the games on Steam, but the sale prices as well (if they exist).
The code was straightforward and easy to read. To retrieve the price, he looks at the HTML from the steampowered.com search feature, pulls out everything between <div class="col search_price"> and </div>, then runs the following sub-routine:
sub formPrice($)
{
    my $price = shift;
    if($price =~ m/(\d+)(?:\.|,)(\d{2})/)
    {
        return $1.$2;
    }
    else
    {
        return 0;
    }
}
The price is able to take one of 4 forms, depending on the country code you are looking for prices in and whether or not the game is on sale. These four forms are:
$9.99
<span><strike>$9.99</strike></span><br>$8.99
9,99£
<span><strike>9.99£</strike></span><br>8,99£
As you can see, regardless of which form the price takes, his script grabs the very first instance of (\d+) (the first group of digits, 9 in every case) as well as the (\d{2}) (group of two digits) following \.|, (dot or comma). When these are combined, the sub-routine always returns 999, regardless of which of the four formats the price has.
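To make that concrete, here is a small sketch that runs the sub-routine from the question over the four forms listed above. The @forms and @results names are only for illustration, and the strings are typed in by hand rather than scraped from Steam.

```perl
use strict;
use warnings;
use utf8;    # the sample strings contain a literal £

# The sub-routine from the question, with only the layout changed.
sub formPrice($)
{
    my $price = shift;
    if ( $price =~ m/(\d+)(?:\.|,)(\d{2})/ ) {
        return $1 . $2;
    }
    return 0;
}

# The four forms from the question (single quotes, so $9 is not interpolated).
my @forms = (
    '$9.99',
    '<span><strike>$9.99</strike></span><br>$8.99',
    '9,99£',
    '<span><strike>9.99£</strike></span><br>8,99£',
);

my @results = map { formPrice($_) } @forms;

# Prints 999 for every form, because the leftmost match always wins.
printf "form %d: %s\n", $_, $results[ $_ - 1 ] for 1 .. @forms;
```

The regex engine reports the leftmost match, so in forms 2 and 4 the struck-through price is found before the sale price is ever looked at.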
I've been trying to find a way to modify this sub-routine to return 999 in cases 1 and 3, but return 899 in cases 2 and 4. So far I have tried:
1:
if((reverse $price) =~ m/(\d+)(?:\.|,)(\d{2})/g){
    return $2.$1;
}
2:
if($price =~ m/.*?(\d+)(?:\.|,)(\d{2})/g){
    return $1.$2;
}
3:
if($price =~ m/.*?(\d+)(?:\.|,)(\d{2})$/){
    return $1.$2;
}
The first returned prices such as 9199 for $19.99. In the second, the .*? still seemed to be greedy, and it returned 999 for $19.99. The third returned 0 in cases 3 and 4 (dealing with euros).
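The results for attempts 1 and 3 can be reproduced with a short sketch. The sub names attempt_reverse and attempt_anchored are made up for this illustration, and the currency symbol is written as \x{A3} to keep the file free of encoding concerns.

```perl
use strict;
use warnings;

# Attempt 1: match against the reversed string, then glue the captures back.
# The digits inside each capture are still reversed, hence the odd result.
sub attempt_reverse {
    my $price    = shift;
    my $reversed = reverse $price;    # scalar context: reverses the characters
    return $2 . $1 if $reversed =~ m/(\d+)(?:\.|,)(\d{2})/;
    return 0;
}

# Attempt 3: the two digits must be the very last thing in the string,
# so a trailing currency symbol makes the match fail.
sub attempt_anchored {
    my $price = shift;
    return $1 . $2 if $price =~ m/.*?(\d+)(?:\.|,)(\d{2})$/;
    return 0;
}

print attempt_reverse('$19.99'),          "\n";    # 9199
print attempt_anchored('$8.99'),          "\n";    # 899
print attempt_anchored("8,99\x{A3}"),     "\n";    # 0
```

In the reversed string "99.91$" the captures are "99" and "91", so swapping them back gives 9199 rather than 1999. The anchored version fails on the third and fourth forms only because the symbol sits between the digits and the end of the string.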
Anchoring the end as Flimzy suggests is the easiest solution.
I'm curious what you were trying to accomplish with your second attempt:
if($price =~ m/.*?(\d+)(?:\.|,)(\d{2})/g){
    return $1.$2;
}
Adding the g doesn't do anything particularly useful in this case. Adding .* (not .*?) to the beginning does get you the last match instead of the first, but you do need to guard against the match starting later than you want, e.g.:
if ( $price =~ m/.*\b(\d+)(?:\.|,)(\d{2})/ ) {
    return $1.$2;
}
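As a rough illustration of why the \b guard matters, the sketch below compares the greedy prefix with and without it. The sub names last_price_unguarded and last_price_guarded are hypothetical, chosen only for this comparison.

```perl
use strict;
use warnings;

# Greedy prefix without a guard: .* gives back as little as it can,
# so (\d+) is left with only the last digit before the separator.
sub last_price_unguarded {
    my $price = shift;
    return $1 . $2 if $price =~ m/.*(\d+)(?:\.|,)(\d{2})/;
    return 0;
}

# Same pattern with \b, which forces (\d+) to begin at a word boundary,
# that is, at the start of the whole run of digits.
sub last_price_guarded {
    my $price = shift;
    return $1 . $2 if $price =~ m/.*\b(\d+)(?:\.|,)(\d{2})/;
    return 0;
}

my $sale = '<span><strike>$9.99</strike></span><br>$8.99';

print last_price_unguarded('$19.99'), "\n";    # 999
print last_price_guarded('$19.99'),   "\n";    # 1999
print last_price_guarded($sale),      "\n";    # 899
```

Without the guard, .* swallows "$1" and the match starts at the second digit, which gives 999 for $19.99. With \b the match cannot start in the middle of "19", so the full price comes back.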
This seems to work for me:
m/(\d+)(?:\.|,)(\d{2})£?\s*$/
Here's a way to do it with the global option:
sub price {
    my $str = shift;
    my @nums = $str =~ /(\d+)[.,](\d{2})/g;
    return 0 unless @nums;
    return (join '', @nums[-2,-1]);
}
In list context, the global /g returns the captures from every match as one flat list. The sub returns 0 if no matches are found; otherwise it returns the last two captures, joined into a string. It uses the character class [.,] instead of the non-capturing group (?:\.|,).
Update (based on comments):
A slightly faster solution: anchor the match at the end of the string, and use the argument directly instead of making a copy. Note that when nothing matches, this version returns an empty string rather than 0.
sub price {
    return (join '', $_[0] =~ /(\d+)[.,](\d{2})\D*$/);
}
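A quick way to check the end-anchored version against the four forms from the question is a small table of inputs and wanted results. This is only a sketch: the @cases table and the $failures counter are names introduced here, and the inputs are hand-typed samples rather than live Steam output.

```perl
use strict;
use warnings;
use utf8;    # the sample strings contain a literal £

# End-anchored version from the update above.
sub price {
    return ( join '', $_[0] =~ /(\d+)[.,](\d{2})\D*$/ );
}

# Each entry pairs an input form with the result the question asks for.
my @cases = (
    [ '$9.99',                                        '999' ],
    [ '<span><strike>$9.99</strike></span><br>$8.99', '899' ],
    [ '9,99£',                                        '999' ],
    [ '<span><strike>9.99£</strike></span><br>8,99£', '899' ],
);

my $failures = 0;
for my $case (@cases) {
    my ( $input, $want ) = @$case;
    $failures++ if price($input) ne $want;
}

print $failures ? "$failures case(s) failed\n" : "all four forms OK\n";
```

The \D*$ tail is what lets a trailing currency symbol through: after the two digits, only non-digits may remain before the end of the string, so the struck-through price can never satisfy the pattern while a sale price follows it.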