Why doesn't this regular expression match?
I have a Perl script for the Squid web proxy:
#!/usr/bin/perl
$|=1;           # autoflush STDOUT so Squid gets each answer immediately
while (<>) {
    @X = split;     # split the input line on whitespace
    $x = $X[0];     # first field, printed back in front of the answer
    $_ = $X[1];     # second field: the URL that the patterns below test
    if (m/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(itag=22).*?\&(id=[a-zA-Z0-9]*)/) {
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $2 . "&" . $3 . "\n";
        # youtube Normal screen always HD itag 35, Normal screen never HD itag 34, itag=18 <--normal?
    } elsif (m/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(itag=[0-9]*).*?\&(id=[a-zA-Z0-9]*)/) {
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $2 . "&" . $3 . "\n";
    } else {
        print $x . $_ . "\n";
    }
}
that I got from http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube. I've tested input such as
http://v24.lscache6.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Calgorithm%2Cburst%2Cfactor%2Coc%3AU0hPRVFUTl9FSkNOOV9JTlJF&fexp=905230%2C901013&algorithm=throttle-factor&itag=34&ipbits=0&burst=40&sver=3&signature=2A5088FD4F64CF9D58A5B798E14452D71B51BAE8.2EABF06D09C8C81650266C5464CF1D0B4D6C25CC&expire=1300190400&key=yt1&ip=0.0.0.0&factor=1.25&id=e838f2cd3549e3cb
in RegexBuddy with Perl syntax, and it matches the second regular expression in the script above. But it doesn't match when I run the script. I'm not a Perl programmer, so where am I going wrong?
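One thing worth checking, based only on reading the script: it splits each input line on whitespace, keeps the first field in $x, and runs the patterns against the second field ($X[1]). That looks like it expects Squid's concurrent-helper input, where each line starts with a channel ID before the URL. If you feed the script a bare URL instead, $X[1] is undefined and neither pattern can match, even though the regex itself is fine. The sketch below reproduces just that step; second_field_matches is a hypothetical helper name, and the pattern is the script's elsif pattern unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The elsif pattern from the script, unchanged.
my $re = qr/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(itag=[0-9]*).*?\&(id=[a-zA-Z0-9]*)/;

# Hypothetical helper that mimics the script: split the line on whitespace
# and test only the SECOND field ($X[1]) against the pattern.
sub second_field_matches {
    my ($line) = @_;
    my @X = split ' ', $line;          # same as a bare `split` on $_
    my $candidate = $X[1];
    return 0 unless defined $candidate; # a bare URL has no second field
    return $candidate =~ $re ? 1 : 0;
}

# The test URL from the question.
my $url = 'http://v24.lscache6.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Calgorithm%2Cburst%2Cfactor%2Coc%3AU0hPRVFUTl9FSkNOOV9JTlJF&fexp=905230%2C901013&algorithm=throttle-factor&itag=34&ipbits=0&burst=40&sver=3&signature=2A5088FD4F64CF9D58A5B798E14452D71B51BAE8.2EABF06D09C8C81650266C5464CF1D0B4D6C25CC&expire=1300190400&key=yt1&ip=0.0.0.0&factor=1.25&id=e838f2cd3549e3cb';

print 'bare URL:           ', (second_field_matches($url)     ? 'match' : 'no match'), "\n";
print 'channel ID + URL:   ', (second_field_matches("0 $url") ? 'match' : 'no match'), "\n";
```

So when testing by hand, something like "0 http://..." on a line gives the script the shape it splits for.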
I would recommend dividing the regex into separate variables and then modifying one of them at a time. This way you can find the problem yourself; I'm not sure anyone will bother to debug your program. Example:
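my $ip      = qr/[0-9.]{4}/;
my $youtube = qr/.*\.youtube\.com/;
my $gvideo  = qr/.*\.googlevideo\.com/;
my $gvideo2 = qr/.*\.video\.google\.com/;
my $itag    = qr/.*?\&(itag=[0-9]*)/;
my $id      = qr/.*?\&(id=[a-zA-Z0-9]*)/;
# Rebuild the full pattern, then change or drop one piece at a time to see which one breaks the match.
if (m/^http:\/\/(?:$ip|$youtube|$gvideo|$gvideo2)$itag$id/) { print "matched: $1 $2\n"; }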
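The same divide-and-test idea can be automated: keep the pieces in an array and grow the pattern one piece at a time until it stops matching. This is a minimal sketch; matching_prefix_length and the particular split points are hypothetical choices, not something from the script or from Squid.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's elsif pattern, cut into pieces (the split points are just one choice).
my @parts = (
    qr/^http:\/\//,
    qr/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com)/,
    qr/.*?\&(itag=[0-9]*)/,
    qr/.*?\&(id=[a-zA-Z0-9]*)/,
);

# Hypothetical helper: returns how many leading pieces still match together.
# A return value $n smaller than the number of pieces means piece $n
# (0-based) is the first one that breaks the match.
sub matching_prefix_length {
    my ($text, @pieces) = @_;
    for my $n (1 .. @pieces) {
        my $combined = join '', @pieces[0 .. $n - 1];  # qr objects stringify to valid sub-patterns
        return $n - 1 unless $text =~ /$combined/;
    }
    return scalar @pieces;
}

my $url = 'http://v24.lscache6.c.youtube.com/videoplayback?itag=34';
my $ok  = matching_prefix_length($url, @parts);
print $ok == @parts ? "whole pattern matches\n" : "first failing piece: $ok\n";
```

For this sample URL it reports piece 2, the itag piece: the pattern demands a literal "&" before itag, and here itag follows "?" instead.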
Why not use the URI module to parse the URL? Here is a simple example using one. That way you can grab the host with a simple $uri->host() and check it against your list of hosts. You should also be able to get the itag and id fields regardless of what order they're in, or whether there are other attributes as well, which could break a regex.
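A minimal sketch of that approach, assuming the URI distribution from CPAN is installed (it is not part of core Perl). storeurl_for and @allowed_suffixes are hypothetical names, the suffix list is an assumption about what the original alternation was meant to accept, and the returned string copies the format the script above prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;    # from CPAN; not part of core Perl

# Assumption: these suffixes (plus numeric IP hosts) are what the original
# alternation ".*\.youtube\.com|.*\.googlevideo\.com|..." was meant to allow.
my @allowed_suffixes = ('.youtube.com', '.googlevideo.com', '.video.google.com');

# Hypothetical helper: returns the rewritten store URL, or undef when the
# URL should be passed through unchanged.
sub storeurl_for {
    my ($url) = @_;
    my $uri = URI->new($url);
    return undef unless ($uri->scheme // '') eq 'http';

    my $host    = lc $uri->host;
    my $host_ok = $host =~ /\A[0-9.]+\z/
               || grep { $host =~ /\Q$_\E\z/ } @allowed_suffixes;
    return undef unless $host_ok;

    # query_form returns key/value pairs, so parameter order does not matter.
    my %q = $uri->query_form;
    return undef unless defined $q{itag} && defined $q{id};
    return "http://video-srv.youtube.com.SQUIDINTERNAL/itag=$q{itag}&id=$q{id}";
}

my $r = storeurl_for('http://v24.lscache6.c.youtube.com/videoplayback?id=e838f2cd3549e3cb&itag=34');
print defined $r ? "$r\n" : "no rewrite\n";
```

Here id comes before itag in the query string, an order the original regex would reject; the parsed version does not care.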