How to get a regex to start from the beginning of a string

This is an oddball issue I've encountered (and probably have seen before but never paid attention to).

Here's the gist of the code:

my $url = 'http://twitter.com/' . $handle;
my $page = get($url);

if($page =~ m/Web<\/span>\s*<a href=\"(.+?)\"/gi) {
    $website = $1;
}

if($page =~ m/follower_count\" class=\"stats_count numeric\">(.+?)\s*</g) {
    $num_followers = $1;
}

It fetches a Twitter URL and uses a couple of regexes to capture the user's follower count and website. This code actually works fine. But when you switch the order and search for the website AFTER searching for the follower count, the website comes up empty. As it turns out, when you match a regex against a string, Perl seems to save the location of the last match. In the HTML, the follower count comes after the website. If you run the follower-count regex first, the website regex seems to start where the follower-count regex left off (like an index into the string).

What has me baffled is that I have the "g" modifier at the end, signifying "global", as in "search the string globally... from the beginning".

Am I missing something here? I can't seem to figure out why it's resuming the last regex position on the string (if that makes sense).


The /g modifier, in scalar context, doesn't do what you think it does. Get rid of it.

As perlretut explains, /g in scalar context cycles over each match in turn. It's designed for use in a loop, like so:

while ($str =~ /pattern/g) {
    # match on each occurrence of 'pattern' in $str in turn
}

The other way to use /g is in list context:

my @results = $str =~ /pattern/g; # collect each occurrence of 'pattern' within $str into @results

If you're using /g in scalar context and you're not iterating over it, you're almost certainly not using it right.
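Here is a minimal sketch of the fix applied to the code in the question. The $page string is a made-up stand-in for the fetched HTML (the real Twitter markup may differ), and the two matches run in the order that was failing: followers first, website second.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fetched page: website first, follower count second.
my $page = 'Web</span> <a href="http://example.com">x</a> '
         . '<span id="follower_count" class="stats_count numeric">1,234 </span>';

my ($website, $num_followers);

# Followers first, website second: the order that failed in the question.
# Without /g, each match starts at the beginning of $page.
if ($page =~ m/follower_count" class="stats_count numeric">(.+?)\s*</) {
    $num_followers = $1;
}
if ($page =~ m/Web<\/span>\s*<a href="(.+?)"/i) {
    $website = $1;
}

print "followers: $num_followers\n";   # followers: 1,234
print "website:   $website\n";         # website:   http://example.com
```

With /g left on both patterns, the second match would start after the follower count and $website would stay undefined.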


To quote perlop on Regexp Quote Like Operators:

In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see pos. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (e.g. m//gc). Modifying the target string also resets the search position.

So in scalar context (which you're using), /g does not mean "search from the beginning", it means "search starting from the string's pos". "Search from the beginning" is the default (without /g).

/g is normally used when you want to find all matches for a regex in a string, instead of just the first match. In list context, it does that by returning a list of all the matches. In scalar context it does that by starting the search from where the previous search left off (usually done in a loop).
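To make the quoted behavior concrete, here is a small sketch that records pos after each scalar-context match. The string and variable names are just illustrative.

```perl
use strict;
use warnings;

my $str = 'aXbXc';

# Assigning the result to a scalar puts each match in scalar context.
my $first  = $str =~ /X/g;    # matches the X at index 1
my $pos1   = pos($str);       # 2: just past the first X
my $second = $str =~ /X/g;    # resumes at 2, matches the X at index 3
my $pos2   = pos($str);       # 4
my $third  = $str =~ /X/g;    # nothing left: the match fails...
my $pos3   = pos($str);       # ...and pos is reset (undef)

# With /c, a failed match leaves pos where it was.
my $again1 = $str =~ /X/gc;
my $again2 = $str =~ /X/gc;
my $failed = $str =~ /X/gc;   # fails
my $pos_c  = pos($str);       # still 4

printf "pos after 1st: %d, after 2nd: %d, after failure: %s, with /c: %d\n",
    $pos1, $pos2, defined $pos3 ? $pos3 : 'undef', $pos_c;
# pos after 1st: 2, after 2nd: 4, after failure: undef, with /c: 4
```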


The gist of it is that matches done with /g save the position where the last match ended, so that the next /g match against that string starts from there. In scalar context, this is generally used to get successive matches in a while loop; in list context, /g returns all the (non-overlapping) matches. You can read more about this in perlretut, under Global Matching, and in perlop, under Regexp Quote-Like Operators.

You can see the current position with the pos function. You can also set the position by using pos as an lvalue: pos($string) = 0; will reset the position to the beginning of the string.
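A quick sketch of reading and rewinding the position, using a throwaway string:

```perl
use strict;
use warnings;

my $string = 'one two three';

my $found_two = $string =~ /two/g;   # scalar context: the match ends at offset 7
my $saved_pos = pos($string);        # 7

pos($string) = 0;                    # rewind to the beginning

my $found_one = $string =~ /one/g;   # succeeds because the search starts at 0
my $new_pos   = pos($string);        # 3

print "pos after 'two': $saved_pos, pos after reset and 'one': $new_pos\n";
# pos after 'two': 7, pos after reset and 'one': 3
```

Without the rewind, the /one/g match would start at offset 7 and fail, since 'one' lies before that point.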

There is rarely a reason to use /g in scalar context outside of a loop. If you do want a match to pick up exactly where the previous one ended, the \G assertion makes that explicit by anchoring the pattern at pos.

..of course, then nobody remembers how \G works and you are back at square one, but that's another topic.
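For anyone who doesn't remember, here is a rough sketch of the difference \G makes: it requires each match to begin exactly at pos, instead of anywhere after it. The sample string is arbitrary.

```perl
use strict;
use warnings;

my $str = 'aaabaaa';

# Plain /g: each match may start anywhere at or after pos, so the loop
# skips over the 'b' and counts every 'a'. The final failed match resets pos.
my $unanchored = 0;
$unanchored++ while $str =~ /a/g;

# With \G, each match must begin exactly at pos, so the loop stops at the 'b'.
my $anchored = 0;
$anchored++ while $str =~ /\Ga/g;

print "without \\G: $unanchored, with \\G: $anchored\n";
# without \G: 6, with \G: 3
```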


A successful m//g does not reset the position; only a failed match (or modifying the string) does. Otherwise you need to reset it manually. See this for reference: http://perldoc.perl.org/functions/pos.html

I believe you just set pos to 0 or undef and it will work.
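As a sketch of that suggestion, this keeps /g on both patterns and clears pos in between. The page fragment and the "count" class are invented for the example, not taken from the real page.

```perl
use strict;
use warnings;

# Hypothetical page fragment: link first, count second.
my $page = '<a href="http://example.com">site</a> <b class="count">42</b>';

my ($count, $website);

if ($page =~ /class="count">(\d+)/g) {   # leaves pos($page) near the end
    $count = $1;
}

pos($page) = undef;                      # forget the saved position

if ($page =~ /href="(.+?)"/g) {          # now searches from the start again
    $website = $1;
}

print "count: $count, website: $website\n";
# count: 42, website: http://example.com
```

Assigning 0 instead of undef has the same effect here. Dropping /g altogether is still the simpler fix when you only want the first match.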
