Erroneous Matches with Regular Expression
$regexp = '/(?:<input\stype="hidden"\sname="){1}([a-zA-Z0-9]*)(?:"\svalue="1"\s\/>)/';
$response = '<input type="hidden" name="7d37dddd0eb2c85b8d394ef36b35f54f" value="1" />';
preg_match($regexp, $response, $matches);
echo $matches[1]; // Outputs: 7d37dddd0eb2c85b8d394ef36b35f54f
So I'm using this regular expression to search for an authentication token on a webpage implementing Joomla in order to perform a scripted login.
I've got all this working but am wondering what is wrong with my regular expression as it always returns 2 items.
Array ( [0] => [1] => 7d37dddd0eb2c85b8d394ef36b35f54f)
Also, the name of the input I'm checking for changes on every page load, both in length and in content.
Nothing is wrong. Item [0] always contains the entire match. From the docs (emphasis mine):
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
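To see this with the question's own pattern and input, here is a minimal sketch. The variable names `$full` and `$token` are only illustrative. The `[0]` entry probably looks empty in the question's output because a browser renders the matched hidden `<input>` tag as an invisible element rather than as text; escaping it with `htmlspecialchars()` makes it visible.

```php
<?php
$regexp   = '/(?:<input\stype="hidden"\sname="){1}([a-zA-Z0-9]*)(?:"\svalue="1"\s\/>)/';
$response = '<input type="hidden" name="7d37dddd0eb2c85b8d394ef36b35f54f" value="1" />';

if (preg_match($regexp, $response, $matches) === 1) {
    $full  = $matches[0]; // entire matched text (the whole <input ... /> tag)
    $token = $matches[1]; // first capturing group (the value of the name attribute)

    // Escape the tag so it shows up as text when the output is viewed in a browser.
    echo htmlspecialchars($full), "\n";
    echo $token, "\n";
}
```

So two entries is the expected result for a pattern with one capturing group; the non-capturing `(?:...)` groups add nothing to `$matches`.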
Your regex (overlooking the fact that you are working on HTML with regexes in the first place, which you know you shouldn't) is a bit too complicated.
$regexp = '#<input\s+type="hidden"\s+name="([0-9a-f]*)"\s+value="1"\s*/>#i';
- You don't need the non-capturing groups at all.
- You use \s, which limits you to a single character. \s+ is probably better.
- Using something different than / as the regex boundary makes escaping of forward slashes in the regex unnecessary.
- Making the regex case-insensitive could be useful, too.
- The auth token looks like a hex string, so matching a-z is unnecessary.
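A quick sketch of how the simplified pattern behaves. The second entry in `$responses` is a made-up variant (extra spaces, upper-case tag and token, no space before `/>`) included only to exercise `\s+`, `\s*` and the `i` flag; it is not taken from a real Joomla page.

```php
<?php
// Simplified pattern: '#' as delimiter, \s+ between attributes, case-insensitive.
$regexp = '#<input\s+type="hidden"\s+name="([0-9a-f]*)"\s+value="1"\s*/>#i';

$responses = [
    // The sample from the question.
    '<input type="hidden" name="7d37dddd0eb2c85b8d394ef36b35f54f" value="1" />',
    // Hypothetical variant with different spacing and letter case.
    '<INPUT  type="hidden"   name="7D37DDDD0EB2C85B8D394EF36B35F54F" value="1"/>',
];

$tokens = [];
foreach ($responses as $response) {
    if (preg_match($regexp, $response, $matches) === 1) {
        $tokens[] = $matches[1]; // the captured token
    }
}

print_r($tokens);
```

Both inputs match here, whereas the original pattern with single `\s` and a mandatory space before `/>` would reject the second one.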
As per the manual entry for preg_match:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
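On the first answer's aside about not using regexes on HTML: one possible alternative is to let PHP's DOM extension parse the page and then inspect the `<input>` elements. This is only a sketch under assumptions: the DOM extension must be enabled, the surrounding `<form>` in `$response` is invented for the example, and the rule "hidden input with value 1 and a hex-looking name" is a guess based on the sample in the question, not documented Joomla behaviour.

```php
<?php
// Hypothetical page fragment; a real response would be the full login page.
$response = '<form><input type="hidden" name="7d37dddd0eb2c85b8d394ef36b35f54f" value="1" /></form>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($response);
libxml_clear_errors();

$token = null;
foreach ($doc->getElementsByTagName('input') as $input) {
    // Assumption: the token field is a hidden input with value "1" and a hex name.
    if ($input->getAttribute('type') === 'hidden'
        && $input->getAttribute('value') === '1'
        && preg_match('/^[0-9a-f]+$/i', $input->getAttribute('name')) === 1) {
        $token = $input->getAttribute('name');
        break;
    }
}

echo $token, "\n";
```

This no longer depends on attribute order or spacing, which is what makes the regex version fragile.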