Regular expression to extract another expression from a string with delimiters
This question is a little odd, and I have spent a fair while pushing my knowledge of regular expressions to get this to the point it is at. I'm stuck at the last little bit however. The problem is as follows:
I have a string (denoting a URL in a routing system I'm modifying) that may contain a regular expression to match some segment. For example:
$pattern = "/some/path/to/</[a-z]+/>regex_var1/location";
The important bits to note here are:
- The regular expression is delimited within the string by </ and /> (this is not really negotiable, for legacy reasons, unless there is no other way; I would prefer to leave it as is).
- The bit of text after the /> (regex_var1) is a name for the match of this parameter. I need to keep it out of the expression to stay compatible with the rest of the system; suffice to say it can be ignored in this context.
- This URL pattern would match /some/path/to/another/location
What I want to achieve is to split a given format (example as above) into segments. These segments are used in a backtracking tree traversal to match a request URI with a controller. At present regular expressions are not supported; my intention is to allow them. In the past each segment was delimited by a / character, but I now need / characters inside the contained regular expression. If I use the split in its current form, the expression is split across two segments. For example:
$pattern = "/some/</([a-z]+)(/optional)?/>regex2/location";
$segments = preg_split('/(?<!<)\/(?!>)/', $pattern);
yields 4 non-empty segments (plus the empty leading entry)
// print_r($segments)
Array
(
[0] =>
[1] => some
[2] => </([a-z]+)(
[3] => optional)?/>regex2
[4] => location
)
when I actually only want 3
// print_r($segments)
Array
(
[0] =>
[1] => some
[2] => </([a-z]+)(/optional)?/>regex2
[3] => location
)
I am not interested in matching the whole URL with a regular expression, which would defeat the whole point of the exercise. This problem might seem unwarranted in isolation, but details about why I am after this specific implementation are beyond the scope of the question.
Hm, I cannot see an easy way to do it with a regexp only. You might first parse out the regexes (/<\/.*?\/>[^\/]*/), store them in an array and replace them by something easy yet non-clashing ($1), then run your regex and reinsert the regexes.
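A minimal sketch of that placeholder idea, under a few assumptions: the function name splitWithPlaceholders and the {N} placeholder format are made up for illustration, and the placeholder is assumed never to occur as a literal URL segment. The embedded regex is also assumed not to contain /> itself.

```php
<?php
// Hypothetical helper: hide each "</ ... />name" block, split, then restore.
function splitWithPlaceholders(string $pattern): array
{
    $stash = [];

    // Step 1: swap every delimited regex (plus its trailing name) for "{N}".
    $masked = preg_replace_callback(
        '#</.*?/>[^/]*#',
        function (array $m) use (&$stash): string {
            $stash[] = $m[0];
            return '{' . (count($stash) - 1) . '}';
        },
        $pattern
    );

    // Step 2: every remaining slash is a real segment separator.
    $segments = explode('/', $masked);

    // Step 3: put the stashed blocks back where their placeholders are.
    foreach ($segments as $i => $segment) {
        if (preg_match('/^\{(\d+)\}$/', $segment, $m)) {
            $segments[$i] = $stash[(int) $m[1]];
        }
    }

    return $segments;
}

print_r(splitWithPlaceholders('/some/</([a-z]+)(/optional)?/>regex2/location'));
```

For the pattern from the question this gives the same four entries the asker wanted: an empty leading entry, some, the whole </([a-z]+)(/optional)?/>regex2 block, and location.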
Another way to do it:
$str = "/some/</([a-z]+)(/optional)?/>regex2/location";
$out_segments = array();
$in_regex = false;
// Split on every slash, then glue the pieces of an embedded regex back together.
foreach (explode('/', $str) as $segment) {
    if ($in_regex) {
        // A piece starting with '>' closes the "</ ... />" block.
        if (substr($segment, 0, 1) === '>') {
            $in_regex = false;
        }
        // Append to the segment that is currently being rebuilt.
        $out_segments[count($out_segments) - 1] .= "/$segment";
        continue;
    }
    // A piece ending with '<' opens a "</ ... />" block.
    if (!$in_regex && substr($segment, -1, 1) === '<') {
        $segment = substr($segment, 0, -1);
        if ($segment !== '') {
            $out_segments[] = $segment;
        }
        $in_regex = true;
        $segment = '<';
    }
    if ($segment !== '') {
        $out_segments[] = $segment;
    }
}
var_dump($out_segments);
Edit: The wrong pseudocode looked much easier. The idea is not that bad, though.
You could try splitting the string into its components first, and then processing it afterwards:
$url = '/some/location/</([a-z]+)(/optional)?/>regex2/here/or/there';
$reg = '#(.*?)(</.*?/>.*?(?=/|$))(.*)?#';
if( preg_match($reg, $url, $matches) ) {
    $result = array_merge(
        preg_split( '#/#', $matches[1], 0, PREG_SPLIT_NO_EMPTY),
        array( $matches[2] ),
        preg_split( '#/#', $matches[3], 0, PREG_SPLIT_NO_EMPTY)
    );
    print_r( $result );
}
Array
(
[0] => some
[1] => location
[2] => </([a-z]+)(/optional)?/>regex2
[3] => here
[4] => or
[5] => there
)
The regex should always be in $matches[2], so you can find it, no matter where it occurs in the URL.
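Note that the preg_match above isolates only the first delimited block; a second one would end up in $matches[3] and be split on its slashes. If a pattern may carry several, one possible variation is to tokenize in a single pass with preg_match_all. This is only a sketch: the function name routeSegments is hypothetical, and it assumes the embedded regex never contains /> itself.

```php
<?php
// Hypothetical helper: returns the non-empty segments of a route pattern.
function routeSegments(string $pattern): array
{
    // Alternative 1: a "</ ... />" block together with its trailing name.
    // Alternative 2: an ordinary segment, i.e. a run of non-slash characters.
    // Alternative 1 is tried first, so a block is never cut at its inner slashes.
    preg_match_all('#</.*?/>[^/]*|[^/]+#', $pattern, $matches);

    return $matches[0];
}

print_r(routeSegments('/some/location/</([a-z]+)(/optional)?/>regex2/here/or/there'));
print_r(routeSegments('/a/</x/y/>one/b/</(c|d)/>two'));
```

Unlike the split in the question, this drops the empty leading entry, much like PREG_SPLIT_NO_EMPTY does in the code above.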