
Regular expression to extract another expression from a string with delimiters

This question is a little odd, and I have spent a fair while pushing my knowledge of regular expressions to get this far. However, I'm stuck on the last little bit. The problem is as follows:

I have a string (which denotes a url in a routing system I'm modifying), that may contain a regular expression to match some segment. For example:

$pattern = "/some/path/to/</[a-z]+/>regex_var1/location";

The important bits to note here are:

  • The regular expression is delimited within the string with </ and /> (changing this isn't really an option for legacy reasons unless it's absolutely unavoidable; I would prefer to leave it as is).
  • The bit of text after the /> (regex_var1) is the name for this parameter's match. I need to keep it out of the expression to stay compatible with the rest of the system; suffice it to say it can be ignored in this context.
  • This url pattern would match /some/path/to/another/location
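To make the role of the two parts concrete, here is a minimal sketch of how a single pattern segment such as </[a-z]+/>regex_var1 could be split into its regex and its parameter name, and then checked against one URI segment. The helper names parseRouteSegment() and matchRouteSegment() are hypothetical, not part of the routing system described here, and the sketch assumes the embedded regex contains neither "/>" (the closing delimiter) nor "~" (used as the PCRE delimiter below).

```php
<?php
// Hypothetical helper: splits one pattern segment of the form "</REGEX/>NAME".
// Returns null for a plain literal segment.
// Assumption: REGEX never contains "/>", so the first "/>" closes it.
function parseRouteSegment(string $segment): ?array
{
    if (preg_match('~^</(.*?)/>(.*)$~s', $segment, $m) !== 1) {
        return null;
    }
    return ['regex' => $m[1], 'name' => $m[2]];
}

// Hypothetical helper: matches one pattern segment against one URI segment.
// Returns [] for a matching literal, [name => value] for a matching regex,
// or null when the segments do not match.
// Assumption: the embedded regex never contains "~", used as the delimiter.
function matchRouteSegment(string $patternSegment, string $uriSegment): ?array
{
    $parsed = parseRouteSegment($patternSegment);
    if ($parsed === null) {
        // Literal segment: must be identical.
        return $patternSegment === $uriSegment ? [] : null;
    }
    // Anchor the embedded regex so it has to match the whole URI segment.
    if (preg_match('~^(?:' . $parsed['regex'] . ')$~', $uriSegment) !== 1) {
        return null;
    }
    return [$parsed['name'] => $uriSegment];
}

print_r(parseRouteSegment('</[a-z]+/>regex_var1'));
var_dump(matchRouteSegment('</[a-z]+/>regex_var1', 'another'));
```

With this, the regex segment from the example accepts another and binds it to regex_var1, while a literal segment such as location must match exactly.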

What I want to achieve is to split a given format (such as the example above) into segments. These segments are used in a backtracking tree traversal to match a request URI with a controller. At present, regular expressions are not supported; my intention is to allow them. In the past each segment was delimited by a /, but I need / characters inside the contained regular expression. If I split the pattern as it stands, the expression ends up across two segments. For example:

$pattern = "/some/</([a-z]+)(/optional)?/>regex2/location";
$segments = preg_split('/(?<!<)\/(?!>)/', $pattern);

yields 4 non-empty segments (plus the leading empty element):

// print_r($segments)
Array
(
    [0] => 
    [1] => some
    [2] => </([a-z]+)(
    [3] => optional)?/>regex2
    [4] => location
)

when I actually only want 3 (plus the leading empty element):

// print_r($segments)
Array
(
    [0] => 
    [1] => some
    [2] => </([a-z]+)(/optional)?/>regex2
    [3] => location
)

I am not interested in matching the whole URL with a regular expression, which would defeat the whole point of the exercise. This problem might seem unwarranted in isolation, but details about why I am after this specific implementation are beyond the scope of the question.
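The desired segmentation can also be stated as a matching (rather than splitting) pattern: a segment is either an embedded "</…/>name" chunk or a run of non-slash characters. A minimal sketch using preg_match_all follows; splitRoutePattern() is a hypothetical name, and the sketch assumes an embedded regex never contains "/>" itself, because the lazy .*? stops at the first occurrence of that closing delimiter.

```php
<?php
// Hypothetical helper: returns the non-empty segments of a route pattern.
// Alternative 1 ("</", the regex up to the first "/>", then the name) is tried
// before alternative 2 (a run of non-"/" characters), so slashes inside an
// embedded regex are consumed as part of that segment, never as separators.
// Assumption: an embedded regex never contains "/>".
function splitRoutePattern(string $pattern): array
{
    preg_match_all('~</.*?/>[^/]*|[^/]+~', $pattern, $m);
    return $m[0];
}

print_r(splitRoutePattern('/some/</([a-z]+)(/optional)?/>regex2/location'));
```

Unlike preg_split, this returns only the non-empty segments, so there is no leading empty element.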


Hm, I can't see an easy way to do this with a regexp alone. You might first parse out the regexes (/<\/.*?\/>[^\/]*/), store them in an array and replace them with something simple yet non-clashing (such as $1), then run your split and reinsert the regexes.
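The extract, mask, split, reinsert idea can be sketched as follows. splitWithPlaceholders() is a hypothetical name, and instead of $1 the sketch assumes a placeholder made of a NUL byte around an index ("\x00N\x00"), on the assumption that route patterns never contain NUL bytes and so cannot clash with it.

```php
<?php
// Hypothetical helper implementing the "mask, split, reinsert" approach.
// Assumption: route patterns never contain NUL bytes, so "\x00N\x00"
// placeholders cannot clash with ordinary text.
function splitWithPlaceholders(string $pattern): array
{
    $regexes = [];
    // 1. Replace every "</...>name" chunk with a numbered placeholder.
    $masked = preg_replace_callback(
        '~</.*?/>[^/]*~',
        function (array $m) use (&$regexes): string {
            $regexes[] = $m[0];
            return "\x00" . (count($regexes) - 1) . "\x00";
        },
        $pattern
    );
    // 2. No slashes remain inside the masked regexes, so a plain split works.
    $segments = explode('/', $masked);
    // 3. Swap each placeholder back for the original regex chunk.
    return array_map(
        fn (string $s): string => preg_replace_callback(
            '~\x00(\d+)\x00~',
            fn (array $m): string => $regexes[(int) $m[1]],
            $s
        ),
        $segments
    );
}

print_r(splitWithPlaceholders('/some/</([a-z]+)(/optional)?/>regex2/location'));
```

For the question's example this produces the same four-element array the question asks for, including the leading empty string. Arrow functions require PHP 7.4 or later.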


Another way to do it:

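$str = "/some/</([a-z]+)(/optional)?/>regex2/location";
$out_segments = array();
$in_regex = false; // true while we are between "</" and "/>"
// Split naively on every "/", then glue back the pieces that belong to an embedded regex.
foreach (preg_split('+/+', $str) as $segment) {
    if ($in_regex) {
        // A piece starting with ">" is the one right after the closing "/>".
        if (substr($segment, 0, 1) === '>') {
            $in_regex = false;
        }
        // Re-attach the piece (and the "/" the split removed) to the regex segment.
        $out_segments[count($out_segments) - 1] .= "/$segment";
        continue;
    }
    // A piece ending in "<" means the following "/" opened an embedded regex ("</").
    if (substr($segment, -1, 1) === '<') {
        // Keep any literal text before the "<" as its own segment.
        $segment = substr($segment, 0, -1);
        if ($segment !== '') {
            $out_segments[] = $segment;
        }
        $in_regex = true;
        $segment = '<';
    }
    // Skip the empty pieces produced by leading, trailing or doubled slashes.
    if ($segment !== '') {
        $out_segments[] = $segment;
    }
}
var_dump($out_segments);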

Edit: My earlier pseudocode looked much simpler, but it was wrong. The idea behind it is not that bad, though.


You could try splitting the string into its components first, and then processing it afterwards:

$url = '/some/location/</([a-z]+)(/optional)?/>regex2/here/or/there';
$reg = '#(.*?)(</.*?/>.*?(?=/|$))(.*)?#';
if( preg_match($reg, $url, $matches) ) {
    $result = array_merge(
        preg_split( '#/#', $matches[1], 0, PREG_SPLIT_NO_EMPTY),
        array( $matches[2] ),
        preg_split( '#/#', $matches[3], 0, PREG_SPLIT_NO_EMPTY)
    );
    print_r( $result );    
}

Array
(
    [0] => some
    [1] => location
    [2] => </([a-z]+)(/optional)?/>regex2
    [3] => here
    [4] => or
    [5] => there
)

The regex should always be in $matches[2], so you can find it, no matter where it occurs in the URL.
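One limitation: because $matches[1] is lazy, only the first embedded regex is isolated. Any later one stays in $matches[3], where the plain split on "/" breaks it apart. A minimal sketch that applies the same pattern recursively to the remainder is shown below; splitAllRegexSegments() is a hypothetical name, and like the original it assumes an embedded regex never contains "/>".

```php
<?php
// Hypothetical helper: repeats the three-group match on the remainder so that
// every embedded regex, not only the first, ends up as a single segment.
// Assumption: an embedded regex never contains "/>".
function splitAllRegexSegments(string $url): array
{
    $reg = '#(.*?)(</.*?/>.*?(?=/|$))(.*)?#';
    if (!preg_match($reg, $url, $matches)) {
        // No embedded regex left: an ordinary split on "/" is enough.
        return preg_split('#/#', $url, 0, PREG_SPLIT_NO_EMPTY);
    }
    return array_merge(
        preg_split('#/#', $matches[1], 0, PREG_SPLIT_NO_EMPTY),
        array($matches[2]),
        // Recurse on the rest, which may contain further embedded regexes.
        splitAllRegexSegments($matches[3] ?? '')
    );
}

print_r(splitAllRegexSegments('/a/</[0-9]+/>id/b/</(x|y)/z/>kind/c'));
```

For a URL with a single embedded regex this returns the same array as the code above; it also returns the plain segments when the URL contains no embedded regex at all.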
