How to handle 404s with regex-based routing?

Please consider the following very rudimentary "controllers" (functions in this case, for simplicity):

function Index() {
    var_dump(__FUNCTION__); // show the "Index" page
}

function Send($n) {
    var_dump(__FUNCTION__, func_get_args()); // placeholder controller
}

function Receive($n) {
    var_dump(__FUNCTION__, func_get_args()); // placeholder controller
}

function Not_Found() {
    var_dump(__FUNCTION__); // show a "404 - Not Found" page
}

And the following regex-based Route() function:

function Route($route, $function = null)
{
    $result = rtrim(preg_replace('~/+~', '/', substr($_SERVER['PHP_SELF'], strlen($_SERVER['SCRIPT_NAME']))), '/');

    if (preg_match('~' . rtrim(str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route), '/') . '$~i', $result, $matches) > 0)
    {
        exit(call_user_func_array($function, array_slice($matches, 1)));
    }

    return false;
}

Now I want to map the following URLs (trailing slashes are ignored) to the corresponding "controllers":

/index.php -> Index()
/index.php/send/:NUM -> Send()
/index.php/receive/:NUM -> Receive()
/index.php/NON_EXISTENT -> Not_Found()

This is where things get tricky: I have two problems that I haven't been able to solve. I doubt I'm the first person to run into them, so someone out there probably has a solution.


Catching 404's (Solved!)

I can't find a way to distinguish between requests to the root (index.php) and requests for paths that shouldn't exist, such as index.php/notHere. I end up serving the default index.php route for URLs that should get a 404 - Not Found error page instead. How can I solve this?

EDIT - The solution just flashed in my mind:

Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route('/:any', 'Not_Found'); // use :any here, see the problem below
Route('/', 'Index');

Ordering of the Routes

If I set up the routes in a "logical" order, like this:

Route('/', 'Index');
Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route(':any', 'Not_Found');

All URL requests are caught by the Index() controller: once the trailing slash is stripped, the '/' route becomes an empty regex, which matches everything. However, if I define the routes in a "hacky" order, like this:

Route('/send/(:num)', 'Send');
Route('/receive/(:num)', 'Receive');
Route('/:any', 'Not_Found');
Route('/', 'Index');

Everything seems to work as it should. Is there an elegant way of solving this problem?

The routes may not always be hard-coded (they might be pulled from a DB, for example), and I need to make sure that none of them is ignored because of the order in which they were defined. Any help is appreciated!
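
To make the ordering problem concrete, here is a minimal sketch that builds the pattern the same way Route() does and shows what the '/' route turns into. The helper name buildPattern() is hypothetical, and the start anchor at the end is an assumption of this sketch, not something from the post.

```php
<?php
// Builds the pattern the same way Route() in the question does.
function buildPattern(string $route): string
{
    $regex = str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route);

    return '~' . rtrim($regex, '/') . '$~i';
}

echo buildPattern('/'), "\n";            // ~$~i
echo buildPattern('/send/(:num)'), "\n"; // ~/send/([0-9]+)$~i

// '~$~i' only asserts "end of string", so it succeeds for every path.
var_dump(preg_match(buildPattern('/'), '/send/5'));  // int(1)
var_dump(preg_match(buildPattern('/'), '/notHere')); // int(1)

// Assumption, not from the post: also anchoring at the start makes '/'
// match only the empty (root) path.
var_dump(preg_match('~^$~i', '/notHere')); // int(0)
var_dump(preg_match('~^$~i', ''));         // int(1)
```

With a start anchor the '/' route would no longer shadow the others, although a catch-all such as '/:any' would presumably still have to be registered last.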


Okay, I know there's more than one way to skin a cat, but why in the world would you do it this way? It seems like a RoR-style approach to something that could easily be handled with mod_rewrite.

That being said, I rewrote your Route function and was able to accomplish your goal. Note that I added another condition to catch the Index route directly: because you strip the trailing slash, the '/' route became an empty pattern, which is why it matched Index when you wanted a 404. I also consolidated the four Route() calls into a single foreach() loop.

function Route()
{
        $result = rtrim(preg_replace('~/+~', '/', substr($_SERVER['PHP_SELF'], strlen($_SERVER['SCRIPT_NAME']))), '/');
        $matches = array();

        $routes = array(
                'Send'      => '/send/(:num)',
                'Receive'   => '/receive/(:num)',
                'Index'     => '/',
                'Not_Found' => '' // empty pattern matches any path, so it must stay last
        );

        foreach ($routes as $function => $route)
        {
                if (($route == '/' && $result == '')
                        || (preg_match('~' . rtrim(str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $route)) . '$~i', $result, $matches) > 0))
                {
                        exit(call_user_func_array($function, array_slice($matches, 1)));
                }
        }

        return false;
}

Route();

Cheers!


This is a common problem in MVC web apps, and it is often solved before it becomes a problem at all.

The easiest and most general way is to use exceptions. Throw a PageNotFound exception if you don't have content for the given parameters. At the top level of your application, catch all exceptions, as in this simplified example:

index.php:

try {
    $controller->method($arg);
} catch (PageNotFound $e) {
    show404Page($e->getMessage());
} catch (Exception $e) {
    logFatalError($e->getMessage());
    show500Page();
}

controller.php:

function method($arg) {
    $obj = findByID($arg);
    if (false === $obj) {
         throw new PageNotFound($arg);
    } else {
         ...
    }
}
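
The two fragments above can be combined into a small runnable sketch. Everything here is a stand-in: the PageNotFound class body, the in-memory findByID() lookup, and the show() and handle() helpers are assumptions made for illustration, and the handler returns a status code and body instead of rendering a page so that it is easy to try out.

```php
<?php
// Hypothetical exception type; the name follows the example above.
class PageNotFound extends Exception {}

// Stand-in for a real lookup: returns false when nothing matches the ID.
function findByID($id)
{
    $rows = array(1 => 'first', 2 => 'second');

    return isset($rows[$id]) ? $rows[$id] : false;
}

// Controller: throws when there is no content for the given argument.
function show($id)
{
    $obj = findByID($id);
    if (false === $obj) {
        throw new PageNotFound("No item with ID $id");
    }

    return "Item: $obj";
}

// Top-level handler: turns exceptions into a status code and a body.
function handle($id)
{
    try {
        return array(200, show($id));
    } catch (PageNotFound $e) {
        return array(404, $e->getMessage());
    } catch (Exception $e) {
        return array(500, 'Internal Server Error');
    }
}

print_r(handle(1));  // status 200 with "Item: first"
print_r(handle(99)); // status 404 with the exception message
```

The PageNotFound catch has to come before the generic Exception catch, because PHP uses the first catch block whose type matches.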

The ordering problem can be solved by sorting the regexes so that the most specific regex is matched first and the least specific is matched last. To do this, count the path separators (i.e. slashes) in the regex, excluding the path separator at the beginning. You'll get this:

 Regex           Separators
 --------------------------
 /send/(:num)    1
 /send/8/(:num)  2
 /               0

Sort them in descending order of separator count and process them in that order. The processing order is:

  1. /send/8/(:num)
  2. /send/(:num)
  3. /
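
The sort described above could be sketched as follows. The function names countSeparators() and sortRoutes() are hypothetical, and the sketch assumes the routes are plain pattern strings like the ones in the table.

```php
<?php
// Counts path separators in a route, ignoring the leading one.
function countSeparators(string $route): int
{
    return substr_count(ltrim($route, '/'), '/');
}

// Returns the routes ordered from most to fewest separators.
function sortRoutes(array $routes): array
{
    usort($routes, function ($a, $b) {
        return countSeparators($b) <=> countSeparators($a);
    });

    return $routes;
}

// Prints /send/8/(:num), then /send/(:num), then /
print_r(sortRoutes(array('/send/(:num)', '/send/8/(:num)', '/')));
```

Routes with the same count, such as '/:any' and '/', are not ordered by this rule, so a catch-all route would still need a tie-breaker of some kind.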


OK, first of all, something like:

foo.com/index.php/more/info/to/follow 

is perfectly valid and, per the standard, should load index.php with $_SERVER['PATH_INFO'] set to /more/info/to/follow. This is the CGI/1.1 standard. If you want the server NOT to perform PATH_INFO expansion, turn it off in your server settings. Under Apache this is done with:

AcceptPathInfo Off

If you set it to Off under Apache 2, such requests get a 404.

I'm not sure what the equivalent IIS setting is, but you should be able to find it.
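
If PATH_INFO is left enabled, a router can read it directly rather than slicing PHP_SELF. The following is only a sketch: requestPath() is a hypothetical helper, and it takes the server array as a parameter so that it can be tried without a web server.

```php
<?php
// Normalizes the extra path after the script name, e.g. "/send/5".
function requestPath(array $server): string
{
    $path = isset($server['PATH_INFO']) ? $server['PATH_INFO'] : '';

    // Collapse repeated slashes and drop the trailing one, as Route() does.
    return rtrim(preg_replace('~/+~', '/', $path), '/');
}

echo requestPath(array('PATH_INFO' => '/send//5/')), "\n"; // /send/5
echo requestPath(array()), "\n";                            // prints an empty line
```

In a real request the call would be requestPath($_SERVER).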
