The URL Security

I have written the function below to secure URLs. I would like to know whether there is anything in this code I should reconsider or change. I wrote it after reading quite a few articles on security from various sources.

Here is the function:

// filters possibly malicious content from URLs
private function filter_url($url)
{
    if (is_array($url))
    {
        foreach ($url as $key => $value)
        {
            // recursion: filter every element of the array
            $url[$key] = $this->filter_url($value);
        }

        return $url;
    }
    else
    {
        // Allow only one ? in URLs
        $total_question_marks = substr_count($url, '?');

        if ($total_question_marks >= 2)
        {
            exit('You can not use 2 question marks (?) in URLs for security reasons!!');
        }

        // decode URLs
        $url = rawurldecode($url);
        $url = urldecode($url);
        // remove bad stuff
        $url = str_replace('../', '', $url);
        $url = str_replace('..\\', '', $url);
        $url = str_replace('..%5C', '', $url);
        $url = str_replace('%00', '', $url);
        $url = str_ireplace('http', '', $url);
        $url = str_ireplace('https', '', $url);
        $url = str_ireplace('ftp', '', $url);
        $url = str_ireplace('smb', '', $url);
        $url = str_replace('://', '', $url);
        $url = str_replace(':\\\\', '', $url);
        $url = str_replace(array('<', '>'), array('&lt;', '&gt;'), $url);

        // Allow only a-zA-Z0-9_/.-?=&
        $url = preg_replace("/[^a-zA-Z0-9_\-\/\.\?=&]+/", "", $url);

        return $url;
    }
}

I can use this function simply like this:

$_GET = $this->filter_url($_GET);

Or even like this:

$_SERVER['QUERY_STRING'] = $this->filter_url($_SERVER['QUERY_STRING']);


Any attempt to create a catch-all filter like this will always fail. In addition, because it always "corrupts" the data, you will be in trouble as soon as you genuinely need to accept a piece of data that contains a "disallowed" character or character sequence.
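To make the corruption point concrete, here is a hypothetical standalone function, `blacklist_filter()`, that copies just a few of the steps from `filter_url()` above (decode, strip "http" and "smb", then drop everything outside the allowed character set). It is a sketch for illustration, not the asker's full function, but it shows ordinary input being silently mangled:

```php
<?php
// Hypothetical standalone function reproducing a few of filter_url()'s steps.
function blacklist_filter(string $value): string
{
    $value = rawurldecode($value);
    $value = str_ireplace('http', '', $value);
    $value = str_ireplace('smb', '', $value);
    // Keep only a-zA-Z0-9_-/.?=& (same character class as the question)
    return preg_replace('/[^a-zA-Z0-9_\-\/\.\?=&]+/', '', $value);
}

// Ordinary, legitimate values come out damaged:
echo blacklist_filter('apache httpd tuning'), "\n"; // apachedtuning
echo blacklist_filter("O'Brien"), "\n";             // OBrien
echo blacklist_filter('C++ & C#'), "\n";            // C&C
```

None of these inputs is an attack, yet every one is altered, and the code that later receives the value has no way of knowing what was thrown away.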

You really need to read up on web security and fully understand the common attacks: at a minimum cross-site scripting (XSS), cross-site request forgery (CSRF) and SQL injection.

You need to take a two-pronged approach to using user-provided data safely. Think of the process like this:

  1. Validate and reject data on the way in.
  2. Encode data on the way out.
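A minimal sketch of the two steps for a single field, assuming a hypothetical free-text "comment" that must be valid UTF-8 and at most 500 bytes (both rules are made up for the example). Note that the accepted value is kept unchanged and is only transformed at the moment it is written into HTML:

```php
<?php
// Hypothetical rules for a free-text "comment" field.
const MAX_COMMENT_BYTES = 500;

// Step 1: validate on the way in. Reject bad input; never "repair" it.
function validate_comment(string $raw): ?string
{
    $isUtf8 = preg_match('//u', $raw) === 1; // false for invalid UTF-8
    if (!$isUtf8 || $raw === '' || strlen($raw) > MAX_COMMENT_BYTES) {
        return null;
    }
    return $raw; // accepted exactly as received
}

// Step 2: encode on the way out, here for an HTML text node.
function render_comment(string $comment): string
{
    return '<p>' . htmlspecialchars($comment, ENT_QUOTES | ENT_HTML5, 'UTF-8') . '</p>';
}

$comment = validate_comment('5 < 6 & "ok"');
echo $comment === null ? "rejected\n" : render_comment($comment) . "\n";
// <p>5 &lt; 6 &amp; &quot;ok&quot;</p>
```

The `<`, `&` and quotes are legitimate parts of the comment, so validation accepts them; they only become harmless entities in the HTML output.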

Input validation

Check each piece of input to ensure it contains only the right kind of data and fits within length and range boundaries, ACCORDING TO THE MEANING OF EACH INDIVIDUAL PIECE OF DATA. For example: ensure numbers contain only numeric digits, years are within a sensible range, strings are neither overlong nor blank, filenames don't traverse directories, IDs contain only legal characters, and so on.

The most important thing here is that, wherever possible, you state what is allowed; DO NOT STATE WHAT IS DISALLOWED. Testing for what is allowed and rejecting everything else is known as whitelisting, and it is a good thing, because you know you will get clean data (or as near to it as is sensible). Looking for bad patterns and rejecting them is known as blacklisting, and it is a less safe idea: for a blacklist to succeed it must be complete, and that is often practically impossible. In some limited contexts a blacklist can be useful, but only when you are as certain as you can be that the list is exhaustive.
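A few whitelist validators in that spirit. The function names and the exact rules (ID length, year range, allowed file extensions) are hypothetical; the point is that each one describes what is allowed for one specific kind of data and rejects everything else, rather than trying to "repair" the input:

```php
<?php
// Hypothetical whitelist validators: each states what IS allowed.

function valid_id(string $raw): bool
{
    // 1-32 letters, digits, underscores or hyphens; nothing else
    return preg_match('/\A[A-Za-z0-9_-]{1,32}\z/', $raw) === 1;
}

function valid_year(string $raw): bool
{
    // An integer within a sensible range
    $options = ['options' => ['min_range' => 1900, 'max_range' => 2100]];
    return filter_var($raw, FILTER_VALIDATE_INT, $options) !== false;
}

function valid_filename(string $raw): bool
{
    // A single path component with a known extension, so "../", "..\"
    // and absolute paths can never match
    return preg_match('/\A[A-Za-z0-9_-]{1,64}\.(jpg|png|pdf)\z/', $raw) === 1;
}

var_dump(valid_id('user_42'), valid_year('1066'), valid_filename('../etc/passwd'));
// bool(true) bool(false) bool(false)
```

Each validator simply answers yes or no, so the caller rejects the request instead of carrying on with altered data.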

Storing the data

Once you have accepted only clean data, save it in your variable or session. You might adopt a variable-naming convention that indicates the data is now clean. The most important thing here is that when we save this data, we have not yet changed it. This means the data can be used in any context without us having "thrown anything away".

Output encoding

When you send the data to an external system, such as a filename, a cookie, a web page, a file or your own database, you must encode it so that it cannot break the language syntax or file format used by that output. This is the point at which the data is transformed.

The transformation you need to perform depends on how and where you use the data. Transforming a string for use in a Windows filename is different from transforming it for a Linux filename, and different again if it is being inserted into a PDF document, a web page, a database, and so on. Let's look at two examples in more detail:
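As an illustration, here is one unchanged string encoded for four different destinations using standard PHP functions. Matching each function to its context is the point of the sketch; real applications have more contexts than these, and `escapeshellarg()` follows different rules on Windows:

```php
<?php
// One validated, unchanged value, encoded separately for each destination.
$value = 'Tom & Jerry <3';

$html = htmlspecialchars($value, ENT_QUOTES | ENT_HTML5, 'UTF-8'); // HTML text / quoted attribute
$url  = rawurlencode($value);                                       // one URL path or query component
$json = json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP);           // string literal for JSON/JavaScript
$arg  = escapeshellarg($value);                                     // one argument for a POSIX shell

echo $html, "\n"; // Tom &amp; Jerry &lt;3
echo $url, "\n";  // Tom%20%26%20Jerry%20%3C3
echo $json, "\n"; // "Tom \u0026 Jerry \u003C3"
echo $arg, "\n";
```

No single transformation is right for all four outputs, which is why the original value is stored untouched and encoded only when it is used.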

Output to HTML

In the most common case, inserting a string into some HTML, you need to ensure that the user cannot inject arbitrary content into the page. Not only would this let them edit a page that other users see, they could also inject code in the form of JavaScript that does anything they want. That script runs as the user who views the page and could allow the attacker to steal that user's information and login credentials. This is called Cross-Site Scripting (XSS). The syntax rules of HTML and JavaScript mean that you need to encode differently depending on where in the HTML you insert the user data. There is a very valuable page at http://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet that explains how to transform the data for the 6 different categories of places where you can insert a string into an HTML page.
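A minimal sketch of two of those HTML contexts, using hypothetical helpers `h()` and `safe_href()`. `htmlspecialchars()` with `ENT_QUOTES` covers element content and quoted attribute values; a URL placed in `href` additionally needs its scheme whitelisted, because a `javascript:` URL contains nothing that attribute escaping would change. Other contexts (inside `<script>`, CSS, unquoted attributes) need the separate rules from the cheat sheet:

```php
<?php
// Hypothetical helpers for two HTML contexts.

// Element content and *quoted* attribute values.
function h(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// URL inside href="...": whitelist the scheme, then attribute-encode.
// Anything without an http/https scheme (including relative URLs) becomes "#".
function safe_href(string $url): string
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return '#';
    }
    return h($url);
}

echo '<p>' . h('<script>alert(1)</script>') . '</p>', "\n";
// <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
echo '<a href="' . safe_href('javascript:alert(1)') . '">link</a>', "\n";
// <a href="#">link</a>
```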

Output to database

If you save user data to the database, you effectively have to include the string in an SQL statement. You must ensure that the user can change only the data values, never the meaning of the SQL statement. If they can change the meaning of the statement, this is called SQL injection.

This is a special case: although you can solve the problem using output encoding, you are better off using a technique called "bound parameters". This ensures that your data is always treated as data and never as code when talking to the database. PHP supports bound parameters in a number of database libraries, including "PDO" (cross-database) and "MySQLi" (MySQL). Note that the old "mysql" library does not support bound parameters.
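A runnable sketch of bound parameters with PDO. It uses an in-memory SQLite database (which requires the `pdo_sqlite` extension) purely so the example is self-contained; the table and column names are made up, and the same `prepare()`/`execute()` pattern applies with PDO's MySQL driver:

```php
<?php
// In-memory SQLite is an assumption, used only to keep the sketch self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');

// Hostile-looking input: it is sent separately from the SQL text,
// so it can only ever be a value, never part of the statement.
$name = "Robert'); DROP TABLE users;--";

$insert = $db->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute([':name' => $name]);

$select = $db->prepare('SELECT name FROM users WHERE name = :name');
$select->execute([':name' => $name]);
$stored = $select->fetchColumn();

echo $stored, "\n"; // the exact string that was inserted
```

The value round-trips unchanged and the table is still there, without any manual escaping or character stripping.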

There is much more information all over the net and on StackOverflow about Cross-Site Scripting (XSS) and SQL injection (SQLi), and it is well worth reading around the subject. There are of course many other types of attack, but if you follow the process above you should minimise your risks. It is not unreasonable for data-validation and encoding routines to make up a significant part of a secure web application. But you have to build the security methodology into your standard working process; adding it as an afterthought is much more difficult. Periodically look back at your validation code for a particular function and consider whether you could add more rules. There's always going to be something you miss the first time round.


Filters relying on blacklists fail. Check PHPIDS if you are serious about detecting attack patterns.
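As a small illustration of why, here is a hypothetical `strip_traversal()` that reproduces only the two directory-traversal `str_replace()` lines from `filter_url()` in the question. Because `str_replace()` makes a single pass over the string, an attacker can nest the pattern so that removing it creates it again:

```php
<?php
// Hypothetical function reproducing only filter_url()'s traversal lines.
function strip_traversal(string $path): string
{
    $path = str_replace('../', '', $path);
    $path = str_replace('..\\', '', $path);
    return $path;
}

// Removing the inner "../" joins the surrounding characters into a new one.
echo strip_traversal('....//etc/passwd'), "\n"; // ../etc/passwd
```

The question's final character whitelist keeps `.` and `/`, so this result would also survive the rest of `filter_url()`.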
