Expert opinion on string validation?
I could have asked these 3 separately, but decided to merge them.
I would like to ask for some expert opinion with examples on:
- how to properly validate an alphanumeric string? (only Latin letters and digits)
- how to properly validate a written Unicode string? (like the above, but letters from any language are allowed)
- how to properly validate that a string looks like an email? I'm guessing the best way is filter_var($string, FILTER_VALIDATE_EMAIL) (I guess it's the same for URLs and IPs).
Thank you.
For #1, use ctype_alnum(). It's faster than a regex, you don't have to worry about whether you got the regex right, and I also think it's much neater.
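A minimal sketch of #1 with ctype_alnum(); the wrapper name isAsciiAlnum() is hypothetical. ctype_alnum() checks each byte against the current locale, so it means "only A-Z, a-z, 0-9" under the "C" locale (PHP's startup default since 8.0). It also returns false for an empty string.

```php
<?php
// Sketch for #1: Latin letters and digits only, via ctype_alnum().
// Assumes the "C" locale; under another LC_CTYPE locale more bytes may count as letters.
// isAsciiAlnum() is a hypothetical helper name.
function isAsciiAlnum(string $s): bool
{
    // Returns false for '' as well as for any non-alphanumeric byte.
    return ctype_alnum($s);
}

var_dump(isAsciiAlnum('abc123'));  // bool(true)
var_dump(isAsciiAlnum('abc 123')); // bool(false): contains a space
var_dump(isAsciiAlnum('café'));    // bool(false): 'é' is not an ASCII letter
var_dump(isAsciiAlnum(''));        // bool(false)
```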
- preg_match('/^[a-zA-Z0-9]+$/D', $str)
- Something with this, I'd think
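A sketch of the regex route for #1 and #2 (helper names are hypothetical). The pattern has to be anchored: an unanchored '/[a-zA-Z0-9]+/' matches any string that merely contains one letter or digit, such as "abc 123". For #2, the usual approach is the /u modifier with Unicode properties: \p{L} is any letter, \p{M} a combining mark (needed for scripts such as Devanagari), and \p{N} any number.

```php
<?php
// #1: Latin letters and digits only. ^...$ plus the D modifier means the
// whole string must match, with no trailing newline allowed.
function isLatinAlnum(string $s): bool
{
    return preg_match('/^[a-zA-Z0-9]+$/D', $s) === 1;
}

// #2: letters and numbers from any script. Requires UTF-8 input and /u.
// On malformed UTF-8, preg_match() returns false, which === 1 rejects.
function isUnicodeAlnum(string $s): bool
{
    return preg_match('/^[\p{L}\p{M}\p{N}]+$/Du', $s) === 1;
}

var_dump(isLatinAlnum('abc123'));        // bool(true)
var_dump(isLatinAlnum('abc 123'));       // bool(false)
var_dump(isUnicodeAlnum('Ελληνικά123')); // bool(true)
var_dump(isUnicodeAlnum('日本語'));       // bool(true)
var_dump(isUnicodeAlnum('a-b'));         // bool(false)
```

If only decimal digits should count, use \p{Nd} instead of \p{N}, which also accepts characters like "½".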
filter_var is very neat and efficient for special purposes, but also limited.
With the sanitizing filters you only get back a filtered string, which you have to compare against the original string to see whether it passed.
There may also be requirements and/or structures besides the allowed characters that you cannot check this way.
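A short sketch contrasting the two filter styles (the sample strings are made up). Validate filters return the value or false. Sanitize filters always return a string, hence the comparison with the original. Note that FILTER_VALIDATE_EMAIL only checks syntax; it does not look up the domain.

```php
<?php
// Validate filters: the value on success, false on failure.
$email = 'user@example.com';
$ok = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;

// The same pattern works for IP addresses and URLs.
$ipOk  = filter_var('192.168.0.1', FILTER_VALIDATE_IP) !== false;
$urlOk = filter_var('https://example.com/page', FILTER_VALIDATE_URL) !== false;

// Sanitize filters never fail: they strip disallowed characters and return
// what is left, so you must compare the result with the original input.
$input    = 'user name@example.com';
$clean    = filter_var($input, FILTER_SANITIZE_EMAIL); // 'username@example.com'
$wasClean = ($clean === $input);                       // false: the space was removed

var_dump($ok, $ipOk, $urlOk, $clean, $wasClean);
```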
The most common way is to use the PCRE functions, especially preg_match(). They are very efficient as well, and you can work directly with the return value.
And you have the full power of regular expressions. Imagine, for example, that you want to validate that every name that occurs is in the exact form "Mr/Mrs Firstname Lastname, academic-title".
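To illustrate checking such a structure with a single preg_match() call, here is a sketch under assumed rules (all hypothetical, as is the name isFormalName()): each name part starts with an uppercase letter followed by lowercase letters, and the title consists of letters, dots, spaces and hyphens.

```php
<?php
// Hypothetical format: "Mr|Mrs Firstname Lastname, Title".
// \p{Lu}/\p{Ll} are Unicode upper/lowercase letters, so "Müller" is accepted.
function isFormalName(string $s): bool
{
    $pattern = '/^(?:Mr|Mrs) \p{Lu}\p{Ll}+ \p{Lu}\p{Ll}+, [\p{L}. -]+$/Du';
    return preg_match($pattern, $s) === 1;
}

var_dump(isFormalName('Mrs Anna Müller, Dr.')); // bool(true)
var_dump(isFormalName('Mr John Smith'));        // bool(false): title missing
var_dump(isFormalName('mr john smith, PhD'));   // bool(false): lowercase names
```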
It gets tricky if you only want to allow certain ranges of Unicode characters,
for example only U+0600–U+06FF (1536–1791, Arabic), plus a certain range of dingbats and brackets or something.
There are no predefined character classes for that, and defining them would not be very elegant.
In this case the best way really would be to loop over the text character by character and check the ranges.
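A sketch of that character-by-character loop, assuming UTF-8 input and the mbstring extension (mb_str_split() needs PHP 7.4+). The function name and the extra space range are illustrative assumptions; only the Arabic block comes from the text above.

```php
<?php
// Returns true if every code point of $text lies in one of the inclusive
// [first, last] ranges. Malformed UTF-8 is rejected up front.
function onlyAllowedRanges(string $text, array $ranges): bool
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        return false;
    }
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        $allowed = false;
        foreach ($ranges as [$first, $last]) {
            if ($cp >= $first && $cp <= $last) {
                $allowed = true;
                break;
            }
        }
        if (!$allowed) {
            return false; // first character outside every range
        }
    }
    return true;
}

$ranges = [
    [0x0600, 0x06FF], // Arabic block, U+0600–U+06FF
    [0x0020, 0x0020], // space (example extra range)
];
var_dump(onlyAllowedRanges('مرحبا بالعالم', $ranges)); // bool(true)
var_dump(onlyAllowedRanges('مرحبا hello', $ranges));   // bool(false)
```

For a handful of ranges, a PCRE character class with \x{...} escapes under /u, e.g. '/^[\x{0600}-\x{06FF} ]+$/Du', gives the same result. The loop becomes easier to maintain as the range list grows.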
The best email validation I have seen so far is the following (note: it also checks that the domain exists in DNS):
/**
 * Validates an email address to RFC 3696 specification.
 * @source http://www.linuxjournal.com/article/9585
 * @param string $email_address Email address (raw input)
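 * @return string Returns the email address unchanged if it has the email
 *      address format and the domain exists (empty input is returned as-is).
 * @throws VerificationException If a check fails (a custom class, e.g.
 *      class VerificationException extends Exception {}).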
 */
public static function email($email_address) {
    if (empty($email_address)) return $email_address;
    $is_valid = true;
    $atIndex = strrpos($email_address, "@");
    if ($atIndex === false) {
        // No @ at all (strrpos() returns false)
        throw new VerificationException('The email address ('.$email_address.') does not contain an @ symbol');
    }
    else {
        $domain = substr($email_address, $atIndex+1);
        $local = substr($email_address, 0, $atIndex);
        $local_length = strlen($local);
        $domain_length = strlen($domain);
        if ($local_length < 1 || $local_length > 64) {
            // Local part empty or longer than 64 characters
            throw new VerificationException('The email address ('.$email_address.') local part is empty or exceeds the maximum length of 64 characters');
        } else if ($domain_length < 1) {
            // Domain missing
            throw new VerificationException('The email address ('.$email_address.') is missing the domain part');
        } else if ($domain_length > 255) {
            // Domain part length exceeded
            throw new VerificationException('The email address ('.$email_address.') domain exceeds maximum length');
        } else if ($local[0] == '.' || $local[$local_length-1] == '.') {
            // Local part starts or ends with '.'
            throw new VerificationException('The email address ('.$email_address.') local part cannot start or end with a dot (.)');
        } else if (preg_match('/\\.\\./', $local)) {
            // Local part has two consecutive dots
            throw new VerificationException('The email address ('.$email_address.') local part can not contain two consecutive dots (..)');
        } else if (!preg_match('/^[A-Za-z0-9\\-\\.]+$/', $domain)) {
            // Character not valid in domain part
            throw new VerificationException('The email address ('.$email_address.') domain contains invalid characters');
        } else if (preg_match('/\\.\\./', $domain)) {
            // Domain part has two consecutive dots
            throw new VerificationException('The email address ('.$email_address.') domain can not contain two consecutive dots (..)');
        } else if (!preg_match('/^(\\\\.|[A-Za-z0-9!#%&`_=\\/$\'*+?^{}|~.-])+$/', str_replace("\\\\","",$local))) {
            // Character not valid in local part unless
            // Local part is quoted
            if (!preg_match('/^"(\\\\"|[^"])+"$/',
            str_replace("\\\\","",$local))) {
                throw new VerificationException('The email address ('.$email_address.') contains invalid (unescaped) characters');
            }
        }
        if ($is_valid && !(checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A'))) {
            // Domain not found in DNS
            throw new VerificationException('The email address ('.$email_address.') domain could not be found with a DNS lookup');
        }
    }
    return $email_address;
}
Here's one that should work for the email validation.
Following are the requirements for an e-mail address, with relevant references:
- An e-mail address consists of local part and domain separated by an at sign (@) character (RFC 2822 3.4.1).
- The local part may consist of alphabetic and numeric characters, and the following characters: !, #, $, %, &, ', *, +, -, /, =, ?, ^, _, `, {, |, } and ~, possibly with dot separators (.), inside, but not at the start, end or next to another dot separator (RFC 2822 3.2.4).
- The local part may consist of a quoted string—that is, anything within quotes ("), including spaces (RFC 2822 3.2.5).
- Quoted pairs (such as \@) are valid components of a local part, though an obsolete form from RFC 822 (RFC 2822 4.4).
- The maximum length of a local part is 64 characters (RFC 2821 4.5.3.1).
- A domain consists of labels separated by dot separators (RFC 1035 2.3.1).
- Domain labels start with an alphabetic character followed by zero or more alphabetic characters, numeric characters or the hyphen (-), ending with an alphabetic or numeric character (RFC 1035 2.3.1).
- The maximum length of a label is 63 characters (RFC 1035 2.3.1).
- The maximum length of a domain is 255 characters (RFC 2821 4.5.3.1).
- The domain must be fully qualified and resolvable to a type A or type MX DNS address record (RFC 2821 3.6).
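The label rules in this list can be checked without a DNS lookup. The validEmail() function below only checks the allowed domain characters and double dots, so a leading hyphen or an over-long label slips through. Here is a minimal sketch of a stricter syntax check; the name validDomainSyntax() is hypothetical. It follows the list literally, so labels must start with a letter, although RFC 1123 later relaxed this to allow a leading digit.

```php
<?php
// Syntax-only check of a domain against the rules listed above.
function validDomainSyntax(string $domain): bool
{
    // Whole domain: 1..255 characters (RFC 2821 4.5.3.1).
    if ($domain === '' || strlen($domain) > 255) {
        return false;
    }
    // An empty label (leading, trailing or doubled dot) fails the pattern too.
    foreach (explode('.', $domain) as $label) {
        // A letter, then letters/digits/hyphens, ending in a letter or digit;
        // total label length 1..63 (RFC 1035 2.3.1).
        if (!preg_match('/^[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/D', $label)) {
            return false;
        }
    }
    return true;
}

var_dump(validDomainSyntax('example.com'));  // bool(true)
var_dump(validDomainSyntax('-bad.example')); // bool(false)
var_dump(validDomainSyntax('example..com')); // bool(false)
```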
 
/**
 * Validate an email address.
 * @param string $email Email address (raw input)
 * @return bool True if the email address has the email
 *      address format and the domain exists.
 */
function validEmail($email)
{
   $isValid = true;
   $atIndex = strrpos($email, "@");
   if (is_bool($atIndex) && !$atIndex)
   {
      $isValid = false;
   }
   else
   {
      $domain = substr($email, $atIndex+1);
      $local = substr($email, 0, $atIndex);
      $localLen = strlen($local);
      $domainLen = strlen($domain);
      if ($localLen < 1 || $localLen > 64)
      {
         // local part length exceeded
         $isValid = false;
      }
      else if ($domainLen < 1 || $domainLen > 255)
      {
         // domain part length exceeded
         $isValid = false;
      }
      else if ($local[0] == '.' || $local[$localLen-1] == '.')
      {
         // local part starts or ends with '.'
         $isValid = false;
      }
      else if (preg_match('/\\.\\./', $local))
      {
         // local part has two consecutive dots
         $isValid = false;
      }
      else if (!preg_match('/^[A-Za-z0-9\\-\\.]+$/', $domain))
      {
         // character not valid in domain part
         $isValid = false;
      }
      else if (preg_match('/\\.\\./', $domain))
      {
         // domain part has two consecutive dots
         $isValid = false;
      }
      else if (!preg_match('/^(\\\\.|[A-Za-z0-9!#%&`_=\\/$\'*+?^{}|~.-])+$/',
                           str_replace("\\\\","",$local)))
      {
         // character not valid in local part unless 
         // local part is quoted
         if (!preg_match('/^"(\\\\"|[^"])+"$/',
             str_replace("\\\\","",$local)))
         {
            $isValid = false;
         }
      }
      if ($isValid && !(checkdnsrr($domain,"MX") ||
          checkdnsrr($domain,"A")))
      {
         // domain not found in DNS
         $isValid = false;
      }
   }
   return $isValid;
}
Source: Douglas Lovell
You probably want to use regular expressions.
 