开发者

How can I validate a culture code with a regular expression?

I really don't understand regex and I also can't find开发者_运维技巧 any regex rule to validate culture codes as: en-GB, en-UK, az-AZ-Cyrl, others.

How can I validate these codes with a regular expression?


You can validate with this :

/^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/

Here is how it works

^       <- Starts with
[a-z]   <- From a to z (lower-case)
{2,3}   <- Repeated at least 2 times, at most 3
(?:     <- Non capturing group
   -        <- The "-" character
   [A-Z]     <- From a to z (upper-case)
   {2,3}     <- Repeated at least 2 times, at most 3
   (?:       <- Non capturing group
       -         <- The "-" character
       [a-zA-Z]  <- from a to Z (case insensitive)
       {4}      <- Repeated 4 times
   )         <- End of the group
   ?         <- Facultative
 )       <- End of the group
 ?       <- Facultative
 $       <- Ends here

You can also replace the last non capturing group by (?:-(?:Cyrl|Latn))? if the only options are Cyrl and Latn


This is what I found in the Dublin Core / W3C xsd's : http://www.w3.org/2001/XMLSchema

  <xs:simpleType name="language" id="language"> 
    <xs:annotation> 
      <xs:documentation 
        source="http://www.w3.org/TR/xmlschema-2/#language"/> 
    </xs:annotation> 
    <xs:restriction base="xs:token"> 
      <xs:pattern 
        value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"
                id="language.pattern"> 
        <xs:annotation> 
          <xs:documentation 
                source="http://www.ietf.org/rfc/rfc3066.txt"> 
            pattern specifies the content of section 2.12 of XML 1.0e2
            and RFC 3066 (Revised version of RFC 1766).
          </xs:documentation> 
        </xs:annotation> 
      </xs:pattern> 
    </xs:restriction> 
  </xs:simpleType>

Then the pattern is :

[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*


According https://en.wikipedia.org/wiki/IETF_language_tag the regexp can be:

/^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/

From wiki:

a single primary language subtag based on a two-letter language code from ISO 639-1 (2002) or a three-letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008), or registered through the BCP 47 process and composed of five to eight letters;

an optional script subtag, based on a four-letter script code from ISO 15924 (usually written in title case);

an optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;


According to

RFC 3066

2. The Language tag

2.1 Language tag syntax

The language tag is composed of one or more parts: A primary language subtag and a (possibly empty) series of subsequent subtags.

The syntax of this tag in ABNF [RFC 2234] is:

Language-Tag = Primary-subtag *( "-" Subtag )

Primary-subtag = 1*8ALPHA

Subtag = 1*8(ALPHA / DIGIT)

The productions ALPHA and DIGIT are imported from RFC 2234; they denote respectively the characters A to Z in upper or lower case and the digits from 0 to 9. The character "-" is HYPHEN-MINUS (ABNF: %x2D).

the Language-Tag is

/^[a-z]{1,8}(?:\-[a-z0-9]{1,8})*$/i
(A Javascript regular expression. The “i” denotes case-insensitiveness.)


^(?i:AF|AX|AL|DZ|AS|AD|AO|AI|AQ|AG|AR|AM|AW|AU|AT|AZ|BS|BH|BD|BB|BY|BE|BZ|BJ|BM|BT|BO|BQ|BA|BW|BV|BR|IO|BN|BG|BF|BI|KH|CM|CA|CV|KY|CF|TD|CL|CN|CX|CC|CO|KM|CG|CD|CK|CR|CI|HR|CU|CW|CY|CZ|DK|DJ|DM|DO|EC|EG|SV|GQ|ER|EE|ET|FK|FO|FJ|FI|FR|GF|PF|TF|GA|GM|GE|DE|GH|GI|GR|GL|GD|GP|GU|GT|GG|GN|GW|GY|HT|HM|VA|HN|HK|HU|IS|IN|ID|IR|IQ|IE|IM|IL|IT|JM|JP|JE|JO|KZ|KE|KI|KP|KR|KW|KG|LA|LV|LB|LS|LR|LY|LI|LT|LU|MO|MK|MG|MW|MY|MV|ML|MT|MH|MQ|MR|MU|YT|MX|FM|MD|MC|MN|ME|MS|MA|MZ|MM|NA|NR|NP|NL|NC|NZ|NI|NE|NG|NU|NF|MP|NO|OM|PK|PW|PS|PA|PG|PY|PE|PH|PN|PL|PT|PR|QA|RE|RO|RU|RW|BL|SH|KN|LC|MF|PM|VC|WS|SM|ST|SA|SN|RS|SC|SL|SG|SX|SK|SI|SB|SO|ZA|GS|SS|ES|LK|SD|SR|SJ|SZ|SE|CH|SY|TW|TJ|TZ|TH|TL|TG|TK|TO|TT|TN|TR|TM|TC|TV|UG|UA|AE|GB|US|UM|UY|UZ|VU|VE|VN|VG|VI|WF|EH|YE|ZM|ZW)$

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜