How can I validate a culture code with a regular expression?
I really don't understand regex, and I also can't find any regex rule to validate culture codes such as en-GB, en-UK, az-AZ-Cyrl and others.
How can I validate these codes with a regular expression?
You can validate them with this:
/^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/
Here is how it works:
^ <- Starts with
[a-z] <- From a to z (lower-case)
{2,3} <- Repeated at least 2 times, at most 3
(?: <- Non-capturing group
- <- The "-" character
[A-Z] <- From A to Z (upper-case)
{2,3} <- Repeated at least 2 times, at most 3
(?: <- Non-capturing group
- <- The "-" character
[a-zA-Z] <- From a to z or A to Z (either case)
{4} <- Repeated exactly 4 times
) <- End of the group
? <- Optional
) <- End of the group
? <- Optional
$ <- Ends here
If the only possible scripts are Cyrl and Latn, you can also replace the last non-capturing group with (?:-(?:Cyrl|Latn))?
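A minimal JavaScript sketch of how the pattern above, and the Cyrl/Latn variant, might be used. The helper names isCultureCode and isCultureCodeCyrlLatn are hypothetical, chosen for illustration. Note that this pattern expects the region before the script, as in the question's az-AZ-Cyrl example, so az-Cyrl-AZ is rejected.

```javascript
// Pattern from the answer: language, optional region, optional 4-letter script.
const CULTURE_RE = /^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/;

// Variant where the last group only accepts the Cyrl or Latn scripts.
const CULTURE_CYRL_LATN_RE = /^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-(?:Cyrl|Latn))?)?$/;

// Hypothetical helper names, for illustration only.
function isCultureCode(code) {
  return CULTURE_RE.test(code);
}

function isCultureCodeCyrlLatn(code) {
  return CULTURE_CYRL_LATN_RE.test(code);
}

// Print the result for a few sample codes.
for (const code of ["en-GB", "en-UK", "az-AZ-Cyrl", "en", "EN-gb", "az-Cyrl-AZ"]) {
  console.log(code, isCultureCode(code));
}
```

The pattern only checks the shape of the code, so en-UK passes even though the ISO 3166 country code for the United Kingdom is GB.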
This is what I found in the Dublin Core / W3C XSDs (http://www.w3.org/2001/XMLSchema):
<xs:simpleType name="language" id="language">
  <xs:annotation>
    <xs:documentation
      source="http://www.w3.org/TR/xmlschema-2/#language"/>
  </xs:annotation>
  <xs:restriction base="xs:token">
    <xs:pattern
      value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"
      id="language.pattern">
      <xs:annotation>
        <xs:documentation
          source="http://www.ietf.org/rfc/rfc3066.txt">
          pattern specifies the content of section 2.12 of XML 1.0e2
          and RFC 3066 (Revised version of RFC 1766).
        </xs:documentation>
      </xs:annotation>
    </xs:pattern>
  </xs:restriction>
</xs:simpleType>
So the pattern is:
[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
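XSD patterns are implicitly anchored to the whole value, but JavaScript regexes are not, so when reusing this pattern you need to add ^ and $ yourself. A minimal sketch (isXsdLanguage is a hypothetical name):

```javascript
// XSD patterns always match the entire value; JavaScript regexes do not,
// so the anchors ^ and $ are added explicitly here.
const XSD_LANGUAGE_RE = /^[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*$/;

function isXsdLanguage(tag) {
  return XSD_LANGUAGE_RE.test(tag);
}

// Without anchors the same pattern finds a match inside almost any string.
const UNANCHORED_RE = /[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*/;

console.log(isXsdLanguage("en-GB"));      // true
console.log(isXsdLanguage("az-AZ-Cyrl")); // true
console.log(isXsdLanguage("en_GB"));      // false
console.log(UNANCHORED_RE.test("en_GB")); // true ("en" matches on its own)
```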
According to https://en.wikipedia.org/wiki/IETF_language_tag, the regexp can be:
/^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/
From Wikipedia:
a single primary language subtag based on a two-letter language code from ISO 639-1 (2002) or a three-letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008), or registered through the BCP 47 process and composed of five to eight letters;
an optional script subtag, based on a four-letter script code from ISO 15924 (usually written in title case);
an optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;
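A sketch applying this pattern in JavaScript. Here the script subtag comes before the region (az-Cyrl-AZ rather than az-AZ-Cyrl), and the pattern as written does not accept the three-digit UN M.49 regions described in the quote, such as es-419. The second regex is a hypothetical extension, not part of the original answer, that also allows a three-digit region.

```javascript
// Pattern from the answer: language, optional script, optional region.
const BCP47_SIMPLE_RE = /^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/;

// Hypothetical extension: the region may also be three digits (UN M.49, e.g. 419).
const BCP47_M49_RE = /^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2,3}|[0-9]{3}))?$/;

// Print: tag, result of the original pattern, result of the extended pattern.
for (const tag of ["en-GB", "zh-Hant-TW", "az-Cyrl-AZ", "az-AZ-Cyrl", "es-419"]) {
  console.log(tag, BCP47_SIMPLE_RE.test(tag), BCP47_M49_RE.test(tag));
}
```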
According to RFC 3066:
2. The Language tag
2.1 Language tag syntax
The language tag is composed of one or more parts: A primary language subtag and a (possibly empty) series of subsequent subtags.
The syntax of this tag in ABNF [RFC 2234] is:
Language-Tag = Primary-subtag *( "-" Subtag )
Primary-subtag = 1*8ALPHA
Subtag = 1*8(ALPHA / DIGIT)
The productions ALPHA and DIGIT are imported from RFC 2234; they denote respectively the characters "A" to "Z" in upper or lower case and the digits from "0" to "9". The character "-" is HYPHEN-MINUS (ABNF: %x2D).
So the Language-Tag production corresponds to:
/^[a-z]{1,8}(?:\-[a-z0-9]{1,8})*$/i
(This is a JavaScript regular expression; the “i” flag makes it case-insensitive.)
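A minimal sketch using the RFC 3066 regex above (isRfc3066Tag is a hypothetical name). It only checks subtag lengths and characters, so it is deliberately permissive: it accepts en-UK and az-AZ-Cyrl alike, but rejects a primary subtag containing digits or any subtag longer than eight characters.

```javascript
// RFC 3066 Language-Tag: Primary-subtag *( "-" Subtag ),
// where Primary-subtag = 1*8ALPHA and Subtag = 1*8(ALPHA / DIGIT).
// The "i" flag makes the match case-insensitive, since ALPHA allows either case.
const RFC3066_RE = /^[a-z]{1,8}(?:\-[a-z0-9]{1,8})*$/i;

function isRfc3066Tag(tag) {
  return RFC3066_RE.test(tag);
}

console.log(isRfc3066Tag("en-GB"));        // true
console.log(isRfc3066Tag("az-AZ-Cyrl"));   // true
console.log(isRfc3066Tag("es-419"));       // true (digits allowed after the first subtag)
console.log(isRfc3066Tag("1en"));          // false (primary subtag must be letters)
console.log(isRfc3066Tag("en-abcdefghi")); // false (subtag longer than 8 characters)
```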
To check only the two-letter region part against the ISO 3166-1 alpha-2 country codes, you can use an explicit list (the inline (?i:...) group makes it case-insensitive in engines such as .NET, Java and PCRE):
^(?i:AF|AX|AL|DZ|AS|AD|AO|AI|AQ|AG|AR|AM|AW|AU|AT|AZ|BS|BH|BD|BB|BY|BE|BZ|BJ|BM|BT|BO|BQ|BA|BW|BV|BR|IO|BN|BG|BF|BI|KH|CM|CA|CV|KY|CF|TD|CL|CN|CX|CC|CO|KM|CG|CD|CK|CR|CI|HR|CU|CW|CY|CZ|DK|DJ|DM|DO|EC|EG|SV|GQ|ER|EE|ET|FK|FO|FJ|FI|FR|GF|PF|TF|GA|GM|GE|DE|GH|GI|GR|GL|GD|GP|GU|GT|GG|GN|GW|GY|HT|HM|VA|HN|HK|HU|IS|IN|ID|IR|IQ|IE|IM|IL|IT|JM|JP|JE|JO|KZ|KE|KI|KP|KR|KW|KG|LA|LV|LB|LS|LR|LY|LI|LT|LU|MO|MK|MG|MW|MY|MV|ML|MT|MH|MQ|MR|MU|YT|MX|FM|MD|MC|MN|ME|MS|MA|MZ|MM|NA|NR|NP|NL|NC|NZ|NI|NE|NG|NU|NF|MP|NO|OM|PK|PW|PS|PA|PG|PY|PE|PH|PN|PL|PT|PR|QA|RE|RO|RU|RW|BL|SH|KN|LC|MF|PM|VC|WS|SM|ST|SA|SN|RS|SC|SL|SG|SX|SK|SI|SB|SO|ZA|GS|SS|ES|LK|SD|SR|SJ|SZ|SE|CH|SY|TW|TJ|TZ|TH|TL|TG|TK|TO|TT|TN|TR|TM|TC|TV|UG|UA|AE|GB|US|UM|UY|UZ|VU|VE|VN|VG|VI|WF|EH|YE|ZM|ZW)$
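JavaScript only gained the inline (?i:...) syntax recently, and older engines reject it, so this sketch builds the same country-code alternation and passes the "i" flag to the RegExp constructor instead. It assumes you only need to check the two-letter region subtag; isIsoRegion is a hypothetical name.

```javascript
// The same ISO 3166-1 alpha-2 list as the pattern above, joined with "|".
const ISO_3166_ALPHA2 =
  "AF|AX|AL|DZ|AS|AD|AO|AI|AQ|AG|AR|AM|AW|AU|AT|AZ|BS|BH|BD|BB|BY|BE|BZ|BJ|BM|BT|BO|BQ|BA|BW|BV|BR|IO|BN|BG|BF|BI|KH|CM|CA|CV|KY|CF|TD|CL|CN|CX|CC|CO|KM|CG|CD|CK|CR|CI|HR|CU|CW|CY|CZ|DK|DJ|DM|DO|EC|EG|SV|GQ|ER|EE|ET|FK|FO|FJ|FI|FR|GF|PF|TF|GA|GM|GE|DE|GH|GI|GR|GL|GD|GP|GU|GT|GG|GN|GW|GY|HT|HM|VA|HN|HK|HU|IS|IN|ID|IR|IQ|IE|IM|IL|IT|JM|JP|JE|JO|KZ|KE|KI|KP|KR|KW|KG|LA|LV|LB|LS|LR|LY|LI|LT|LU|MO|MK|MG|MW|MY|MV|ML|MT|MH|MQ|MR|MU|YT|MX|FM|MD|MC|MN|ME|MS|MA|MZ|MM|NA|NR|NP|NL|NC|NZ|NI|NE|NG|NU|NF|MP|NO|OM|PK|PW|PS|PA|PG|PY|PE|PH|PN|PL|PT|PR|QA|RE|RO|RU|RW|BL|SH|KN|LC|MF|PM|VC|WS|SM|ST|SA|SN|RS|SC|SL|SG|SX|SK|SI|SB|SO|ZA|GS|SS|ES|LK|SD|SR|SJ|SZ|SE|CH|SY|TW|TJ|TZ|TH|TL|TG|TK|TO|TT|TN|TR|TM|TC|TV|UG|UA|AE|GB|US|UM|UY|UZ|VU|VE|VN|VG|VI|WF|EH|YE|ZM|ZW";

// Use the "i" flag instead of the inline (?i:...) group.
const REGION_RE = new RegExp("^(?:" + ISO_3166_ALPHA2 + ")$", "i");

function isIsoRegion(code) {
  return REGION_RE.test(code);
}

console.log(isIsoRegion("GB")); // true
console.log(isIsoRegion("gb")); // true
console.log(isIsoRegion("UK")); // false: the United Kingdom is GB in ISO 3166-1

// Checking the region part of a culture code such as "en-GB":
const region = "en-GB".split("-")[1];
console.log(isIsoRegion(region)); // true
```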