How can I validate a culture code with a regular expression?

2023-01-20 06:12 Q&A author:

I really don't understand regex, and I also can't find any regex rule to validate culture codes such as en-GB, en-UK, az-AZ-Cyrl, and others.

How can I validate these codes with a regular expression?

You can validate with this:

/^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/

Here is how it works:

^         <- Start of the string
[a-z]     <- A lower-case letter from a to z
{2,3}     <- Repeated at least 2 times, at most 3
(?:       <- Non-capturing group
   -          <- The "-" character
   [A-Z]      <- An upper-case letter from A to Z
   {2,3}      <- Repeated at least 2 times, at most 3
   (?:        <- Non-capturing group
       -          <- The "-" character
       [a-zA-Z]   <- A letter from a to z, in either case
       {4}        <- Repeated exactly 4 times
   )          <- End of the group
   ?          <- The group is optional
)         <- End of the group
?         <- The group is optional
$         <- End of the string

You can also replace the last non-capturing group with (?:-(?:Cyrl|Latn))? if the only options are Cyrl and Latn.
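As a quick check, the pattern above can be run against the codes from the question. This is only a sketch: the helper name isCultureCode is made up for the illustration. The pattern checks the shape of the code, not whether the language or region actually exists, so en-UK passes as well.

```javascript
// Pattern from the answer above: language, optional region, optional script.
const culturePattern = /^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/;

// Hypothetical helper name, used only for this illustration.
function isCultureCode(code) {
  return culturePattern.test(code);
}

const samples = ["en-GB", "en-UK", "az-AZ-Cyrl", "en", "EN-gb", "en_GB", "az-Cyrl-AZ"];
for (const code of samples) {
  // Logs each code with true/false depending on whether it matches.
  console.log(code, isCultureCode(code));
}
```

Note that az-Cyrl-AZ (script before region) is rejected by this pattern, because it expects the region first.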

This is what I found in the Dublin Core / W3C XSDs: http://www.w3.org/2001/XMLSchema

  <xs:simpleType name="language" id="language"> 
    <xs:annotation> 
      <xs:documentation 
        source="http://www.w3.org/TR/xmlschema-2/#language"/> 
    </xs:annotation> 
    <xs:restriction base="xs:token"> 
      <xs:pattern 
        value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"
                id="language.pattern"> 
        <xs:annotation> 
          <xs:documentation 
                source="http://www.ietf.org/rfc/rfc3066.txt"> 
            pattern specifies the content of section 2.12 of XML 1.0e2
            and RFC 3066 (Revised version of RFC 1766).
          </xs:documentation> 
        </xs:annotation> 
      </xs:pattern> 
    </xs:restriction> 
  </xs:simpleType>

So the pattern is:

[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
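One thing to watch when reusing this outside XML Schema: XSD patterns always match the whole value, while a JavaScript regex matches anywhere in the string unless you anchor it. A minimal sketch, assuming you want the same whole-string behaviour in JavaScript (the variable names are only for illustration):

```javascript
// The XSD pattern copied as-is, without anchors.
const unanchored = /[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*/;

// The same pattern with ^ and $ added, to mimic XSD whole-value matching.
const xsdLanguage = /^[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*$/;

// Without anchors, any string containing a letter is accepted.
console.log(unanchored.test("en_GB!!"));

// With anchors, only well-formed tags are accepted.
console.log(xsdLanguage.test("en-GB"));
console.log(xsdLanguage.test("az-AZ-Cyrl"));
console.log(xsdLanguage.test("en_GB!!"));
console.log(xsdLanguage.test("en-"));               // trailing hyphen
console.log(xsdLanguage.test("toolongprimary-GB")); // primary subtag over 8 letters
```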

According to https://en.wikipedia.org/wiki/IETF_language_tag, the regexp can be:

/^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/

From Wikipedia:

a single primary language subtag based on a two-letter language code from ISO 639-1 (2002) or a three-letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008), or registered through the BCP 47 process and composed of five to eight letters;

an optional script subtag, based on a four-letter script code from ISO 15924 (usually written in title case);

an optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;
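The regexp above puts the script before the region, which is the reverse of az-AZ-Cyrl in the question. It also does not cover the three-digit UN M.49 regions mentioned in the quoted text. The second pattern below is my own variation that tries to follow the quote more closely (two letters or three digits for the region); treat it as a sketch, not a full BCP 47 validator.

```javascript
// Pattern from the answer: language, optional script, optional region.
const scriptFirst = /^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/;

// Assumed variation: region is two upper-case letters or three digits.
const withNumericRegion = /^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$/;

for (const tag of ["az-Cyrl-AZ", "zh-Hant-TW", "az-AZ-Cyrl", "es-419", "en-GBR"]) {
  // Logs the tag followed by the result of each pattern.
  console.log(tag, scriptFirst.test(tag), withNumericRegion.test(tag));
}
```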

According to RFC 3066:

2. The Language tag

2.1 Language tag syntax

The language tag is composed of one or more parts: A primary language subtag and a (possibly empty) series of subsequent subtags.

The syntax of this tag in ABNF [RFC 2234] is:
Language-Tag = Primary-subtag *( "-" Subtag )

Primary-subtag = 1*8ALPHA

Subtag = 1*8(ALPHA / DIGIT)
The productions ALPHA and DIGIT are imported from RFC 2234; they denote respectively the characters A to Z in upper or lower case and the digits from 0 to 9. The character "-" is HYPHEN-MINUS (ABNF: %x2D).

So the Language-Tag, as a regular expression, is:

/^[a-z]{1,8}(?:\-[a-z0-9]{1,8})*$/i
(A JavaScript regular expression. The "i" flag makes it case-insensitive.)
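To make the link to the ABNF easier to see, here is a sketch that builds the same expression from the two productions. The variable names are mine, not from the RFC.

```javascript
// Primary-subtag = 1*8ALPHA
const primarySubtag = "[a-z]{1,8}";
// Subtag = 1*8(ALPHA / DIGIT)
const subtag = "[a-z0-9]{1,8}";

// Language-Tag = Primary-subtag *( "-" Subtag )
// The "i" flag covers upper-case letters, as ALPHA allows both cases.
const languageTag = new RegExp(`^${primarySubtag}(?:-${subtag})*$`, "i");

for (const tag of ["en-GB", "az-AZ-Cyrl", "x-klingon", "en--GB", "123-en", "en-GB-"]) {
  // Logs each tag with true/false depending on whether it matches.
  console.log(tag, languageTag.test(tag));
}
```

This only checks the RFC 3066 syntax. It says nothing about whether the subtags are registered codes.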

^(?i:AF|AX|AL|DZ|AS|AD|AO|AI|AQ|AG|AR|AM|AW|AU|AT|AZ|BS|BH|BD|BB|BY|BE|BZ|BJ|BM|BT|BO|BQ|BA|BW|BV|BR|IO|BN|BG|BF|BI|KH|CM|CA|CV|KY|CF|TD|CL|CN|CX|CC|CO|KM|CG|CD|CK|CR|CI|HR|CU|CW|CY|CZ|DK|DJ|DM|DO|EC|EG|SV|GQ|ER|EE|ET|FK|FO|FJ|FI|FR|GF|PF|TF|GA|GM|GE|DE|GH|GI|GR|GL|GD|GP|GU|GT|GG|GN|GW|GY|HT|HM|VA|HN|HK|HU|IS|IN|ID|IR|IQ|IE|IM|IL|IT|JM|JP|JE|JO|KZ|KE|KI|KP|KR|KW|KG|LA|LV|LB|LS|LR|LY|LI|LT|LU|MO|MK|MG|MW|MY|MV|ML|MT|MH|MQ|MR|MU|YT|MX|FM|MD|MC|MN|ME|MS|MA|MZ|MM|NA|NR|NP|NL|NC|NZ|NI|NE|NG|NU|NF|MP|NO|OM|PK|PW|PS|PA|PG|PY|PE|PH|PN|PL|PT|PR|QA|RE|RO|RU|RW|BL|SH|KN|LC|MF|PM|VC|WS|SM|ST|SA|SN|RS|SC|SL|SG|SX|SK|SI|SB|SO|ZA|GS|SS|ES|LK|SD|SR|SJ|SZ|SE|CH|SY|TW|TJ|TZ|TH|TL|TG|TK|TO|TT|TN|TR|TM|TC|TV|UG|UA|AE|GB|US|UM|UY|UZ|VU|VE|VN|VG|VI|WF|EH|YE|ZM|ZW)$
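The expression above looks like a list of ISO 3166-1 alpha-2 country codes, so it can be used to check the region part against known values instead of just its shape. The inline (?i:...) group is not accepted by older JavaScript engines, so the sketch below uses the "i" flag instead. The short regionCodes array is only an illustrative subset of the list above, the helper names are made up, and hasKnownRegion assumes the region is the second part of the code, as in en-GB.

```javascript
// Illustrative subset only; the full list is in the expression above.
const regionCodes = ["GB", "US", "AZ", "DE", "FR"];

// Build ^(?:GB|US|...)$ and make it case-insensitive with the "i" flag.
const regionPattern = new RegExp(`^(?:${regionCodes.join("|")})$`, "i");

// Hypothetical helper: is this a region code from the list?
function isKnownRegion(code) {
  return regionPattern.test(code);
}

// Hypothetical helper: assumes language-region order, e.g. "en-GB".
function hasKnownRegion(cultureCode) {
  const parts = cultureCode.split("-");
  return parts.length >= 2 && isKnownRegion(parts[1]);
}

console.log(hasKnownRegion("en-GB"));
console.log(hasKnownRegion("az-AZ-Cyrl"));
console.log(hasKnownRegion("en-UK")); // UK is not in the list above either
```

Combined with one of the shape patterns, this rejects en-UK from the question, since the list contains GB but not UK.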
