开发者

What is the ultimate postal code and zip regex?

I'm looking for the ultimate postal code and 开发者_C百科zip code regex. I'm looking for something that will cover most (hopefully all) of the world.


The unicode CLDR contains the postal code regex for each country. (158 regex's in total!)

  • Download core.zip from http://unicode.org/Public/cldr/26.0.1/
  • unzip core.zip
  • Take a look at common/supplemental/postalCodeData.xml from the unzipped content (direct content: common/supplemental/postalCodeData.xml)

Google also has a web service with per-country address formatting information, including postal codes, here - http://i18napis.appspot.com/address (I found that link via http://unicode.org/review/pri180/ )

Edit

Here a copy of postalCodeData.xml regex :

"GB", "GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}"
"JE", "JE\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}"
"GG", "GY\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}"
"IM", "IM\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}"
"US", "\d{5}([ \-]\d{4})?"
"CA", "[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ ]?\d[ABCEGHJ-NPRSTV-Z]\d"
"DE", "\d{5}"
"JP", "\d{3}-\d{4}"
"FR", "\d{2}[ ]?\d{3}"
"AU", "\d{4}"
"IT", "\d{5}"
"CH", "\d{4}"
"AT", "\d{4}"
"ES", "\d{5}"
"NL", "\d{4}[ ]?[A-Z]{2}"
"BE", "\d{4}"
"DK", "\d{4}"
"SE", "\d{3}[ ]?\d{2}"
"NO", "\d{4}"
"BR", "\d{5}[\-]?\d{3}"
"PT", "\d{4}([\-]\d{3})?"
"FI", "\d{5}"
"AX", "22\d{3}"
"KR", "\d{3}[\-]\d{3}"
"CN", "\d{6}"
"TW", "\d{3}(\d{2})?"
"SG", "\d{6}"
"DZ", "\d{5}"
"AD", "AD\d{3}"
"AR", "([A-HJ-NP-Z])?\d{4}([A-Z]{3})?"
"AM", "(37)?\d{4}"
"AZ", "\d{4}"
"BH", "((1[0-2]|[2-9])\d{2})?"
"BD", "\d{4}"
"BB", "(BB\d{5})?"
"BY", "\d{6}"
"BM", "[A-Z]{2}[ ]?[A-Z0-9]{2}"
"BA", "\d{5}"
"IO", "BBND 1ZZ"
"BN", "[A-Z]{2}[ ]?\d{4}"
"BG", "\d{4}"
"KH", "\d{5}"
"CV", "\d{4}"
"CL", "\d{7}"
"CR", "\d{4,5}|\d{3}-\d{4}"
"HR", "\d{5}"
"CY", "\d{4}"
"CZ", "\d{3}[ ]?\d{2}"
"DO", "\d{5}"
"EC", "([A-Z]\d{4}[A-Z]|(?:[A-Z]{2})?\d{6})?"
"EG", "\d{5}"
"EE", "\d{5}"
"FO", "\d{3}"
"GE", "\d{4}"
"GR", "\d{3}[ ]?\d{2}"
"GL", "39\d{2}"
"GT", "\d{5}"
"HT", "\d{4}"
"HN", "(?:\d{5})?"
"HU", "\d{4}"
"IS", "\d{3}"
"IN", "\d{6}"
"ID", "\d{5}"
"IL", "\d{5}"
"JO", "\d{5}"
"KZ", "\d{6}"
"KE", "\d{5}"
"KW", "\d{5}"
"LA", "\d{5}"
"LV", "\d{4}"
"LB", "(\d{4}([ ]?\d{4})?)?"
"LI", "(948[5-9])|(949[0-7])"
"LT", "\d{5}"
"LU", "\d{4}"
"MK", "\d{4}"
"MY", "\d{5}"
"MV", "\d{5}"
"MT", "[A-Z]{3}[ ]?\d{2,4}"
"MU", "(\d{3}[A-Z]{2}\d{3})?"
"MX", "\d{5}"
"MD", "\d{4}"
"MC", "980\d{2}"
"MA", "\d{5}"
"NP", "\d{5}"
"NZ", "\d{4}"
"NI", "((\d{4}-)?\d{3}-\d{3}(-\d{1})?)?"
"NG", "(\d{6})?"
"OM", "(PC )?\d{3}"
"PK", "\d{5}"
"PY", "\d{4}"
"PH", "\d{4}"
"PL", "\d{2}-\d{3}"
"PR", "00[679]\d{2}([ \-]\d{4})?"
"RO", "\d{6}"
"RU", "\d{6}"
"SM", "4789\d"
"SA", "\d{5}"
"SN", "\d{5}"
"SK", "\d{3}[ ]?\d{2}"
"SI", "\d{4}"
"ZA", "\d{4}"
"LK", "\d{5}"
"TJ", "\d{6}"
"TH", "\d{5}"
"TN", "\d{4}"
"TR", "\d{5}"
"TM", "\d{6}"
"UA", "\d{5}"
"UY", "\d{5}"
"UZ", "\d{6}"
"VA", "00120"
"VE", "\d{4}"
"ZM", "\d{5}"
"AS", "96799"
"CC", "6799"
"CK", "\d{4}"
"RS", "\d{6}"
"ME", "8\d{4}"
"CS", "\d{5}"
"YU", "\d{5}"
"CX", "6798"
"ET", "\d{4}"
"FK", "FIQQ 1ZZ"
"NF", "2899"
"FM", "(9694[1-4])([ \-]\d{4})?"
"GF", "9[78]3\d{2}"
"GN", "\d{3}"
"GP", "9[78][01]\d{2}"
"GS", "SIQQ 1ZZ"
"GU", "969[123]\d([ \-]\d{4})?"
"GW", "\d{4}"
"HM", "\d{4}"
"IQ", "\d{5}"
"KG", "\d{6}"
"LR", "\d{4}"
"LS", "\d{3}"
"MG", "\d{3}"
"MH", "969[67]\d([ \-]\d{4})?"
"MN", "\d{6}"
"MP", "9695[012]([ \-]\d{4})?"
"MQ", "9[78]2\d{2}"
"NC", "988\d{2}"
"NE", "\d{4}"
"VI", "008(([0-4]\d)|(5[01]))([ \-]\d{4})?"
"PF", "987\d{2}"
"PG", "\d{3}"
"PM", "9[78]5\d{2}"
"PN", "PCRN 1ZZ"
"PW", "96940"
"RE", "9[78]4\d{2}"
"SH", "(ASCN|STHL) 1ZZ"
"SJ", "\d{4}"
"SO", "\d{5}"
"SZ", "[HLMS]\d{3}"
"TC", "TKCA 1ZZ"
"WF", "986\d{2}"
"XK", "\d{5}"
"YT", "976\d{2}"


There is none.

Postal/zip codes around the world don't follow a common pattern. In some countries they are made up by numbers, in others they can be combinations of numbers an letters, some can contain spaces, others dots, the number of characters can vary from two to at least six...

What you could do (theoretically) is create a seperate regex for every country in the world, not recommendable IMO. But you would still be missing on the validation part: Zip code 12345 may exist, but 12346 not, maybe 12344 doesn't exist either. How do you check for that with a regex?

You can't.


use these regx

$ZIPREG=array(
    "US"=>"^\d{5}([\-]?\d{4})?$",
    "UK"=>"^(GIR|[A-Z]\d[A-Z\d]??|[A-Z]{2}\d[A-Z\d]??)[ ]??(\d[A-Z]{2})$",
    "DE"=>"\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3}))\b",
    "CA"=>"^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ])\ {0,1}(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$",
    "FR"=>"^(F-)?((2[A|B])|[0-9]{2})[0-9]{3}$",
    "IT"=>"^(V-|I-)?[0-9]{5}$",
    "AU"=>"^(0[289][0-9]{2})|([1345689][0-9]{3})|(2[0-8][0-9]{2})|(290[0-9])|(291[0-4])|(7[0-4][0-9]{2})|(7[8-9][0-9]{2})$",
    "NL"=>"^[1-9][0-9]{3}\s?([a-zA-Z]{2})?$",
    "ES"=>"^([1-9]{2}|[0-9][1-9]|[1-9][0-9])[0-9]{3}$",
    "DK"=>"^([D|d][K|k]( |-))?[1-9]{1}[0-9]{3}$",
    "SE"=>"^(s-|S-){0,1}[0-9]{3}\s?[0-9]{2}$",
    "BE"=>"^[1-9]{1}[0-9]{3}$",
    "IN"=>"^\d{6}$"
);


  1. Every postal code system uses only A-Z and/or 0-9 and sometimes space/dash

  2. Not every country uses postal codes (ex. Ireland outside of Dublin), but we'll ignore that here.

  3. The shortest postal code format is Sierra Leone with NN

  4. The longest is American Samoa with NNNNN-NNNNNN

  5. You should allow one space or dash.

  6. Should not begin or end with space or dash

This should cover the above:

(?i)^[a-z0-9][a-z0-9\- ]{0,10}[a-z0-9]$


Trying to cover the whole world with one regular expression is not completely possible, and certainly not feasible or recommended.

Not to toot my own horn, but I've written some pretty thorough regular expressions which you may find helpful.

  • Canadian postal codes

    Basic validation:
    ^[ABCEGHJ-NPRSTVXY]{1}[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$
    
    Extended validation:
    ^(A(0[ABCEGHJ-NPR]|1[ABCEGHK-NSV-Y]|2[ABHNV]|5[A]|8[A])|B(0[CEHJ-NPRSTVW]|1[ABCEGHJ-NPRSTV-Y]|2[ABCEGHJNRSTV-Z]|3[ABEGHJ-NPRSTVZ]|4[ABCEGHNPRV]|5[A]|6[L]|9[A])|C(0[AB]|1[ABCEN])|E(1[ABCEGHJNVWX]|2[AEGHJ-NPRSV]|3[ABCELNVYZ]|4[ABCEGHJ-NPRSTV-Z]|5[ABCEGHJ-NPRSTV]|6[ABCEGHJKL]|7[ABCEGHJ-NP]|8[ABCEGJ-NPRST]|9[ABCEGH])|G(0[ACEGHJ-NPRSTV-Z]|1[ABCEGHJ-NPRSTV-Y]|2[ABCEGJ-N]|3[ABCEGHJ-NZ]|4[ARSTVWXZ]|5[ABCHJLMNRTVXYZ]|6[ABCEGHJKLPRSTVWXZ]|7[ABGHJKNPSTXYZ]|8[ABCEGHJ-NPTVWYZ]|9[ABCHNPRTX])|H(0[HM]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NPRSTV-Z]|4[ABCEGHJ-NPRSTV-Z]|5[AB]|7[ABCEGHJ-NPRSTV-Y]|8[NPRSTYZ]|9[ABCEGHJKPRSWX])|J(0[ABCEGHJ-NPRSTV-Z]|1[ACEGHJ-NRSTXZ]|2[ABCEGHJ-NRSTWXY]|3[ABEGHLMNPRTVXYZ]|4[BGHJ-NPRSTV-Z]|5[ABCJ-MRTV-Z]|6[AEJKNRSTVWYXZ]|7[ABCEGHJ-NPRTV-Z]|8[ABCEGHLMNPRTVXYZ]|9[ABEHJLNTVXYZ])|K(0[ABCEGHJ-M]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-MPRSTVW]|4[ABCKMPR]|6[AHJKTV]|7[ACGHK-NPRSV]|8[ABHNPRV]|9[AHJKLV])|L(0[[ABCEGHJ-NPRS]]|1[ABCEGHJ-NPRSTV-Z]|2[AEGHJMNPRSTVW]|3[BCKMPRSTVXYZ]|4[ABCEGHJ-NPRSTV-Z]|5[ABCEGHJ-NPRSTVW]|6[ABCEGHJ-MPRSTV-Z]|7[ABCEGJ-NPRST]|8[EGHJ-NPRSTVW]|9[ABCGHK-NPRSTVWYZ])|M(1[BCEGHJ-NPRSTVWX]|2[HJ-NPR]|3[ABCHJ-N]|4[ABCEGHJ-NPRSTV-Y]|5[ABCEGHJ-NPRSTVWX]|6[ABCEGHJ-NPRS]|7[AY]|8[V-Z]|9[ABCLMNPRVW])|N(0[ABCEGHJ-NPR]|1[ACEGHKLMPRST]|2[ABCEGHJ-NPRTVZ]|3[ABCEHLPRSTVWY]|4[BGKLNSTVWXZ]|5[ACHLPRV-Z]|6[ABCEGHJ-NP]|7[AGLMSTVWX]|8[AHMNPRSTV-Y]|9[ABCEGHJKVY])|P(0[ABCEGHJ-NPRSTV-Y]|1[ABCHLP]|2[ABN]|3[ABCEGLNPY]|4[NPR]|5[AEN]|6[ABC]|7[ABCEGJKL]|8[NT]|9[AN])|R(0[ABCEGHJ-M]|1[ABN]|2[CEGHJ-NPRV-Y]|3[ABCEGHJ-NPRSTV-Y]|4[AHJKL]|5[AGH]|6[MW]|7[ABCN]|8[AN]|9[A])|S(0[ACEGHJ-NP]|2[V]|3[N]|4[AHLNPRSTV-Z]|6[HJKVWX]|7[HJ-NPRSTVW]|9[AHVX])|T(0[ABCEGHJ-MPV]|1[ABCGHJ-MPRSV-Y]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NPRZ]|4[ABCEGHJLNPRSTVX]|5[ABCEGHJ-NPRSTV-Z]|6[ABCEGHJ-NPRSTVWX]|7[AENPSVXYZ]|8[ABCEGHLNRSVWX]|9[ACEGHJKMNSVWX])|V(0[ABCEGHJ-NPRSTVWX]|1[ABCEGHJ-NPRSTV-Z]|2[ABCEGHJ-NPRSTV-Z]|3[ABCEGHJ-NRSTV-Y]|4[ABCEGK-NPRSTVWXZ]|5[ABCEGHJ-NPRSTV-Z]|6[ABCEGHJ-NPRSTV-Z]|7[ABCEGHJ-NPRSTV-Y]|8[ABCGJ-NPRSTV-Z]|9[ABCEGHJ-NPRSTV-Z])|X(0[ABCGX]|1[A])|Y(0[AB]|1[A]))[ ]?[0-9]{1}[ABCEGHJ-NPRSTV-Z]{1}[0-9]{1}$
    
  • US ZIP codes

    ^[0-9]{5}(-[0-9]{4})?$
    
  • UK post codes

    ^([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW])\ [0-9][ABD-HJLNP-UW-Z]{2}|(GIR\ 0AA)|(SAN\ TA1)|(BFPO\ (C\/O\ )?[0-9]{1,4})|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA)\ 1ZZ))$
    

It is not possible to guarantee accuracy without actually mailing something to an address and having the person let you know when they receive it, but we can narrow things by down by eliminating cases that we know are bad.


We use the following:

Canada

([A-Z]{1}[0-9]{1}){3}   //We raise to upper first

America

[0-9]{5}                //-or-
[0-9]{5}-[0-9]{4}       //10 digit zip

Other

Accept as is


Depending on your application, you might want to implement regex matching for the countries where most of your visitors originate and no validation for the rest (accept anything).


Please note that this is quite a hard problem, as stated by the accepted answer. I guess it didn't deter the folks at geonames.org though. They have a file a country info file, which doesn't fit whole into this answer - limit is at 30000 chars apparently. There are regexes for about 150 countries.

I extracted the bits relevant to this question here :

AD ^(?:AD)*(\d{3})$
AM ^(\d{6})$
AR ^([A-Z]\d{4}[A-Z]{3})$
AT ^(\d{4})$
AU ^(\d{4})$
AX ^(?:FI)*(\d{5})$
AZ ^(?:AZ)*(\d{4})$
BA ^(\d{5})$
BB ^(?:BB)*(\d{5})$
BD ^(\d{4})$
BE ^(\d{4})$
BG ^(\d{4})$
BH ^(\d{3}\d?)$
BM ^([A-Z]{2}\d{2})$
BN ^([A-Z]{2}\d{4})$
BR ^(\d{8})$
BY ^(\d{6})$
CA ^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ]) ?(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$
CH ^(\d{4})$
CL ^(\d{7})$
CN ^(\d{6})$
CR ^(\d{4})$
CU ^(?:CP)*(\d{5})$
CV ^(\d{4})$
CX ^(\d{4})$
CY ^(\d{4})$
CZ ^(\d{5})$
DE ^(\d{5})$
DK ^(\d{4})$
DO ^(\d{5})$
DZ ^(\d{5})$
EC ^([a-zA-Z]\d{4}[a-zA-Z])$
EE ^(\d{5})$
EG ^(\d{5})$
ES ^(\d{5})$
ET ^(\d{4})$
FI ^(?:FI)*(\d{5})$
FM ^(\d{5})$
FO ^(?:FO)*(\d{3})$
FR ^(\d{5})$
GB ^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$
GE ^(\d{4})$
GF ^((97|98)3\d{2})$
GG ^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$
GL ^(\d{4})$
GP ^((97|98)\d{3})$
GR ^(\d{5})$
GT ^(\d{5})$
GU ^(969\d{2})$
GW ^(\d{4})$
HN ^([A-Z]{2}\d{4})$
HR ^(?:HR)*(\d{5})$
HT ^(?:HT)*(\d{4})$
HU ^(\d{4})$
ID ^(\d{5})$
IL ^(\d{5})$
IM ^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$
IN ^(\d{6})$
IQ ^(\d{5})$
IR ^(\d{10})$
IS ^(\d{3})$
IT ^(\d{5})$
JE ^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2}[A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2})|([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$
JO ^(\d{5})$
JP ^(\d{7})$
KE ^(\d{5})$
KG ^(\d{6})$
KH ^(\d{5})$
KP ^(\d{6})$
KR ^(?:SEOUL)*(\d{6})$
KW ^(\d{5})$
KZ ^(\d{6})$
LA ^(\d{5})$
LB ^(\d{4}(\d{4})?)$
LI ^(\d{4})$
LK ^(\d{5})$
LR ^(\d{4})$
LS ^(\d{3})$
LT ^(?:LT)*(\d{5})$
LU ^(\d{4})$
LV ^(?:LV)*(\d{4})$
MA ^(\d{5})$
MC ^(\d{5})$
MD ^(?:MD)*(\d{4})$
ME ^(\d{5})$
MG ^(\d{3})$
MK ^(\d{4})$
MM ^(\d{5})$
MN ^(\d{6})$
MQ ^(\d{5})$
MT ^([A-Z]{3}\d{2}\d?)$
MV ^(\d{5})$
MX ^(\d{5})$
MY ^(\d{5})$
MZ ^(\d{4})$
NC ^(\d{5})$
NE ^(\d{4})$
NF ^(\d{4})$
NG ^(\d{6})$
NI ^(\d{7})$
NL ^(\d{4}[A-Z]{2})$
NO ^(\d{4})$
NP ^(\d{5})$
NZ ^(\d{4})$
OM ^(\d{3})$
PF ^((97|98)7\d{2})$
PG ^(\d{3})$
PH ^(\d{4})$
PK ^(\d{5})$
PL ^(\d{5})$
PM ^(97500)$
PR ^(\d{9})$
PT ^(\d{7})$
PW ^(96940)$
PY ^(\d{4})$
RE ^((97|98)(4|7|8)\d{2})$
RO ^(\d{6})$
RS ^(\d{6})$
RU ^(\d{6})$
SA ^(\d{5})$
SD ^(\d{5})$
SE ^(?:SE)*(\d{5})$
SG ^(\d{6})$
SH ^(STHL1ZZ)$
SI ^(?:SI)*(\d{4})$
SK ^(\d{5})$
SM ^(4789\d)$
SN ^(\d{5})$
SO ^([A-Z]{2}\d{5})$
SV ^(?:CP)*(\d{4})$
SZ ^([A-Z]\d{3})$
TC ^(TKCA 1ZZ)$
TH ^(\d{5})$
TJ ^(\d{6})$
TM ^(\d{6})$
TN ^(\d{4})$
TR ^(\d{5})$
TW ^(\d{5})$
UA ^(\d{5})$
US ^\d{5}(-\d{4})?$
UY ^(\d{5})$
UZ ^(\d{6})$
VA ^(\d{5})$
VE ^(\d{4})$
VI ^\d{5}(-\d{4})?$
VN ^(\d{6})$
WF ^(986\d{2})$
YT ^(\d{5})$
ZA ^(\d{4})$
ZM ^(\d{5})$
CS ^(\d{5})$

Hopefully I didn't make any mistake, my regex-fu is pretty weak.


If someone is still interested in how to validate zip codes I've found a solution:

Using Google Geocoding API we can check validity of ZIP code having both Country code and a ZIP code itself.

For example I live in Ukraine so I can check like this: https://maps.googleapis.com/maps/api/geocode/json?components=postal_code:80380|country:UA

Or using JS API: https://developers.google.com/maps/documentation/javascript/geocoding#ComponentFiltering

Where 80380 is valid ZIP for Ukraine, actually every (#####) is valid.

Google returns ZERO_RESULTS status if nothing found. Or OK and a result if both are correct.

Hope this will be helpful.


.* 

Big Jump forgot about line breaks, blanks and control characters.

International postal codes are a kind of halting problem.


As others have pointed out, one regex to rule them all is unlikely. However, you can craft regular expressions for as many countries as you need using the address formatting info from the Universal Postal Union -- a little-known UN agency.

For example, here are the address formatting rules, including postal code, for a handful of countries (PDF format):

  • Canada
  • Japan
  • Switzerland
  • Russian Federation
  • United States of America


The problem is going to be that you probably have no good means of keeping up with the changing postal code requirements of countries on the other side of the globe and which you share no common languages. Unless you have a large enough budget to track this, you are almost certainly better off giving the responsibility of validating addresses to google or yahoo.

Both companies provide address lookup facuilities through a programmable API.


Given that there are so many edge cases for each country (eg. London addresses may use a slightly different format to the rest of the UK) I don't think that there is an ultimate regex other than maybe:

[0-9a-zA-Z]+

Best of going with a fairly broad pattern (well not quite as broad as the above), or treat each country/region with a specific pattern of its own!

UPDATE: However, it may be possible to dynamically construct a regex based upon lots of smaller, region specific rules - not sure about performance though!

Lots of country specific patterns can be found on the RegExLib site.


Why are you doing this and why do you care? As Tom Ritter pointed out, it doesn't matter whether you even have a ZIP/postal code at all, much less whether it's valid or not, until and unless you are actually going to be sending something to that address. Even if you expect that you will be sending them something someday, that doesn't mean you need a postal code today.


As noted elsewhere the variation around the world is huge. And even if something that matches the pattern does not mean it exists.

Then, of course, there are many places where postcodes are not used (e.g. much or Ireland).


There are reasons beyond shipping for having an accurate postal code. Travel agencies doing tours that cross borders (Eurozone excepted of course) need this information ahead of time to give to the authorities. Often this information is entered by an agent that may or may not be familiar with such things. ANY method that can cut down on mistakes is a Good Idea™

However, writing a regex that would cover all postal codes in the world would be insane.


Somebody was asking about list of formatting mailing addresses, and I think this is what he was looking for...

Frank's Compulsive Guide to Postal Addresses: http://www.columbia.edu/~fdc/postal/ Doesn't help much with street-level issues, however.

My work uses a couple of tools to assist with this: - Lexis-Nexis services, including NCOA lookups (you'll get address standardization for "free") - "Melissa Data" http://www.melissadata.com


This is a very simple RegEx for validating US Zipcode (not ZipCode Plus Four):

(?!([089])\1{4})\d{5}

Seems all five digit numeric are valid zipcodes except 00000, 88888 & 99999.

I have tested this RegEx with http://regexpal.com/

SP


If Zip Code allows characters and digits (alphanumeric), below regex would be used where it matches, 5 or 9 or 10 alphanumeric characters with one hypen (-):

^([0-9A-Za-z]{5}|[0-9A-Za-z]{9}|(([0-9a-zA-Z]{5}-){1}[0-9a-zA-Z]{4}))$


I know this is an old quesiton, but I stumbled across the same problem. I have invoices from over 100 countries and am trying to get the correkt creditor over the zip (if every other check is failing). So what I did is writing a short Python Script, that creates a pattern from a string:

class RegexPatternBuilder:
    """
    Builds a regex pattern out of a given string(i.e. --> HM452 AX2155 : [A-Z]{2}\d{3}\s{1}[A-Z]{2}\d{4})
    """
    __is_alpha_count = 0
    __is_numeric_count = 0
    __is_whitespace_count = 0
    __pattern = ""

    # Count: wich character of the string we're locking at right now
    __count = 0

    # Countrys like  Andora starts theire ZIP with the country abbreviation :AD500
    # So check at first if the ZIP starts with the abbreviation and if so, add it to the pattern and increase the count.
    def __init__(self, zip_string, country):
        self.__zip_string = zip_string
        self.__country = country
        if self.__zip_string.startswith(country):
            self.__pattern = f'({self.__country})'
            self.__count += len(self.__country)

    def build_regex(self):
        # Last step ;
        # Add the current alpha_numeric pattern with count
        if len(self.__zip_string) == self.__count:
            if self.__is_alpha_count:
                self.__pattern += f"[A-Z]{{{self.__is_alpha_count}}}"
            if self.__is_numeric_count:
                self.__pattern += f"\d{{{self.__is_numeric_count}}}"
            return f'{self.__pattern}\\b'

        # Case: Whitespace
        # Check if there is a crossing from numeric / alphanumeric to whitespace,
        # if so --> add the alpha_numeric regex to the whole pattern with the
        # count as the number of viable appeaerances.
        # Since there is max 1 whitespace in a ZIP, add the whitespace regex immediately.
        # Every other case is similar to that.
        if self.__zip_string[self.__count].isspace():
            if self.__is_numeric_count:
                self.__pattern += f"\d{{{self.__is_numeric_count}}}"
            if self.__is_alpha_count:
                self.__pattern += f"[A-Z]{{{self.__is_alpha_count}}}"
            self.__pattern += "\s{1}"
            self.__is_whitespace_count += 1
            self.__is_alpha_count = 0
            self.__is_numeric_count = 0

        # Case: Is Alphanumeric
        if self.__zip_string[self.__count].isalpha():
            if self.__is_numeric_count:
                self.__pattern += f"[0-9]{{{self.__is_numeric_count}}}"
            self.__is_whitespace_count = 0
            self.__is_alpha_count += 1
            self.__is_numeric_count = 0

        # Case: Is Numeric
        if self.__zip_string[self.__count].isnumeric():
            if self.__is_alpha_count:
                self.__pattern += f"[A-Z]{{{self.__is_alpha_count}}}"
            self.__is_whitespace_count = 0
            self.__is_alpha_count = 0
            self.__is_numeric_count += 1

        # Case: Special Character (i.e. - )
        # No escaping or count for this so far, because it shouldn't be needed for our zip purposes
        if not self.__zip_string[self.__count].isalpha() \
                and not self.__zip_string[self.__count].isnumeric() \
                and not self.__zip_string[self.__count].isspace():
            self.__pattern += f'{self.__zip_string[self.__count]}{{1}}'
        self.__count += 1
        return self.build_regex()

With that I created all the different possible regexes for all zips (by country) we have historically and wrote them back into a db table (i.e. something like this in the end: COUNTRY:RE PATTERN:(\d{5})\b [what ever country this might be ;D])

Maybe it helps someone.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜