Parsing ZIP (Postal) code from US address with Java
The question is how to detect five consecutive digits in a string, in order to find a US postal code.
Side note: I'd like to use the code with GWT, so there are limitations on regex and third-party libraries. Otherwise I would just use net.sourceforge.jgeocoder.
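If you're going to use a regex, this should work for strictly formatted ZIPs: ^\d{5}([-+]?\d{4})?$
ZIP Codes turn up in several forms; the regex above accepts the first four of these but not the last: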
- 12345
- 123456789
- 12345-6789
- 12345+6789
- 12345-67ND (yes, you read that right, sometimes the last two can be ND)
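As a rough sketch of how the strict pattern quoted above behaves on these forms, here it is wrapped in a small Java helper. The class and method names are made up for illustration. It uses String.matches rather than java.util.regex, on the assumption that this is the friendlier option for GWT client code.

```java
final class StrictZipCheck {
    // The strict pattern from the answer: five digits, then an optional
    // four-digit add-on that may be preceded by '-' or '+'.
    private static final String STRICT_ZIP = "^\\d{5}([-+]?\\d{4})?$";

    // Returns true only when the whole string is a strictly formatted ZIP.
    static boolean isStrictZip(String candidate) {
        return candidate != null && candidate.matches(STRICT_ZIP);
    }
}
```

With this helper, "12345", "123456789", "12345-6789" and "12345+6789" are accepted, while "12345-67ND" is rejected because the pattern only allows digits in the add-on.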
But there's still a problem. Some applications, Microsoft Excel for example, try to interpret 5-digit ZIPs as integers. This means that ZIPs with leading zeros, such as those in New England and Puerto Rico, often lose those zeros. As such, you may also want to consider looking for 3-digit and 4-digit values.
The "first" ZIP Code in the USA is 00501, which belongs to the IRS. (Perhaps we shouldn't allow that one to verify!) When interpreted as an integer, it's 501. Now we've got a problem.
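If a ZIP has already been turned into a number, the leading zeros can be put back by left-padding to five digits. This is only a sketch: the class and method names are hypothetical, and it assumes the value really was a 5-digit ZIP to begin with, which the number alone cannot prove.

```java
final class ZipPadding {
    // Left-pads a ZIP that was stored as an integer back to five digits.
    static String restoreLeadingZeros(int zipAsNumber) {
        if (zipAsNumber < 0 || zipAsNumber > 99999) {
            throw new IllegalArgumentException("not a 5-digit ZIP value: " + zipAsNumber);
        }
        StringBuilder padded = new StringBuilder(Integer.toString(zipAsNumber));
        while (padded.length() < 5) {
            padded.insert(0, '0');
        }
        return padded.toString();
    }
}
```

For example, restoreLeadingZeros(501) gives "00501".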
This is important to know because, unlike credit cards which have a mod 10 checksum, addresses are not self validating. This means that you can't know if an address is formatted and standardized properly without some kind of external authority.
And once you've gone as far as needing to standardize an address via an external authority, you can have the address verified and confirmed as well.
I should mention that I'm the founder of SmartyStreets. We have a web-based address verification service where you can submit your addresses to us in a list or programmatically, and we'll clean them up, standardize them, and verify them.
\\d{5}
I believe this regex (written as a Java string literal, hence the doubled backslash) will be a starting point.
Code:
String[] tokens = string.split("\\d{5}");
// check token length.
Done from my mobile so forgive spelling and syntax
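One caveat with the snippet above: split("\\d{5}") treats the five digits as the delimiter, so the tokens it returns are the text around the ZIP, not the ZIP itself. A possible variation, sketched here with made-up names, is to split on runs of non-digits instead and keep the tokens that are exactly five characters long. It relies only on String.split, on the assumption that this is available in GWT client code.

```java
import java.util.ArrayList;
import java.util.List;

final class FiveDigitRuns {
    // Splits on non-digit runs and keeps the tokens that are exactly five digits long.
    static List<String> findFiveDigitRuns(String text) {
        List<String> runs = new ArrayList<>();
        for (String token : text.split("\\D+")) {
            if (token.length() == 5) {
                runs.add(token);
            }
        }
        return runs;
    }
}
```

Note that this also picks up five-digit house numbers, so the caller still has to decide which run is the ZIP.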
What worked for me is:
(\d{5}(?=\s|$))|(\d{5}-\d{4}(?=\s|$))
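For anyone who wants to try that pattern, here is a minimal sketch using java.util.regex (so it assumes server-side or plain Java rather than GWT client code). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadZipFinder {
    // Five digits, or five digits plus "-dddd", each followed by whitespace or end of input.
    private static final Pattern ZIP =
            Pattern.compile("(\\d{5}(?=\\s|$))|(\\d{5}-\\d{4}(?=\\s|$))");

    // Returns the first match in the text, or null when nothing matches.
    static String firstZip(String text) {
        Matcher matcher = ZIP.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }
}
```

Because of the lookahead, a ZIP followed directly by a comma (as in "State 55555, USA") is not matched, so the pattern fits best when the ZIP is followed by a space or ends the string.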
It's very simple to express as a regular expression: "^\d{5}"
Just have a look at how to implement regular expression matching in Java: http://www.regular-expressions.info/java.html
With a regular expression.
\d{5}
Since a zip should be at the end of an address
\d{5}$
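A small sketch of the end-anchored variant, assuming a single-line address and using only String.matches and substring (the names are made up):

```java
final class TrailingZip {
    // Returns the last five characters when the trimmed address ends in five digits, else null.
    static String trailingZip(String address) {
        String trimmed = address.trim();
        if (trimmed.matches(".*\\d{5}$")) {
            return trimmed.substring(trimmed.length() - 5);
        }
        return null;
    }
}
```

This only works when the ZIP really is the last thing in the string; an address ending in ", USA" or in a ZIP+4 such as "55555-1234" returns null.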
There are two forms of zip in the U.S.A.: a 5-digit number (called a zip code) and a 9-digit number (called a zip +4). Here is an algorithm to parse any valid U.S. zip code. Assumption: the starting point is a String containing a zip code (or zip +4) candidate.
- Iterate through the input string and extract all digits to a second string that I will call the "zipString". Note: zip +4 is often written "12345-1234". This will remove the dash. This may be overly accepting for your purposes because it will also work if the input string is "1a2b3c4d-------5x". This looseness is generally fine for me because it ignores simple and ignorable input errors (like "1 2345" as the zip code).
- If the "zipString" is 5 characters long, that is the zip code.
- If the "zipString" is 9 characters long, the first 5 characters are the zip code and the last 4 characters are the +4 portion of a zip +4.
- If the "zipString" is neither 5 nor 9 characters long, the input is not valid.
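The steps above could be sketched in Java roughly as follows. The class name, method name, and the choice to return a String array (or null for invalid input) are my own assumptions, not part of the description. It avoids regex entirely, which should also suit GWT.

```java
final class ZipDigits {
    // Returns {zip} for 5 digits, {zip, plus4} for 9 digits, or null when the input is not valid.
    static String[] parse(String candidate) {
        // Step 1: copy every digit into the "zipString", skipping everything else.
        StringBuilder zipString = new StringBuilder();
        for (int i = 0; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            if (c >= '0' && c <= '9') {
                zipString.append(c);
            }
        }
        // Step 2: exactly five digits is a plain zip code.
        if (zipString.length() == 5) {
            return new String[] { zipString.toString() };
        }
        // Step 3: nine digits is a zip +4; split it into the two parts.
        if (zipString.length() == 9) {
            return new String[] { zipString.substring(0, 5), zipString.substring(5) };
        }
        // Step 4: any other length is rejected.
        return null;
    }
}
```

For example, parse("12345-1234") yields "12345" and "1234", while parse("1 2345") and parse("1a2b3c4d-------5x") both yield "12345".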
Modified for 5 digit only zip:
- Iterate through the input string and extract all digits to a second string that I will call the "zipString". I prefer this to regular expressions because it ignores simple and ignorable input errors (like "1 2345" as the zip code).
- If the "zipString" is 5 characters long, that is the zip code.
- If the "zipString" is not 5 characters long, the input is not valid.
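Here's what I did to parse a zipcode from an address string and compare it to an array of zipcodes. The format of the address string is: 1234 Easy St, City, State 55555, USA. It will also handle ZIP+4 values such as 55555-5555.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the five-digit ZIP on its own, so an optional +4 suffix
// does not end up in the string passed to Integer.parseInt.
private static final Pattern pattern = Pattern.compile("(\\d{5})(?:[-\\s]\\d{4})?");
private static int[] zipcodes = {<your array of zips>};

public static boolean isInServiceArea(String address) {
    Matcher matcher = pattern.matcher(address);
    // find() returns the first match, so a five-digit house number earlier
    // in the address would be picked up instead of the ZIP.
    if (!matcher.find()) {
        return false;
    }
    // Parsing as int drops leading zeros (00501 becomes 501), so the
    // zipcodes array must store values the same way.
    int zipcode = Integer.parseInt(matcher.group(1));
    Log.d(TAG, "zipcode: " + zipcode);
    for (int code : zipcodes) {
        if (code == zipcode) {
            return true;
        }
    }
    return false;
}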