Match each address from the address number to the 'street type'
I have a paragraph of text that contains the following addresses:
- at 900 Greenwood St.
- in 500 block of Main Street
- at 670 W. Townline Ave.
- before 1234 River Avenue
- of 1125 Main Ave.
I want to match each address from the address number through the 'street type' (ave., street, lane, road, rd., etc.), except for addresses that begin with the word "of".
So of the addresses above, the statement would match:
- 900 Greenwood St.
- 500 block of Main Street
- 670 W. Townline Ave.
- 1234 River Avenue
and would not match:
1125 Main Ave.
As far as I know, there isn't one simple regex pattern for this kind of task. There are too many variables for a single pattern to work reliably. My first guess would be to look for "street", "ave", etc., but what if the street name doesn't have a suffix (e.g. 999 La Canada)? You could look for any phrase that follows "at", "in" or "before", but what if one of those phrases isn't an address? See what I mean?
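To make those two failure modes concrete, here is a minimal sketch. Both patterns and both helper names (looksLikeAddress, followsPreposition) are made up for illustration and are not taken from any library; "in a hurry" is an invented non-address phrase.

```javascript
// Hypothetical suffix-based pattern: a number, then anything (lazily),
// up to one of a few known street types.
const suffixPattern = /\d+.*?\b(?:St\.|Street|Ave\.|Avenue|Lane|Road|Rd\.)/i;

// Hypothetical preposition-based pattern: "at", "in" or "before",
// followed by at least one more non-space character.
const prepositionPattern = /\b(?:at|in|before)\s+\S/i;

function looksLikeAddress(phrase) {
  return suffixPattern.test(phrase);
}

function followsPreposition(phrase) {
  return prepositionPattern.test(phrase);
}

console.log(looksLikeAddress("at 900 Greenwood St.")); // true
console.log(looksLikeAddress("at 999 La Canada"));     // false: no street type to anchor on
console.log(followsPreposition("in a hurry"));         // true: matched, but not an address
```

The suffix approach misses addresses without a street type, and the preposition approach accepts phrases that are not addresses at all.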
My suggestion would be to take a look at Lingua::EN::AddressParse for Perl.
This fulfills your request:
(?!^of\b)^.*?(\d+.*?(?:St\.|Street|Ave\.|Avenue))$
See it here on Regexr
(?!^of\b)
Negative lookahead: the row does not start with the word "of".
^
Matches the start of a row; use the m modifier!
.*?
Matches everything, non-greedily.
(\d+.*?
When the first digits are found, the ( starts the first capturing group.
(?:St\.|Street|Ave\.|Avenue))
A non-capturing group (because of the ?:) that matches one of the alternatives separated by |. The last ) closes the capturing group that holds the result.
$
Matches the end of the row; use the m modifier!
Your result is in the first capturing group.
Important: this works with your given examples. Addresses vary so much that it will not work on every kind of real-world address.
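As a rough sketch of how the pattern might be applied in JavaScript, assuming the text has one candidate phrase per row as in the question. The function name extractAddresses is just an illustrative choice; the g flag collects every match and the m flag makes ^ and $ work per row.

```javascript
// Illustrative helper (name is an assumption): returns the first capturing
// group of every row that matches the pattern above.
function extractAddresses(text) {
  // g: find all matches; m: ^ and $ match at the start/end of each row.
  const pattern = /(?!^of\b)^.*?(\d+.*?(?:St\.|Street|Ave\.|Avenue))$/gm;
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// The sample rows from the question.
const sample = [
  "at 900 Greenwood St.",
  "in 500 block of Main Street",
  "at 670 W. Townline Ave.",
  "before 1234 River Avenue",
  "of 1125 Main Ave.",
].join("\n");

console.log(extractAddresses(sample));
// ["900 Greenwood St.", "500 block of Main Street", "670 W. Townline Ave.", "1234 River Avenue"]
```

The row starting with "of" is skipped because the lookahead rejects it at the start of the row, and ^ cannot match anywhere else in that row.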
When
s = "at 900 Greenwood St.\n\
in 500 block of Main Street\n\
at 670 W. Townline Ave.\n\
before 1234 River Avenue\n\
of 1125 Main Ave."
the regex
/(?:^|\s)(?:(?!of\b)[a-z]+)\s*(\d[\s\S]*?\b(?:ave\.|avenue|st\.|street|lane|road|rd\.))/gi
used thus
var addresses = [];
var re = /(?:^|\s)(?:(?!of\b)[a-z]+)\s*(\d[\s\S]*?\b(?:ave\.|avenue|st\.|street|lane|road|rd\.))/gi;
var match;
// With the g flag, exec() resumes after the previous match on each call.
while ((match = re.exec(s)) !== null) {
  addresses.push(match[1]);
}
produces
["900 Greenwood St.","500 block of Main Street","670 W. Townline Ave.","1234 River Avenue"]
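// create_array_of_possible_addresses() is a placeholder for however you
// split the text into candidate phrases.
var addrs = create_array_of_possible_addresses();
var matching_addrs = [];
// Longer alternatives come first so "Street" is not cut short to "St".
var re = /\d.*(?:Street|St\.?|Avenue|Ave\.?|Road|Rd\.?|Ln\.?)/;
for (var i = 0; i < addrs.length; i++) {
  // Skip candidates that begin with the word "of".
  if (/^of\b/.test(addrs[i])) continue;
  var m = addrs[i].match(re);
  if (m) matching_addrs.push(m[0]);
}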
Untested.