Convert street address from string to columns - Regex?
I have a list of 350 addresses in a single-column Excel file that I need to import into a SQL table, breaking the data into columns.
Each Excel cell contains something like this:
Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300
What strategy should I apply to import this in a clean fashion?
I was thinking:
NAME <- anything before the first digit
STREET ADDRESS <- from the first digit to the '-'
STATE <- anything from the last ',' back to the '-' immediately before it (the address field can itself contain a '-')
TELEPHONE <- last 12 characters
ZIP <- first 10 characters of the last 23 characters (10 for the ZIP, 1 for the space, 12 for the phone number)
I work in C#, if this matters.
Is regex the appropriate approach? I'm not too familiar with regular expressions, so I'm not sure. Can somebody suggest a regular expression that would do the job (or part of it)?
Thanks!
The following regex should pull out each part in a capture group:
(\D+) ([^-]+) - ([^,]+, \w+) ([\d-]+) ([\d-]+)
Capture groups, in order:
- Name
- Street address
- City, State
- Zip
- Phone
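As a quick way to check the pattern against the sample row, here is a minimal Python sketch. The names `ADDRESS_RE`, `Address` and `parse_address` are made up for illustration, and the pattern itself is unchanged, so it should carry over to other regex engines (including .NET's `Regex` class) with little or no modification.

```python
import re
from typing import NamedTuple, Optional

# Pattern from the answer above, unchanged.
ADDRESS_RE = re.compile(r"(\D+) ([^-]+) - ([^,]+, \w+) ([\d-]+) ([\d-]+)")


class Address(NamedTuple):
    # Hypothetical container; field order follows the capture groups.
    name: str
    street: str
    city_state: str
    zip_code: str
    phone: str


def parse_address(line: str) -> Optional[Address]:
    """Return the five captured fields, or None if the line does not fit."""
    match = ADDRESS_RE.fullmatch(line.strip())
    return Address(*match.groups()) if match else None


row = parse_address(
    "Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300"
)
print(row)
```

Note that a business name containing a digit, or a street address containing a hyphen, will not match this pattern, so `parse_address` returns `None` for those rows and they need to be handled separately.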
A regular expression is the tool for this job. I am not a C# developer, so I can't give you the exact code. Nonetheless, the following regex should work. Most IDEs have regex search-and-replace built in, or, if you have access to UNIX, sed would work.
([^\d]+)\s(.+?)\s-\s[^,]+,\s([A-Z]{2})\s([^\s]+)\s([^\s]+)
Captures:
- Name
- Address
- State
- ZIP
- Phone
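With 350 rows, some of them will probably not fit the pattern, so it may help to separate the rows that match from the ones that need a manual look. Below is a rough Python sketch of that idea. It uses the pattern above with named groups added for readability. The function name `split_rows` and the second and third sample rows are invented for the example.

```python
import re

# Same pattern as above; only the group names are new.
ADDRESS_RE = re.compile(
    r"(?P<name>[^\d]+)\s(?P<street>.+?)\s-\s[^,]+,\s"
    r"(?P<state>[A-Z]{2})\s(?P<zip>[^\s]+)\s(?P<phone>[^\s]+)"
)


def split_rows(lines):
    """Return (parsed, rejected): dicts for matching rows, raw text otherwise."""
    parsed, rejected = [], []
    for line in lines:
        match = ADDRESS_RE.fullmatch(line.strip())
        if match:
            parsed.append(match.groupdict())
        else:
            rejected.append(line)
    return parsed, rejected


sample = [
    "Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300",
    "Example Garage 12-B Main St - Chico, CA 95973 530-555-0100",  # made-up row
    "not an address",  # made-up row that should be rejected
]
parsed, rejected = split_rows(sample)
print(len(parsed), "parsed;", len(rejected), "left for manual review")
```

Because the street group is lazy and stops at the first ' - ' surrounded by whitespace, a hyphen inside the street address (as in the made-up "12-B Main St") still parses. The city is matched but not captured, as in the original pattern.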
You can use the Google Geocoding API. You might have to remove the phone number first, but if someone is looking for address parsing with more functionality than a regex alone can offer, this can even return the latitude and longitude for the address.
For your example address:
http://maps.googleapis.com/maps/api/geocode/xml?address=2520%20Cohasset%20Rd%20-%20Chico%2C%20CA%2095973-1307%20530-893-1300%20%20&sensor=false
Documentation
https://developers.google.com/maps/documentation/geocoding/
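To illustrate removing the phone number before building the request, here is a small Python sketch. It only constructs the URL and does not call the service. The helper name `geocode_request_url` and the phone-number pattern are assumptions for this example, and the endpoint and `sensor=false` parameter are copied from the URL above. Current versions of the service may also expect an API key parameter, which this sketch leaves out.

```python
import re
from urllib.parse import quote, urlencode

# Endpoint taken from the example URL above.
GEOCODE_URL = "http://maps.googleapis.com/maps/api/geocode/xml"

# Assumes the phone number is the last thing on the line, formatted NNN-NNN-NNNN.
PHONE_AT_END = re.compile(r"\s*\d{3}-\d{3}-\d{4}\s*$")


def geocode_request_url(address: str) -> str:
    """Strip a trailing phone number and return the percent-encoded request URL."""
    cleaned = PHONE_AT_END.sub("", address)
    query = urlencode({"address": cleaned, "sensor": "false"}, quote_via=quote)
    return f"{GEOCODE_URL}?{query}"


url = geocode_request_url("2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300")
print(url)
```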