
Convert street address from string to columns - Regex?

I have a list of 350 addresses in a single-column Excel file that I need to import into a SQL table, breaking the data into columns.

Each Excel cell contains text like this one:

Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300  

What strategy should I apply to import this in a clean fashion?

I was thinking

NAME <- anything before the 1st digit

STREET ADDRESS <- from the 1st digit to the '-'

STATE <- Anything from the last ',' to the '-' immediately before (the address field can contain some - )

TELEPHONE <- Last 12 char

ZIP <- 10 first char of the last 23 char (10 for the ZIP, 1 space, 12 for the telephone)

I work in C# if this matters.

Is a regex the appropriate approach? I'm not too familiar with regular expressions, so I'm not sure. Can somebody suggest one that would do the job (or part of it)?

Thanks!


The following regex should pull out each part in a capture group:

(\D+) ([^-]+) - ([^,]+, \w+) ([\d-]+) ([\d-]+)

Capture groups, in order:

  1. Name
  2. Street address
  3. City, State
  4. Zip
  5. Phone
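
Since the question mentions C#, here is a minimal sketch of how the pattern above might be applied with `System.Text.RegularExpressions`. The class and method names (`AddressRegexDemo`, `Split`) are made up for illustration. Note that `[^-]+` stops at the first hyphen, so a street address that itself contains a `-` would not match with this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressRegexDemo
{
    // Pattern copied from the answer above.
    private static readonly Regex Pattern =
        new Regex(@"(\D+) ([^-]+) - ([^,]+, \w+) ([\d-]+) ([\d-]+)");

    // Returns the five captured parts in order, or null when the row does not match.
    public static string[] Split(string row)
    {
        Match m = Pattern.Match(row);
        if (!m.Success) return null;

        var parts = new string[5];
        for (int i = 0; i < 5; i++)
            parts[i] = m.Groups[i + 1].Value; // Groups[0] is the whole match
        return parts;
    }

    public static void Main()
    {
        string row = "Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300  ";
        string[] parts = Split(row);
        Console.WriteLine(parts == null ? "no match" : string.Join(" | ", parts));
        // prints: Courtesy Motors | 2520 Cohasset Rd | Chico, CA | 95973-1307 | 530-893-1300
    }
}
```

Rows for which `Split` returns null are worth collecting and reviewing by hand; with only 350 rows that is probably cheaper than tuning the pattern for every oddity.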


A regular expression is the right tool for this job. I am not a C# developer, so I can't give you the exact code. Nonetheless, the following regex should work. Most IDEs have regex search-and-replace built in, or, if you have access to a UNIX system, sed would work.

([^\d]+)\s(.+?)\s-\s[^,]+,\s([A-Z]{2})\s([^\s]+)\s([^\s]+)

Captures:

  1. Name
  2. Address
  3. State
  4. ZIP
  5. Phone
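
For the C# side, one possible adaptation is sketched below. It is not the pattern from the answer: it adds anchors and named groups, captures the city as well, and makes the street part greedy so that the split happens at the last ` - ` before the city (which is what the question asks for when the street itself contains hyphens). It assumes US-style ZIP codes (5 digits, optionally followed by `-` and 4 more) and phone numbers formatted like the sample. `ParsedAddress` and `AddressParser` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed row.
public sealed class ParsedAddress
{
    public string Name, Street, City, State, Zip, Phone;
}

public static class AddressParser
{
    // name:   non-digits up to the first token that starts with a digit
    // street: from that digit up to the last " - " that still lets the rest match
    private static readonly Regex Pattern = new Regex(
        @"^(?<name>\D+?)\s+(?<street>\d.*)\s+-\s+(?<city>[^,]+),\s*(?<state>[A-Z]{2})\s+" +
        @"(?<zip>\d{5}(?:-\d{4})?)\s+(?<phone>\d{3}-\d{3}-\d{4})\s*$");

    // Returns false (and a null result) when the row does not fit the expected shape.
    public static bool TryParse(string row, out ParsedAddress result)
    {
        result = null;
        Match m = Pattern.Match(row.Trim());
        if (!m.Success) return false;

        result = new ParsedAddress
        {
            Name = m.Groups["name"].Value,
            Street = m.Groups["street"].Value,
            City = m.Groups["city"].Value,
            State = m.Groups["state"].Value,
            Zip = m.Groups["zip"].Value,
            Phone = m.Groups["phone"].Value
        };
        return true;
    }

    public static void Main()
    {
        string row = "Courtesy Motors 2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300  ";
        if (TryParse(row, out ParsedAddress a))
            Console.WriteLine($"{a.Name} | {a.Street} | {a.City} | {a.State} | {a.Zip} | {a.Phone}");
        else
            Console.WriteLine("no match: " + row);
    }
}
```

Like the original pattern, this still assumes the business name contains no digits; rows that return false can be logged and fixed by hand.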


You can use the Google Geocoding API. You might have to remove the phone number first, but for anyone looking for address parsing with more functionality than a regex alone, it can even return the latitude and longitude for the address.

For your example address:

http://maps.googleapis.com/maps/api/geocode/xml?address=2520%20Cohasset%20Rd%20-%20Chico%2C%20CA%2095973-1307%20530-893-1300%20%20&sensor=false

Documentation

https://developers.google.com/maps/documentation/geocoding/
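
As a rough sketch of the preparation step in C#, the snippet below strips a trailing phone number and URL-encodes what is left into a request URL of the same shape as the example above. It only builds the string and sends nothing. The helper names are hypothetical, and the phone format is assumed to match the sample; the current service may also require HTTPS and an API key, so check the documentation before relying on this URL form.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GeocodeUrlDemo
{
    // Removes a trailing NNN-NNN-NNNN phone number and surrounding whitespace (assumed format).
    public static string StripPhone(string text)
    {
        return Regex.Replace(text, @"\s*\d{3}-\d{3}-\d{4}\s*$", "");
    }

    // Builds a URL shaped like the example in this answer; the endpoint is taken from that example.
    public static string BuildUrl(string address)
    {
        return "http://maps.googleapis.com/maps/api/geocode/xml?address="
               + Uri.EscapeDataString(address)
               + "&sensor=false";
    }

    public static void Main()
    {
        string raw = "2520 Cohasset Rd - Chico, CA 95973-1307 530-893-1300  ";
        Console.WriteLine(BuildUrl(StripPhone(raw)));
    }
}
```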
