Extract address/contact details from a text block with name and address?

I have a block of text that includes a name, maybe a company name, an address, and maybe an email address. I want to extract the street address from it, and preferably the name and email address as well.

This data is siphoned from multiple sources, so I have no idea about the actual formatting. It could be something like this:

Company name, owner@domain.com
ATTN John Doe
care of Company Name
123 Street St
New York, NY 12345
US
123-456-7890

But any of those lines could be rearranged or missing (the phone number could come first, there might be no ATTN or c/o line, etc.). Also, the address could be from any country.

The goal is to a) plug the address into the Google Maps API, and b) create a contact with as much information as possible.

Here is a random idea I had:

  1. Take any line with an email address (can be found with a regex easily), store the email address and remove the line from further consideration.
  2. Take any line with a phone number (digits only, and [-+()]), store that number, and remove the line from further consideration.
  3. Take the last three lines and consider those the street address - plug them into Google Maps and hope for the best.
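A rough sketch of the three steps above, in Python. The regexes, the 7-digit minimum for a phone number, and the names `split_contact_block`, `EMAIL_RE`, and `PHONE_RE` are all assumptions made for illustration and are not taken from any library. The Google Maps lookup is left out, so the function only returns the candidate address lines.

```python
import re

# Assumed patterns: deliberately loose, not a full RFC 5322 or E.164 check.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"^[\d\s\-+().]+$")  # digits plus - + ( ) . and spaces only


def split_contact_block(text):
    """Apply the three-step heuristic to a free-form contact block."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    emails, phones, rest = [], [], []
    for line in lines:
        # Step 1: any line containing an email is consumed entirely.
        found = EMAIL_RE.findall(line)
        if found:
            emails.extend(found)
            continue
        # Step 2: a line made only of phone characters, with at least
        # 7 digits (an assumed threshold so bare postal codes survive).
        digit_count = len(re.sub(r"\D", "", line))
        if PHONE_RE.match(line) and digit_count >= 7:
            phones.append(line)
            continue
        rest.append(line)
    # Step 3: guess that the last three remaining lines are the address.
    return {
        "emails": emails,
        "phones": phones,
        "address_lines": rest[-3:],
        "other_lines": rest[:-3],
    }


SAMPLE = """Company name, owner@domain.com
ATTN John Doe
care of Company Name
123 Street St
New York, NY 12345
US
123-456-7890"""

result = split_contact_block(SAMPLE)
print(", ".join(result["address_lines"]))
```

Note that step 1 as written throws away the company name that shares a line with the email, which is one of the weaknesses of the heuristic. The joined address lines would then be passed to a geocoder to check whether the guess resolves to a real place.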

Obviously, that's a lot of juju magic. Is there a smarter approach? Are there any libraries that have good regexes to look for street addresses of different countries?


It depends on your source. If you have control over how the data arrives from your source, then you can impose some formatting on it.
