Parsing name and address from unstructured text
I am working on an application that needs to parse unstructured text. I need to extract a name and an address (area, city, country and zip code) from it. The addresses will be Indian.
Sample input: "I am ABC working in XYZ company. I am good at web designing having an experience of 3 years. I live in kothrud,Pune-411038,Maharashtra."
Output: NAME : ABC AREA : KOTHRUD CITY : PUNE STATE : MAHARASHTRA ZIP CODE : 411038
I am planning to use Apache ConceptMapper to parse cities and states. I will have to build the dictionary set myself, but I guess that can be done. For the zip code, I can use a regex. I am stuck on how to parse the name and the area. Regexes could extract them with a little hacking and lots of patterns, but I am wondering whether a better solution is available.
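To make the dictionary-plus-regex idea concrete, here is a minimal sketch in Python rather than ConceptMapper. The names `CITIES`, `STATES` and `extract_address_fields` are hypothetical, the dictionaries are tiny placeholders, and the rule "the area is the word right before the city" is only an assumption that happens to fit the sample input. It does not attempt the name at all.

```python
import re

# Hypothetical, tiny dictionaries; a real system would load full gazetteers.
CITIES = {"pune", "mumbai", "nagpur"}
STATES = {"maharashtra", "karnataka", "gujarat"}

# Six digits with a non-zero first digit, delimited by word boundaries.
PIN_RE = re.compile(r"\b[1-9][0-9]{5}\b")
# Alphabetic tokens only; punctuation and digits act as separators.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def _upper(value):
    """Upper-case a value, passing None through unchanged."""
    return value.upper() if value is not None else None


def extract_address_fields(text):
    """Return area, city, state and zip code found in free text (or None)."""
    lowered = [t.lower() for t in TOKEN_RE.findall(text)]

    pin_match = PIN_RE.search(text)
    city = next((t for t in lowered if t in CITIES), None)
    state = next((t for t in lowered if t in STATES), None)

    # Naive assumption: the area is the token immediately before the city.
    area = None
    if city is not None:
        idx = lowered.index(city)
        if idx > 0:
            area = lowered[idx - 1]

    return {
        "AREA": _upper(area),
        "CITY": _upper(city),
        "STATE": _upper(state),
        "ZIP CODE": pin_match.group(0) if pin_match else None,
    }
```

This only matches single-word city and state names; multi-word names such as "New Delhi" would need phrase matching, which is the kind of thing a dictionary annotator is meant to handle.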
Is there any database I can query that would return addresses? I haven't looked into Google Maps/Places yet, but can address parsing be done easily with them?
Any inputs would be highly appreciated.
Thanks.
The Google Geocoding API can help with this. It will return the map coordinates for a given address or an appropriate status code if no match is found.
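As a rough sketch of how a candidate address string could be sent to the Geocoding API from Python: the helper names (`build_geocode_url`, `components_by_type`, `geocode`) are hypothetical, you need your own API key, and the code assumes the JSON response carries a `status` field and a `results` list whose entries have `address_components` with `long_name` and `types`. Check the current API documentation before relying on those field names.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for a free-form address string."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def components_by_type(result):
    """Map each component type (e.g. 'locality') to its long_name."""
    out = {}
    for comp in result.get("address_components", []):
        for comp_type in comp.get("types", []):
            # Keep the first component seen for each type.
            out.setdefault(comp_type, comp.get("long_name"))
    return out


def geocode(address, api_key):
    """Return the first geocoding result, or None if the status is not OK."""
    with urllib.request.urlopen(build_geocode_url(address, api_key)) as resp:
        payload = json.load(resp)
    if payload.get("status") != "OK":
        return None
    return payload["results"][0]
```

A call such as `components_by_type(geocode("kothrud, Pune 411038", api_key))` would then let you look up types like `locality` or `postal_code`. The API still needs the address substring to be isolated from the surrounding text first, and it will not help with the person's name.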