
Way to detect addresses on websites / Which regex?

Does anyone have a good approach for automatically detecting addresses on websites with a parser?

I thought about something simple like: "contains letters and numbers, and has between 3 and 15 words".

Unfortunately, address formats differ between the UK, the US, Germany, Spain and so on. Could anyone help with code snippets, regexes, or ideas?

Thank you!


I know this is an old question, but we may have solved it, at least for US addresses. We wrote an address extractor to do just that. It's not a simple problem, and regex alone doesn't solve it. We use regex to look for particular types of strings, but keep the patterns as restrictive as possible so that only the best candidate strings come through. Once we pull those out of the input, we check them against our address validation engine. Regex plus validation gives very good results. Without the validation, it's just a good guess, and you can't know when you are right and when you are wrong.
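To make the two-stage idea concrete, here is a minimal Python sketch of "regex for candidates, then validate". It is not our actual extractor: the pattern `CANDIDATE_RE`, the helper names, and especially `toy_validator` are made up for illustration. A real validator would look each candidate up in an address database or service; the toy one only checks the state code against a small, deliberately incomplete set.

```python
import re
from typing import Callable, List

# Hypothetical, deliberately loose pattern for US-style candidates:
# house number, some text, then ", ST 12345" or ", ST 12345-6789".
CANDIDATE_RE = re.compile(
    r"\b\d{1,6}\s+[A-Za-z][^\n<>]{3,80}?,\s*[A-Z]{2}\s+\d{5}(?:-\d{4})?\b"
)

# Stand-in data for the validation step (incomplete on purpose).
US_STATES = {"CA", "DC", "NY", "OR", "TX"}


def find_candidates(text: str) -> List[str]:
    """Stage 1: pull out strings that merely look like addresses."""
    return [m.group(0) for m in CANDIDATE_RE.finditer(text)]


def toy_validator(candidate: str) -> bool:
    """Stage 2 placeholder: accept only known state codes.

    A real system would query an address validation engine here.
    """
    m = re.search(r"\b([A-Z]{2})\s+\d{5}(?:-\d{4})?$", candidate)
    return bool(m) and m.group(1) in US_STATES


def extract_addresses(
    text: str, validate: Callable[[str], bool] = toy_validator
) -> List[str]:
    """Keep only the candidates that pass validation."""
    return [c for c in find_candidates(text) if validate(c)]


if __name__ == "__main__":
    page = (
        "Visit us at 742 Evergreen Terrace, Springfield, OR 97477. "
        "Order 12 Boxes of tea, ZZ 00000 today."
    )
    print(find_candidates(page))    # both strings match the regex
    print(extract_addresses(page))  # only the first survives validation
```

The second string in the sample shows why the regex alone is only a guess: it has the right shape, and only the validation step can reject it.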
