开发者

How to correct email address domains which are misspelled?

Sometimes users misspelled their e开发者_开发知识库mail domain and hence they enter wrong email address. Eg. abc@gmial.com rather than abc@gmail.com

Has anybody thought about this before? Can anybody suggest how to handle this type of mistakes?


It didn't exist when this question was asked, but I recommend MailCheck which auto-suggests corrections to entered emails. It's used successfully by large companies.


Can anybody suggest how to handle this type of mistakes?

You would usually send a confirmation E-Mail to the address given, and proceed only if a link in that E-Mail has been clicked.

There is no other good way to deal with this - it's impossible to tell for sure whether gmial.com is a typo or not, seeing as it's a valid domain.


Create a list of common email domain names:

hotmail.com
gmail.com
googlemail.com
... etc

When a user enters an email address, take the domain name of the entered address and take the Levenstein distance between your list. If the distance is 1 (or maybe up to 2) then ask the user to confirm that's the email address they meant.


In my opinion it is bordering on impossible to come up with a generic solution for the generic case.

That being said, the most common typo is to interchange two adajcent letters. So you might want to check for character content for the largest sites gmail, yahoo and what have you; Based on that suggest an alternative spelling if the original does not match gmail etc.

  • Do not assume the user is at fault, suggest alternatives if it looks suspicious compared to common names. A white-list was mentioned in another reply.
  • Use confirmation mails if you need to know you can get a reply from this address. You cannot assume the spelling you find is in error, that is what confirmation mails are for.
  • Make it very non-obtrusive (ajax springs to mind).


In our forms we're using a combination of techniques. While bad data can still slip through, the chances are vastly reduced.

First is to do a simple formatting regex that is commonly available - just be sure it's RFC-compliant. If this fails, it's good to offer the user a confirmation form at this point, because they may catch other errors for you while fixing this problem.

The next part is to check the TLD part of the domain. Since all TLDs can be known, these are relatively easy to scan for misspellings using some regex tests. Just keep a list of all current TLDs in a table somewhere and update it form time to time as needed (mind you, this list can get complex when dealing with international TLDs. If you're only dealing with US traffic, the rules are much easier, and that's something else you can filter out. For example, if you're selling a service only available in the US, it would make sense to filter out international emails at form submission time. We are, so this works for us).

Third is to do something like what @npclaudiu suggested - scan for common misspellings of big-name mail hosts (gmail, hotmail, yahoo, etc) in the domain part and if a possible hit is detected, offer a confirmation form to the user. (You entered someone@hptmail.com, did you mean hotmail.com?)

If you get through those steps, then you can do the MX lookup suggested by @symcbean.

Finally, if all of that succeeds, there is a method (but I've not yet tested it) for communicating with the remote SMTP host to see if the mailbox exists. We're about to begin testing this ourselves. I found the how-to for such here: http://www.webdigi.co.uk/blog/2009/how-to-check-if-an-email-address-exists-without-sending-an-email/


The funny thing is that the url does exist http://www.gmial.com In fact it would be very difficult for you to know if it's a mistake or just a "strange" domain. Look at the Google API's because when you type something wrong in Google they propose you "did you mean...."

good luck

Arnaud


You can not provide this functionality in a way that you auto correct the misspelled email domain names, because the name which you are assuming to be invalid, would be valid. you should expect anything to be entered as a email address domain name.

I would suggest, if you are creating a signup form, you provide user with a dropdown having all possible domain names which you are aware of so that he can make a selection from that.

Hope this helps.


You could create a list of popular e-mail domains (gmail.com, yahoo.com, ymail.com, etc) in your db and validate the e-mail address that the user inputs against this list, and if the domain resembles with one of these domains, you should show a warning and allow the user to correct it if necessary, not auto correct it. And to compare the domain entered with the domains in your list, you might use an algorithm like the the one used in the soundex function in SQL Server, that matches words based on if one word sounds like the second.

Edit: you can find more details the SOUNDEX function here.


As mentioned before, it is not a good idea to automatically assume that someone has mistyped an email. A better approach would be to implement a little javascript function that checks if the domain of the email was possibly mistyped and alert the user instead of assuming they were wrong from the start.

Give me a minute to create a little mockup.

EDIT: OK, so maybe it was more than a minute. Take a look at http://jsbin.com/iyaxuq/8/edit and see for yourself how javascript can help prevent common typing errors. Try emails like: test@gmail.cmo, another@yhaoo.com, loser@htomali.ocm (typo of hotmail), and me@aol.com.

Note: I used a lazy regex to validate the email. Don't rely on it (or for that matter, most regexes) for a real app.


Trying to automate correction of bad data is a very dangerous practice. Ultimately, only the user can provide the correct data. However there are strict rules about formatting an email address - a regex check can be run in javascript (or using the preg functions with the same regex syntax) - but note that there are a lot of bad examples on the internet of regexes claiming to solve the problem.

This should be a fairly complete implementation of an RFC2822 ADDR_SPEC validator:

/[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/gi

However in practice I find this to be adequate:

/^[a-z0-9\._%+!$&*=^|~#%'`?{}/\-]+@([a-z0-9\-]+\.){1,}([a-z]{2,22})$/gi

Then, serverside, you can do an MX lookup to verify that the domain provided not only meets the formatting requirements but exists as an email receiving site.

This does not prove that the named mailbox exists at that site, nor that it is accepting emails - ultimately you'd need to send an email to that address including a click back link / password to establish whether the email address is valid.

Update

While, as the top voted answer here says, the best way to validate an ADDR_SPEC is to send a token to the address to be submitted back via the web, this is not of much help if the data is not coming from the person whom controls the mailbox, and the action is dissociated from the primary interaction even when they do. A further consideration is that an email address which is valid today might not be tomorrow.

Using a regex (and an MX lookup) is still a good idea to provide immediate feedback to the user, but for a complete solution you also need to monitor the bounces.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜