
Determine if an input is a domain

I would like to have a way to determine if an input is a domain.

Example Inputs:

@stackexchange.com
@gmail.com
@google

Logic:

1. First, determine whether the first character is an @.
2. Then check whether the input ends in a domain extension: .X, .XX or .XXX.
3. Finally, determine whether the domain (stackexchange, gmail, google) is not blacklisted. For example, I might want to blacklist gmail.

Any suggestions on how to go about this? Should this live in the controller or the model? Would a regex be the right way to do this, or would that be too slow? Thanks.

Ideas:

1. Use params[:q][0,1]


Much like an email address, a domain can appear to be correct but still fail the most basic test: being a domain you can actually reach or connect to.

You can check for the @ if you are looking for an email address, but that doesn't tell you if it's a domain. Domains don't have @ signs.

Domains contain at least one dot, as in example.com. They end in a known TLD (top-level domain), which is the .com, .me or .info part. The problem with TLDs is that they are being opened up to whatever people want them to be, so a simple lookup against a fixed list will soon be difficult.

In my opinion, your best bet is to try to reach the domain with a ping, an SMTP (email) connection, or an HTTP connection. Those are the services most likely to be alive. A secondary choice would be to try to resolve the domain using something like this:

host example.com

which will return:

example.com has address 192.0.32.10
example.com has IPv6 address 2620:0:2d0:200::10

Call it using %x{} or backticks.
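
If shelling out to host is not convenient, a similar lookup can be sketched with Resolv from Ruby's standard library instead. This is a swapped-in technique, not what the answer above describes; the method name resolvable? and the resolver: keyword are hypothetical, added only so the lookup can be stubbed in tests.

```ruby
require "resolv"

# Returns true when the name resolves to at least one address.
# `resolver` is a hypothetical injection point; by default it is
# the standard-library Resolv module, which consults the system resolver.
def resolvable?(domain, resolver: Resolv)
  resolver.getaddress(domain) # raises Resolv::ResolvError when nothing is found
  true
rescue Resolv::ResolvError
  false
end
```

A name that resolves is not necessarily one that accepts mail or serves HTTP, so this only covers the "secondary choice" described above.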

It might help to read the "Domain Name Syntax" description on Wikipedia for an overview on what defines a domain name, in particular:

DNS names may technically consist of any character representable in an octet. However, the allowed formulation of domain names in the DNS root zone, and most other sub domains, uses a preferred format and character set. The characters allowed in a label are a subset of the ASCII character set, and includes the characters a through z, A through Z, digits 0 through 9, and the hyphen. This rule is known as the LDH rule (letters, digits, hyphen). Domain names are interpreted in case-independent manner. Labels may not start or end with a hyphen.

RFC 3696 - Application Techniques for Checking and Transformation of Names will give you the full rules.
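
The LDH rule quoted above can be sketched as a purely syntactic check. This is a minimal sketch, not the full RFC 3696 rule set: the names LDH_LABEL and ldh_hostname? are hypothetical, and the 63-character label limit comes from the DNS specification rather than from the quoted passage.

```ruby
# One label: letters, digits and hyphens, 1 to 63 characters,
# never starting or ending with a hyphen. Case-insensitive.
LDH_LABEL = /\A[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\z/i

# True when every dot-separated label follows the LDH rule.
# This checks syntax only; it does not tell you the name exists.
def ldh_hostname?(name)
  return false if name.empty?
  labels = name.split(".", -1) # -1 keeps empty labels, so "a..b" and "a." fail
  labels.all? { |label| LDH_LABEL.match?(label) }
end
```

Note that a single label such as google passes this check, so a separate test for a TLD is still needed if the input must contain one.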


Regex is what you are looking for. For a domain with an @ in front it would be something like:

# \z anchors at the very end of the string; \Z would also accept a trailing newline
possible_domain =~ /\A@([-_a-zA-Z0-9]+\.[a-z]{1,3})\z/
domain_to_check_against_a_blacklist = $1

What do you want to do with it? Save it, only when it is valid? Then you should take a look at validations and validates_format_of.
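
To tie the regex to the blacklist step from the question, here is a minimal sketch in plain Ruby. The names BLACKLIST, DOMAIN_INPUT and allowed_domain are hypothetical, and the pattern is a variant of the one above: it drops the underscore, ignores case, and captures the name and extension separately.

```ruby
require "set"

# Hypothetical blacklist of names (the part before the extension).
BLACKLIST = Set.new(%w[gmail]).freeze

# "@name.ext" where ext is 1 to 3 letters, as in the question's logic.
DOMAIN_INPUT = /\A@([-a-z0-9]+)\.([a-z]{1,3})\z/i

# Returns the lowercased domain without the @ when the input looks like
# a domain and its name is not blacklisted; returns nil otherwise.
def allowed_domain(input)
  match = DOMAIN_INPUT.match(input.to_s.strip)
  return nil unless match
  name = match[1].downcase
  ext = match[2].downcase
  return nil if BLACKLIST.include?(name)
  "#{name}.#{ext}"
end
```

With the example inputs from the question, @stackexchange.com is accepted, @gmail.com is rejected by the blacklist, and @google is rejected because it has no extension. Since this is plain Ruby with no framework dependency, it could be called from either a model validation or a controller.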


I would stick with the regex from http://www.regular-expressions.info/email.html (just omit the part that matches everything before the @).
