DNS Round-Robin on SSL
We're adding a second web server for redundancy and load sharing purposes. All connections are mandated to be SSL, and adding a dedicated appliance is not possible at this moment.
I'd like to use round robin DNS, where both servers answer to the same domain using different IPs (we have a wildcard SSL certificate, so that's OK). I can get the DNS to return in random/round robin order no problem.
Is this a bad setup when using SSL?
Our usage pattern is consistent -- users stay in the web app for 8-10 hours at a time. We want each page view to be as fast as possible, and my concern is that users could constantly flip between the servers, potentially negating any SSL session caching or keep-alive.
Thanks!
Firstly, SSL can resume an earlier session, but only against a server that holds that session's state; flipping between servers defeats resumption, so each flip costs you a full handshake -- a few hundred ms per request (longer if several clients are accessing the site simultaneously, since this is CPU time we're talking about).
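If you want to measure that yourself, here's a minimal sketch using Python's ssl module (Python 3.6+; example.com is a placeholder for one of your own servers) that makes two connections and reports whether the second one resumed the first session:

    import socket
    import ssl

    HOST = "example.com"  # placeholder; point this at one of your servers

    ctx = ssl.create_default_context()

    # First connection: a full handshake; save the resulting session.
    with socket.create_connection((HOST, 443)) as raw:
        with ctx.wrap_socket(raw, server_hostname=HOST) as tls:
            session = tls.session

    # Second connection: offer the saved session back for resumption.
    # (Under TLS 1.3 the session ticket arrives after the handshake, so
    # you may need to read some application data first to capture one.)
    with socket.create_connection((HOST, 443)) as raw:
        with ctx.wrap_socket(raw, server_hostname=HOST, session=session) as tls:
            print("session resumed:", tls.session_reused)

Resumption will only succeed when the second connection lands on a server that knows the session (or the servers share a session cache) -- which is exactly what round-robin flipping exercises.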
Whether the clients will actually flip depends, though - DNS "load balancing" is a fiddly business (there's a sketch after this list for observing what your resolver actually returns):
- if many of your users are using the same recursive nameservers, they'll get the same "first IP" hence no load balancing
- if the DNS record has a high TTL (several hours), caching nameservers will store a particular permutation of IP addresses until they expire (good so long as your users aren't all using the same recursive nameservers)
- if your users have multiple recursive nameservers configured, they may flip if each nameserver has a different "first IP" (bad)
- if you have no mechanism for removing "bad" records, and a low TTL, then if one server goes down 50% of your clients will get the "bad" server and have to wait for a timeout before they can see your site
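Here's a quick way to watch what ordering your own resolver hands back, using only the Python standard library (example.com is a placeholder for your round-robin name):

    import socket

    HOST = "example.com"  # placeholder: your round-robin name

    # Repeat the lookup and print the address list each time. Whether the
    # order rotates, randomizes, or never changes depends on the
    # authoritative server, intermediate caching resolvers, and the local
    # stub resolver.
    for i in range(5):
        addrs = [info[4][0] for info in
                 socket.getaddrinfo(HOST, 443, proto=socket.IPPROTO_TCP)]
        print(i, addrs)

If the first address never changes between runs, your resolver is pinning a particular permutation and you'll see very little spread.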
As you can see there are various tradeoffs depending on whether you're more concerned about redundancy/failover or load balancing; DNS isn't really the best tool here - you really need the servers to share an IP using either a reverse proxy, or something like Heartbeat (assuming you're Linux-based).
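For illustration only, the IP-takeover idea boils down to a health check that claims a floating address when the peer stops answering. A rough sketch, assuming Linux with the ip(8) utility on the path; the addresses and interface name are made-up placeholders, and a real deployment should use Heartbeat/keepalived rather than this:

    import socket
    import subprocess
    import time

    PEER = ("192.0.2.10", 443)      # placeholder: the other server
    FLOATING_IP = "192.0.2.100/24"  # placeholder: the shared service address
    IFACE = "eth0"                  # placeholder: interface to claim it on

    def peer_alive(timeout=2.0):
        # The peer counts as alive if it accepts a TCP connection on the
        # service port within the timeout.
        try:
            with socket.create_connection(PEER, timeout=timeout):
                return True
        except OSError:
            return False

    while True:
        if not peer_alive():
            # Claim the floating IP; the command fails harmlessly if we
            # already hold the address.
            subprocess.run(["ip", "addr", "add", FLOATING_IP, "dev", IFACE],
                           check=False)
        time.sleep(5)

This omits gratuitous ARP, split-brain handling, and releasing the address again, which is precisely why the dedicated tools exist.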
An aside: if both servers are answering to the same domain then you don't need a wildcard cert, although CAs often charge more if you intend to use a cert on more than one server.
TL;DR: You will be fine. Full SSL handshakes shouldn't happen frequently enough to be noticeable to your end users.
Rant starts here:
Load distribution using DNS is a commonly misunderstood topic that leads to a lot of anecdotal evidence and straw-man arguments. I've been in too many of these meetings.
Here's how I usually settle these arguments:
"Wow yeah that sounds really exoteric [long dramatic pause] but it really can't be that bad since google uses it"
$ host encrypted.google.com
encrypted.google.com is an alias for www3.l.google.com.
www3.l.google.com has address 74.125.224.195
www3.l.google.com has address 74.125.224.202
www3.l.google.com has address 74.125.224.193
www3.l.google.com has address 74.125.224.197
www3.l.google.com has address 74.125.224.207
www3.l.google.com has address 74.125.224.206
www3.l.google.com has address 74.125.224.203
www3.l.google.com has address 74.125.224.204
www3.l.google.com has address 74.125.224.196
www3.l.google.com has address 74.125.224.199
www3.l.google.com has address 74.125.224.201
www3.l.google.com has address 74.125.224.194
www3.l.google.com has address 74.125.224.192
www3.l.google.com has address 74.125.224.200
www3.l.google.com has address 74.125.224.205
www3.l.google.com has address 74.125.224.198
Updates:
Is this setup redundant?
It is not inherently redundant in the engineering sense: if one of those IPs were to fail, it would continue to be served to customers until a DNS zone change is performed and all downstream caches expire. With that said, most browsers are smart enough to try another IP under these circumstances - reference.
Moreover, a system could easily be devised that, instead of requiring a DNS zone change to remove the failed node, reroutes the IP of the failed instance to a working device by simple IP takeover.
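The browser retry mentioned above amounts to something like this sketch: resolve every A record and try each address in turn until one accepts the connection (Python standard library only; example.com is a placeholder):

    import socket

    def connect_any(host, port, timeout=3.0):
        # Try each resolved address in turn; return the first live socket.
        last_err = None
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                host, port, proto=socket.IPPROTO_TCP):
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
                return sock
            except OSError as err:
                last_err = err  # dead address: close it and try the next one
                sock.close()
        raise last_err or OSError("no addresses resolved")

    sock = connect_any("example.com", 443)  # placeholder name
    print("connected to", sock.getpeername())
    sock.close()

The cost of a dead address is one connect timeout per client, which is the "wait for a timeout" penalty described in the first answer.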
Is this setup resilient?
Yes. Resilience is achieved by minimizing your failure domain. Going back to our example of a single failed IP (and remember these IPs may represent load balancers backed by hundreds of servers, or even an entire data center), the likelihood of a customer hitting that IP is 1/16, or ~6% (using the Google example above). This is inherently more resilient than a system with a single A record, where a failure would impact 100% of users, or a system with 2 A records, where the user has an even 50/50 chance of hitting the failed resource.
Don't worry about it. There are multiple levels of DNS caches, so a user is not going to flip between the two IPs on every request. The IP will stay the same for hours for each client.
We have the opposite problem: when a server goes down, the user still has the bad IP. We set the TTL to 1 minute, but very few browsers honor it. Because of this, a VIP is a much better option than DNS for load balancing on the same network.
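If you want to see what TTL your recursive resolver is actually serving (as opposed to what the browser then does with it), here's a sketch using the third-party dnspython package (example.com is a placeholder):

    import dns.resolver  # third-party: pip install dnspython

    answer = dns.resolver.resolve("example.com", "A")  # placeholder name
    print("TTL remaining:", answer.rrset.ttl)
    for record in answer:
        print(record.address)

Even a correctly served 60-second TTL doesn't help when the browser's own DNS cache or connection pool keeps the stale address alive longer, which is the behavior described above.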
DNS round robin does not provide redundancy.
Without substantial additional help it only provides dumb load sharing (nb: not load "balancing", which implies dynamic load distribution based on server load).
Having the same cert on two IPs should be no problem, though.