Single points of failure when scaling an application on AWS
We have a Rails-based application whose deployment infrastructure is tied to AWS. The current setup includes the following layers:
- load balancer (HAProxy)
- Rails application (EC2) x2
- MySQL database (EC2 master-slave)
- Redis, DelayedJob background processes
- Wowza media server (EC2)
- S3 assets storage (shared data)
There are 3 SPFs (single points of failure): load balancer, database, media server.
My questions are about redundancy: how can I reduce the number of SPFs?
- Load balancer. We plan to set up a secondary load balancer, but the domain name stays the same. Is DNS A/AAAA round-robin failover a good solution in that case? Is the AWS load balancer a good option?
- Is MMM (Multi-Master Replication Manager) reliable? How does it work with Rails (read/write to independent hosts)?
- Wowza media server: are there any well-known HA solutions to work with?
I love these questions as they always seem so simple to answer when in fact they are not.
For starters, your BIGGEST SPF is that everything is on Amazon. I love AWS for many reasons, but in all situations where you need real availability, you're essentially shooting yourself in the foot by relying on them 100%. So your first plan should be to distribute your services to more than 1 provider (cloud, VPS, or dedicated).
I want to ask you a question: if AWS goes down, how long does/can/will it take you to notice and then do something about it, and how quickly do you need your services to be back up and running?
The reason I ask is this: DNS load-balancing of A/AAAA records is a wonderful solution, but unfortunately you can't set weights or priorities as you can with SRV/MX records. This means that if AWS becomes completely unavailable, you'll have to make a DNS change very quickly to remove its IP. That CAN be automated if your DNS provider has an API which allows it. On the other hand, DNS caching happens in so many places that the change might not be worth making. Without it, you'll have from 50% to 100% availability if 1 IP is unavailable (assuming you have 2 A records), because some browsers are able to try the 2nd IP if the 1st doesn't work.
In my opinion, considering AWS's excellent uptime, you won't be at fault to assign 2 different IPs (on 2 different providers) to your domain. I think it's better than having 0% availability when 1 IP is down, but there's still no joy in losing 50% of your requests.
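To make the 50% to 100% figure concrete, here is a rough Python sketch. It assumes clients pick their first A record uniformly at random, and that some fraction of them retry the next record on failure; the function names and the `retrying_fraction` parameter are made up for illustration, and real resolver and browser behaviour varies.

```python
import socket


def expected_success_rate(total, down, retrying_fraction):
    """Rough share of requests that succeed with `total` A records, `down` of them dead.

    Assumption: the first record is picked uniformly at random, and
    `retrying_fraction` of clients try the remaining records on failure.
    """
    up = total - down
    if up <= 0:
        return 0.0
    first_try = up / total
    # Retrying clients always end up on a live address; the rest only get one shot.
    return retrying_fraction + (1.0 - retrying_fraction) * first_try


def connect_first_reachable(addresses, port, timeout=2.0,
                            connect=socket.create_connection):
    """Try each address in order, return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

With 2 records and 1 of them down, `expected_success_rate(2, 1, 0.0)` gives the 50% worst case and `expected_success_rate(2, 1, 1.0)` gives 100%. `connect_first_reachable` is what a retrying client does, more or less.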
You can have 2 load-balancers on each provider, and let them forward requests to the other provider if certain instances/servers are down. In other words, you only need functional load-balancers at BOTH providers, and functional servers/instances at ONE provider. Make sure to select an alternate provider which doesn't have too much latency to AWS ;)
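The forwarding rule can be sketched in a few lines of Python. This is only an illustration of the decision each load-balancer makes, not a real HAProxy setup; the host names and the `is_healthy` callback are hypothetical.

```python
def pick_backends(local, remote, is_healthy):
    """Prefer healthy servers at this provider; otherwise forward to the other one."""
    healthy_local = [backend for backend in local if is_healthy(backend)]
    if healthy_local:
        return healthy_local
    # Nothing usable here, so send traffic across to the other provider.
    return [backend for backend in remote if is_healthy(backend)]


# Hypothetical inventory: both AWS instances are down, the other provider is fine.
status = {
    "app1.aws.example": False,
    "app2.aws.example": False,
    "app1.other.example": True,
}
targets = pick_backends(
    ["app1.aws.example", "app2.aws.example"],
    ["app1.other.example"],
    lambda host: status[host],
)
```

Here `targets` ends up as `["app1.other.example"]`, which is the "functional servers at ONE provider" case.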
MMM is also a great tool, but it's not related to Rails in any way. Personally I prefer to place a load-balancer in front of all my Database servers and let them handle who gets requests etc. Since data on a database server is so important, it's usually better to have a human look at it and make sure everything is OK when there's a problem, as opposed to letting a tool manage its availability, configuration, etc. MMM works in many situations, perhaps you should try it and see if it answers your needs. I can't say anything bad about it.
I'm not at all familiar with Wowza media server, but a quick search explained a few things. Since Wowza uses RTSP (UDP and TCP), HAProxy is NOT a solution as it only does TCP. Keepalived on the other hand can perform UDP load-balancing (it uses IPVS/LVS). In fact, Keepalived should also be used for your database slave load-balancing if you have long queries.
One final note: there are many ways to "roll your own" AWS-like services such as S3 storage. If you want to avoid having SPFs but still need the same functionality as your AWS services, you should look into running the open source variants, such as Eucalyptus/Cloud.com/OpenStack/GlusterFS. There's a lot of work involved in setting up all that stuff, but you'll be happy the day you can say: "so what if X provider is down, Y can take over".
Here are some suggestions:
1) Load Balancer: Create two ha_proxy instances with your application-level load balancing knowledge and the ability to automatically create a new instance on demand. Wire up Amazon Elastic Load Balancing in front of them with health checks to route around a single ha_proxy failure. Dynamically mix in new ha_proxy instances when one fails.
2) Database: I don't think there's a way to handle automatic failover of your Primary in MySQL, but if you introduce a layer to read from replicas and write to the primary you may be able to keep read-only functionality up if a Primary is down.
3) Wowza: You should be able to load-balance multiple Wowza instances behind your ha_proxy layer w/ health checks so a single Wowza failure doesn't disable media streaming
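Health checks like the ones in 1) and 3) usually don't flip an instance on a single failed probe. A minimal Python sketch of that idea follows; the `fall` and `rise` names are borrowed from HAProxy's server check options, but the class itself is hypothetical and not how HAProxy or Elastic Load Balancing is implemented.

```python
class HealthTracker:
    """Mark an instance down after `fall` failed checks in a row, up after `rise` good ones."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall
        self.rise = rise
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Feed in one check result and return the resulting health state."""
        if check_passed == self.healthy:
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.fall if self.healthy else self.rise
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


tracker = HealthTracker(fall=3, rise=2)
history = [tracker.record(ok) for ok in (False, False, False, True, True)]
```

After the three failures the instance is taken out of rotation, and it only comes back after two good checks in a row, so `history` is `[True, True, False, False, True]`.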
At Scalarium we have a solution that dramatically reduces SPFs; you can see an infographic in Rails in the Cloud, on page 12.
You use the Amazon Elastic Load Balancer to route between your ha_proxy instances. For even more safety, you can spread your application across multiple availability zones.
MySQL master-master replication isn't the easiest thing. You can have a single master instance and multiple slaves in multiple availability zones. Then you can still serve reads even if your master is gone. I think a real master-master setup with failover isn't possible.
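The "reads keep working when the master is gone" idea could look roughly like this in Python. It is a sketch under assumptions: the host names, the `is_up` callback and the crude SQL prefix test are all made up for illustration, and a Rails app would do this routing in its database layer rather than by hand.

```python
import random


class PrimaryUnavailable(Exception):
    """Raised when a write arrives while the master is down (read-only mode)."""


class ReadWriteRouter:
    def __init__(self, primary, replicas, is_up):
        self.primary = primary
        self.replicas = list(replicas)
        self.is_up = is_up  # callable: host -> bool

    def host_for(self, sql):
        """Return the host that should run this statement."""
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        if is_read:
            live = [host for host in self.replicas if self.is_up(host)]
            if live:
                return random.choice(live)
            # No replica available: fall through and let the master serve the read.
        if self.is_up(self.primary):
            return self.primary
        raise PrimaryUnavailable("master is down, only reads can be served")


# Hypothetical state: the master has failed, one slave is still up.
up = {"db-master": False, "db-slave-1": True}
router = ReadWriteRouter("db-master", ["db-slave-1"], lambda host: up[host])
read_host = router.host_for("SELECT * FROM users")
```

In this state `read_host` is `"db-slave-1"`, while any INSERT or UPDATE raises `PrimaryUnavailable`. Keep in mind that slaves can lag behind the master, so reads served this way may be slightly stale.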
ha_proxy should be able to load-balance your Wowza instances.