Technology stack for very frequent GPS data collection
I am working on a project that involves GPS data collection from many users (say 1,000), once per second while they move. I am planning to use a dedicated database instance on EC2 with MySQL on persistent block storage (EBS) and run a Ruby on Rails application behind an nginx frontend. I haven't worked on this kind of data-collection application before. Am I missing something here?
I will have another instance that will act as the application server and use the data from the same EBS volume. If anybody has dealt with such a system before, any advice would be much appreciated.
I would be most worried about MySQL and the disk being your bottleneck. I'm going to assume you're already familiar with the Ruby/Rails trade-off of throwing more hardware at the application layer in return for higher programmer productivity. However, you're going to need to scale MySQL for writes, and that can be a tricky proposition if you're actually talking about more than 1,000 writes per second (1,000 users, each writing once a second). I would recommend taking whatever configuration of MySQL you're planning to use and throwing a serious amount of write traffic at it. If it falls over at anything under, say, 3,000 QPS (always give yourself breathing room for spikes), you're going to need to either revise your plan (data every second? really?) or write to something like memcached first and use scheduled tasks to write to the database in one go (MySQL 3.22.5 and later supports multiple rows in a single INSERT, and there is also the LOAD DATA INFILE method, which can be used in conjunction with /dev/shm). You can also look into INSERT DELAYED if you're not using InnoDB.
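As a rough sketch of the buffer-then-batch idea (the table and column names are made up, the mysql2 gem is just one possible client, and the buffer would more likely live in memcached or a queue in practice), something along these lines turns 1,000 single-row writes per second into one multi-row INSERT per flush:

```ruby
# Illustrative only: accumulate GPS samples in memory and flush them
# as a single multi-row INSERT instead of one INSERT per sample.
require "mysql2"

client = Mysql2::Client.new(host: "db-host", username: "app", database: "tracking")

buffer = []  # each entry: [user_id, lat, lon, timestamp_string]

def record(buffer, user_id, lat, lon)
  buffer << [user_id, lat, lon, Time.now.utc.strftime("%Y-%m-%d %H:%M:%S")]
end

def flush(client, buffer)
  return if buffer.empty?
  # Values are coerced to numbers here to keep the sketch safe; a real
  # implementation would use proper escaping or LOAD DATA INFILE.
  values = buffer.map do |user_id, lat, lon, ts|
    "(#{user_id.to_i}, #{lat.to_f}, #{lon.to_f}, '#{ts}')"
  end.join(", ")
  client.query("INSERT INTO gps_points (user_id, lat, lon, recorded_at) VALUES #{values}")
  buffer.clear
end
```

If a single multi-row INSERT still isn't fast enough, the same buffer could be written out to a file under /dev/shm and loaded with LOAD DATA INFILE.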
I'm biased of course (I work for Google), but I would be using App Engine for this. We run stuff that gets far more write traffic than this on App Engine all the time and it works great. It scales out of the box, there's no need to start up new images, and you don't have to deal with the issues of scaling SQL-based persistence. You also get a ton of free quota to work with before billing starts. You can run JRuby if you really want a Ruby environment, or you can opt for Python, which is a bit better supported. Deployment is also much easier for something like this, even compared with using Vlad or Capistrano on EC2.
Edit: Here's a very conservative estimate of your data growth. Sixteen bytes is just the minimum required to store a lat/lon coordinate pair (two doubles); in the real world you have indexes and other database overhead that will increase that number. Adjust the figures based on real data to see how quickly you'll hit the 150 GB limits.
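To make that arithmetic concrete (assuming the bare 16 bytes per sample and the 1,000-users-once-per-second figure from the question, with no index or row overhead):

```ruby
# Back-of-the-envelope growth estimate; real rows will be larger.
users           = 1_000       # concurrent writers
bytes_per_row   = 16          # lat/lon as two 8-byte doubles, nothing else
seconds_per_day = 86_400

per_day       = users * bytes_per_row * seconds_per_day  # 1,382,400,000 bytes ~= 1.38 GB/day
days_to_150gb = 150_000_000_000 / per_day                # roughly 108 days
```

So even at the absolute minimum row size you'd be approaching 150 GB in well under four months.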
You should use PostgreSQL for this. Postgres has better support for spatial data types (point, line, polygon, etc.), along with functions for manipulating and doing calculations on those types, and it can index such data. You may want to use the GeoKit gem for Ruby on Rails for various operations at the ActiveRecord level.
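As a rough sketch of what GeoKit gives you at the ActiveRecord level (the model, table and column names are assumptions, and the finder syntax follows the Rails 2-era geokit-rails plugin conventions):

```ruby
# Hypothetical model for stored GPS samples with geokit-rails mixed in.
class GpsPoint < ActiveRecord::Base
  acts_as_mappable :lat_column_name => 'lat',
                   :lng_column_name => 'lon'
end

# Points recorded within 5 miles of a given coordinate.
nearby = GpsPoint.find(:all,
                       :origin => [37.7749, -122.4194],
                       :within => 5)

# Distance (in miles by default) between two recorded points.
distance = nearby.first.distance_to(nearby.last) if nearby.size > 1
```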
And I agree with webdestroya - every second?