Python Web Backend
I am an experienced Python developer starting work on a web service backend system. The system constantly feeds data from the web into a MySQL database; this data is later displayed by a frontend (there is no direct connection between the frontend and the backend). The backend constantly downloads flight information from the web (some of the data is fetched via APIs, some by downloading and parsing text / xls files). I already have a script that downloads the data, parses it, and inserts it into the MySQL db - all in one big loop. The frontend is just a bunch of PHP pages that display the data by querying the MySQL server.
It is crucial that this web service be robust and reliable. Therefore, I have been looking into the proper ways to design it, and settled on the following parts to make up my system:
1) Django as the framework (for HTTP handling and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like how little control you get with the Django ORM; I want to be able to run a more complex DB setup)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web - hopefully in some sort of organized task format. This is the part I am least sure of, and any pointers are appreciated.

This all sounds great. I have used Django before to write websites (i.e. request handlers that return data). However, other than using Celery or django-cron, I can't really see how it fits the role of a constantly-feeding data backend.
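To make the Celery question concrete, here is a minimal sketch of how the "one big loop" could be broken into discrete, isolated steps (fetch / parse / store) so that one failing source doesn't kill the whole run. Everything here is hypothetical and stdlib-only - `SOURCES`, `fetch`, `parse`, and `store` are stand-in stubs for your real downloaders, parsers, and SQLAlchemy inserts; Celery would then wrap each step as a task:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("feeder")

# Hypothetical source registry; the real one would carry URLs,
# credentials, and per-source parser configuration.
SOURCES = [
    {"name": "carrier_api", "kind": "api"},
    {"name": "airport_xls", "kind": "xls"},
]

def fetch(source):
    """Download the raw payload for one source (stub)."""
    return f"raw data from {source['name']}"

def parse(raw):
    """Turn a raw payload into rows ready for insertion (stub)."""
    return [{"flight": "XY123", "payload": raw}]

def store(rows):
    """Insert parsed rows into MySQL (stub; SQLAlchemy would go here)."""
    return len(rows)

def run_once(sources):
    """One pass over all sources. A failure in one source is logged
    and skipped, so it cannot abort the rest of the pass."""
    inserted = 0
    for source in sources:
        try:
            inserted += store(parse(fetch(source)))
        except Exception:
            log.exception("source %s failed, skipping", source["name"])
    return inserted
```

Under Celery, `run_once` would become a scheduled (periodic) task, or each source could be its own task so retries and rate limits apply per source.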
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If you are going to use SQLAlchemy, I would refrain from using Django: Django is fine if you are using the whole stack, but since you are about to rip the models layer out, I do not see much value in keeping it, and I would look at other options (perhaps Pylons or plain old CherryPy would do).
Even more so if the frontends will not run queries directly, but only talk to the API provider.
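Since whatever framework you pick will sit behind mod_wsgi (or FastCGI) anyway, the API surface can be sketched framework-free as a plain WSGI callable. This is only an illustration - the `/flights` path and the row shape are made up - but it shows how little machinery a read-only JSON API for the PHP frontend actually needs:

```python
import json

def app(environ, start_response):
    # Hypothetical read-only endpoint: GET /flights returns JSON rows.
    # A real version would query MySQL (e.g. via SQLAlchemy) here.
    if environ.get("PATH_INFO") == "/flights":
        rows = [{"flight": "XY123", "status": "landed"}]
        body = json.dumps(rows).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    # Anything else is not part of the API.
    body = b"not found"
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [body]
```

Any WSGI server (mod_wsgi, flup behind FastCGI, or `wsgiref` for local testing) can host this callable unchanged; a framework mostly adds routing and serialization sugar on top.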
As for robustness, I am happier starting separate FastCGI processes under supervise and using a more lightweight web server (lighttpd / nginx), but that's a matter of taste.
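For reference, supervise (from daemontools) just needs a `run` script per service; it restarts the process whenever it exits, which is where the robustness comes from. A minimal sketch, assuming a hypothetical `feeder` user and a hypothetical `/srv/feeder/worker.py` entry point:

```shell
#!/bin/sh
# Hypothetical /service/feeder/run script for daemontools' supervise.
# supervise re-runs this script whenever the process dies.
exec 2>&1
exec setuidgid feeder python /srv/feeder/worker.py
```

The same pattern works for the FastCGI web processes: one service directory per process, and the web server (lighttpd / nginx) talks to them over a socket.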
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with a source, would you rather just skip that step, or retry it repeatedly until the source is back up?
Periodic tasks might be good for the former, while a cron job that just spawns scraping tasks is better for the latter.