What is the typical bottleneck when developing a CRUD web application?
Update: A more precise wording could have been: "What is the typical performance bottleneck when developing a CRUD web application?"
I'm thinking of web applications like:
- Weblog CMS
- Vocabulary trainer
- URL shortener
- Internet forum
I'm thinking of programming languages like:
- Ruby
- Scala
- Clojure
I'm thinking of database systems like:
- PostgreSQL
- MongoDB
- CouchDB
I'm thinking of operating systems like:
- Mac OS X
- Linux
I'm thinking of typical off-the-shelf hardware configurations:
http://english.keyweb.de/dedicated/index.shtml
Possible software bottlenecks that come to my mind are:
- Programming language
- Database system
- Operating system
Possible hardware bottlenecks that come to my mind are:
- CPU
- RAM
- Hard disk
- Network
If you architect web applications for maximum scalability, then your bottleneck is ultimately going to boil down to management of co-ordinated mutable state (i.e. the parts of your database that require some form of transactional semantics).
Some points to consider:
Static data, or data not requiring synchronisation/transactions, can be replicated cheaply across many small commodity servers. Your NoSQL solutions (CouchDB etc.) should handle this nicely, combined with any of the many great caching solutions for static web data.
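A minimal sketch of the caching idea, assuming an in-process cache is acceptable (across several servers a shared cache such as memcached or Redis, or a caching reverse proxy, is the more usual choice). `TTLCache` and `load_page` are hypothetical names; the loader stands in for whatever slow query produces the data.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Read-through cache: serve a stored value until it is older than `ttl` seconds."""

    def __init__(self, loader: Callable[[Hashable], V], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # slow source of truth, e.g. a database query
        self._ttl = ttl
        self._clock = clock        # injectable so tests can fake the passage of time
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # fresh hit: no trip to the backing store
        value = self._loader(key)  # miss or stale entry: reload and remember when
        self._entries[key] = (now, value)
        return value


# Usage: a hypothetical page loader that records how often it is really called.
calls = []

def load_page(slug: str) -> str:
    calls.append(slug)
    return f"<html>{slug}</html>"

fake_now = [0.0]
cache = TTLCache(load_page, ttl=60.0, clock=lambda: fake_now[0])
cache.get("about")
cache.get("about")   # served from memory, loader not called
fake_now[0] = 61.0
cache.get("about")   # entry is stale, so it is reloaded
print(calls)         # ['about', 'about']
```

Because static data tolerates being a little stale, each server can hold its own copy like this without any co-ordination between them.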
Local CPU processing capability (e.g. handling individual web requests) is easy to scale horizontally by adding more web server nodes. CPU speed is unlikely to be your bottleneck anyway given modern processor speeds - most web applications don't really need much CPU power.
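To illustrate why stateless request handling scales out so easily, here is a toy round-robin dispatcher. It assumes each node keeps no per-user state of its own (sessions and data live in shared storage), so any node can serve any request. In practice this job is done by a load balancer such as nginx or HAProxy; `make_node` and `RoundRobinBalancer` are made-up names for the illustration.

```python
import itertools
from typing import Callable, Dict, List

Request = Dict[str, str]
Handler = Callable[[Request], str]


def make_node(name: str) -> Handler:
    """A stateless web node: its response depends only on the request (plus shared storage)."""
    def handle(request: Request) -> str:
        return f"{name} rendered {request['path']}"
    return handle


class RoundRobinBalancer:
    """Hand each incoming request to the next node in turn."""

    def __init__(self, nodes: List[Handler]) -> None:
        self._nodes = itertools.cycle(nodes)

    def dispatch(self, request: Request) -> str:
        return next(self._nodes)(request)


# Scaling out means constructing the balancer with more nodes; no node holds session state.
balancer = RoundRobinBalancer([make_node("web1"), make_node("web2")])
for path in ["/", "/posts", "/about"]:
    print(balancer.dispatch({"path": path}))
# web1 rendered /
# web2 rendered /posts
# web1 rendered /about
```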
Transactional update of data, however, is a very thorny problem. Read about the Byzantine Generals' Problem if you want the theoretical explanation, but basically it's impossible to reliably co-ordinate transactions in a distributed system. You have to make some compromises based on what you value most (data integrity? performance? scalability? fault-tolerance? cost? latency?).
Operating systems etc. don't really make much difference - the overhead is so low and it doesn't really affect scalability concerns. Go with what you have skills in and/or you think you will find most easy to manage. Personally I use Ubuntu on Amazon EC2.
Given the sort of applications you are looking at, I'd probably err towards the NoSQL solutions, as it sounds like efficiently handling large volumes is more important than having lots of transactional data. You can always keep a PostgreSQL box for the limited subset of data that requires transactional semantics (user accounts? master reference data? some workflow state?).
The other (more classic) approach would be to get a typical big-iron database (e.g. Oracle, DB2) and buy an expensive cluster of high-end database machines. Then have lots of cheap, replicated web servers doing most of the work and accessing the database cluster when they need to execute transactions. This can work extremely well up to the point where the database cluster starts to get overloaded, at which point it can be an expensive bottleneck to widen... but arguably if you're getting that much load you can afford to do so. I'd go down this route if you were building e.g. a financial services app.
If you are only doing a prototype or expecting smallish loads to begin with, then you can use a single commodity PostgreSQL machine in place of an expensive database cluster. This is probably the easiest / cheapest option to set up. And if you keep database access to a minimum (lots of caching, careful query design) it can actually take you quite a long way. Just be aware that it will ultimately become your bottleneck if you keep growing.
p.s. You mentioned you are looking at Clojure; if you haven't done so already, I strongly recommend watching this video: http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey - a unique perspective on concurrency which also gives some insight into the problems of managing transactional data in concurrent environments.
It depends (tm). Count yourself lucky if your app has enough traffic to make you care.
Personally, scope creep. Keep it simple, and ship it solid with fewer features.
None of the 'possible bottlenecks' listed are issues for a simple app. If you develop a complex feature set, you'll probably bump up against processor/memory limitations depending on the server you've chosen. The best way to find out is to build your app and see what happens. If a particular combination of hardware isn't sufficient for your app, be grateful there are a million options to switch to.
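In the spirit of "build it and see what happens", measure where request time actually goes before worrying about hardware. A minimal timing sketch, assuming a handler you can split into phases; `timed`, `report` and the phase labels are hypothetical, and the sleep and arithmetic are stand-ins for real work. For deeper digging, Python's `cProfile` module or your database's query plan output (e.g. PostgreSQL's `EXPLAIN`) are the usual next steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, List

timings: DefaultDict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(label: str) -> Iterator[None]:
    """Record wall-clock seconds spent inside the block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append(time.perf_counter() - start)


def report() -> None:
    # Largest total first: that is where to look for the bottleneck.
    for label, samples in sorted(timings.items(), key=lambda kv: -sum(kv[1])):
        print(f"{label:<10} calls={len(samples):<3} total={sum(samples) * 1000:.1f} ms")


# A hypothetical request handler split into phases, run a few times.
for _ in range(3):
    with timed("db_query"):
        time.sleep(0.02)                    # stand-in for a database round trip
    with timed("render"):
        sum(i * i for i in range(10_000))   # stand-in for template rendering
report()
```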
For specifics, if you're throwing up a Rails app, start with Heroku and move to an EC2 instance once it's too expensive to maintain. You will have no server woes, and more time to concentrate on the important things, like making your app.
In most typical cases it is highly unlikely that you will have any bottleneck other than developer productivity, unless you are doing something completely wrong.
When you have other bottlenecks, you will probably have enough users and funds to eliminate them.