What needs to be done to develop a highly scalable web application in Java? [closed]
I would like to hear from highly experienced Java professionals who have worked with large-scale production systems: what needs to be done to build a web application that can scale to handle more than 10 million requests per day?
For example, if some sort of caching is needed, which production-quality library should be used for it?
This is a huge topic and can't be easily answered - very large scale applications in general need to be carefully designed for the specific kind of load they are expected to handle.
For example: your architecture will be very different if it is handling mostly read-only page views (easy to scale by replicating lots of cheap application servers) vs. if it is handling complex financial transactions (where you need a way of co-ordinating large numbers of simultaneous transactions).
Some general hints:
- Prefer horizontal scaling - as far as possible, you want to be able to achieve your scalability by adding more cheap boxes. The more that you can design your application to fit this model, the better.
- Co-ordinated changes to mutable state will ultimately be your bottleneck to scalability, as it's the one thing that cannot be scaled as far as you like horizontally with cheap boxes. Work out what these changes will be, and design accordingly. If you're lucky, a single database instance will be sufficient. If not, you're into very expensive database cluster / layering transactional semantics over NoSQL / highly-custom-data-store territory.
- Use proven libraries / components that can scale. e.g. Netty for high throughput communications.
- Don't try this without expertise on your team - scaling applications into the "big league" is hard and requires specialist skills. If you do it wrong, you can get stuck with bottlenecks that require expensive rewrites. Hire someone who's done it before.
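One rough way to see why co-ordinated state caps horizontal scaling is Amdahl's law. It is not part of the hints above, just a standard back-of-the-envelope model that I am borrowing here: if some fraction of the work has to be serialized (for example, it all goes through one database), adding nodes only speeds up the rest. The class name and the 5% figure below are made up for illustration, not measurements.

```java
/** Sketch of Amdahl's law applied to node count; an illustrative model, not a capacity planner. */
final class SerialFractionModel {

    /**
     * Speedup relative to one node when serialFraction of the work cannot be
     * spread across nodes (0.0 = perfectly parallel, 1.0 = fully serialized).
     */
    static double speedup(double serialFraction, int nodes) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || nodes < 1) {
            throw new IllegalArgumentException("serialFraction must be in [0,1] and nodes >= 1");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / nodes);
    }

    public static void main(String[] args) {
        // Assumed value: 5% of the work is co-ordinated and cannot be parallelized.
        double serialFraction = 0.05;
        for (int nodes : new int[] {1, 2, 5, 10, 20, 100}) {
            System.out.println(nodes + " node(s): speedup " + speedup(serialFraction, nodes));
        }
    }
}
```

Under this model, with 5% serialized work the speedup can never exceed 20x (1 / 0.05) no matter how many boxes you add, which is why it pays to shrink the co-ordinated part of the design first.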
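BTW - 10 million requests per day isn't actually that big. Averaged over 24 hours, that's only about 116 requests per second. With reasonably tight coding, one modern server can handle that, although real traffic is rarely spread evenly, so plan for peaks rather than the average.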
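The arithmetic behind that per-second figure, as a small sketch. The class name and the peak factor of 5 are my own assumptions for illustration; measure your real traffic pattern before sizing anything.

```java
/** Back-of-the-envelope load estimate; the peak factor is an assumed value. */
final class LoadEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Average requests per second for a given daily request count. */
    static double averagePerSecond(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    /** Peak rate, assuming traffic peaks at peakFactor times the average. */
    static double peakPerSecond(long requestsPerDay, double peakFactor) {
        return averagePerSecond(requestsPerDay) * peakFactor;
    }

    public static void main(String[] args) {
        long perDay = 10_000_000L;
        System.out.println("average req/s: " + averagePerSecond(perDay));
        // 5.0 is a hypothetical peak-to-average ratio, not a measured one.
        System.out.println("peak req/s (assumed 5x): " + peakPerSecond(perDay, 5.0));
    }
}
```

10,000,000 / 86,400 comes out at roughly 115.7 requests per second on average. Even with a 5x peak that is under 600 requests per second.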
Most important is that your application should scale with some predictability. As for the "how", that is impossible to say without a more in-depth analysis of the requirements and architecture. Caching is usually a key component in some form or other. Depending on several factors, such as the volatility of the data and the rate of change, different approaches can be taken. The simplest is to have only local caches, bearing in mind that changes made to cached data will not be immediately reflected on all nodes unless some cache synchronization is added. At the other end, you have fully distributed caches, like Terracotta BigMemory or other distributed/clustered caching solutions.
I advise you to establish performance testing baselines as early as possible. That will allow you to test the scalability of the system you are developing. Run the benchmark against one, two, three, etc. load-balanced nodes and measure the throughput. Also identify any resources or data that must be shared between all the nodes, and work out how to synchronize these properly for optimal scalability.
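To make the "local cache" end of that spectrum concrete, here is a minimal sketch of a node-local cache with a time-to-live, built only on the JDK. It is an illustration, not a replacement for a production caching library, and the names `LocalTtlCache`, `ttlMillis` and `loader` are hypothetical. Each node holds its own copy, so after an update other nodes can serve the old value until their entry expires.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Node-local cache whose entries expire after a fixed time-to-live (illustrative sketch). */
final class LocalTtlCache<K, V> {

    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    LocalTtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it via loader if it is missing or expired. */
    V get(K key, Function<K, V> loader) {
        final long now = clock.getAsLong();
        // compute() is atomic per key, so concurrent callers do not load the same key twice.
        // The loader runs while the map holds a lock for that key, so it should be quick
        // and must not touch this cache.
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && old.expiresAtMillis > now) {
                return old; // still fresh
            }
            return new Entry<V>(loader.apply(k), now + ttlMillis);
        });
        return entry.value;
    }

    /** Drops the entry on THIS node only; other nodes keep their copy until it expires. */
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

Typical use would be something like `new LocalTtlCache<String, String>(60_000L, System::currentTimeMillis)`. The TTL is the knob that trades staleness against load on the backing store; once you need every node to see a change immediately, you are in distributed-cache territory.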
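Once you have throughput numbers for one, two and three nodes, a simple scaling-efficiency figure makes regressions easy to spot. A minimal sketch follows; the class name is hypothetical and the throughput values in `main` are placeholders, not measurements, so substitute the results of your own load tests.

```java
/** Compares measured throughput against ideal linear scaling (illustrative sketch). */
final class ScalingReport {

    /**
     * Fraction of ideal linear scaling achieved: 1.0 means that n nodes deliver
     * n times the single-node throughput.
     */
    static double efficiency(double singleNodeThroughput, int nodes, double measuredThroughput) {
        if (singleNodeThroughput <= 0.0 || nodes < 1) {
            throw new IllegalArgumentException("need a positive baseline and at least one node");
        }
        return measuredThroughput / (singleNodeThroughput * nodes);
    }

    public static void main(String[] args) {
        // PLACEHOLDER values in requests per second for 1, 2 and 3 nodes.
        // These are made up for the example; use your own benchmark results.
        double[] throughput = {1000.0, 1900.0, 2600.0};
        for (int i = 0; i < throughput.length; i++) {
            int nodes = i + 1;
            System.out.println(nodes + " node(s): efficiency "
                    + efficiency(throughput[0], nodes, throughput[i]));
        }
    }
}
```

If efficiency drops sharply as nodes are added, that usually points at a shared resource (database, cache synchronization, a lock) that is worth finding before going to production.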
It's very difficult to condense what is usually gained by writing and maintaining large-scale applications into an answer that comes in the form of a forum post. Usually, people pay a lot of money to other people with this kind of expertise.
You need to get a feel for the idea of the application. Some traps become apparent during the analysis stage, especially regarding the infrastructure (what is served where, over what?); others show up in the data handling (how will synchronization work?).
Others will appear later, like "What will we do when X crashes" (insert any part of the infrastructure for X). You check and recheck your recovery times against these scenarios.
Then you write up the parts of the whole idea and test failure scenarios and use cases against it.
At the end, if you think everything has been thought of, you give it to someone just as experienced as yourself, maybe even more so, then write down everything they see as a problem, test their complaints, and alter the structure of the application and/or infrastructure to accommodate all use cases.