Does using stateful web servers make sense?

Posted: 2023-02-02 00:57

I am working on a web application, which historically was built on a PHP/MySQL stack.

One of the key operations of the application had to do some heavy calculations, which required iterating over every row of an entire DB table. Needless to say, this was a serious bottleneck, so a decision was made to rewrite the whole process in Java.

This gave us two benefits. One was that the Java implementation ran much faster than the PHP one. The second was that we could keep the entire data set in the Java application server's memory. So now we can do the calculation-heavy operations in memory, and everything happens much faster.

This worked for a while, until we realized we needed to scale, so we now need more web servers.

The problem is that, by the current design, they all must maintain the exact same state. They all query the DB, process the data, and keep it in memory. But what happens when you need to change this data? How do all the servers stay consistent?

This architecture seems flawed to me. The performance benefit from holding all the data in memory is obvious, but this seriously hampers scalability.

What are the options from here? Switch to an in-memory key-value data store? Should we give up holding state inside the web servers entirely?

Now switch to Erlang :-)

Yeah, that's a joke, but there's a grain of truth. The issue is this: you originally had your state in an external, shared repository, the DB. Now you have it (partially) precalculated in an internal, non-shared repository: Java objects in RAM. The obvious fix is to keep it precalculated but in an external, shared repository, the faster the better.

One easy answer is memcached.
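As a rough illustration of that idea, here is a minimal look-aside sketch. The `SharedCache` interface, `InMemorySharedCache`, and `ResultLookup` are hypothetical names, not part of any memcached client library; the in-memory implementation exists only so the sketch runs without a server, and a real setup would back the interface with a networked cache that every web server talks to.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical abstraction over an external shared cache (for example, a memcached client).
interface SharedCache {
    String get(String key);               // returns null on a miss
    void set(String key, String value);
    void delete(String key);
}

// Stand-in implementation so the sketch is runnable without a cache server.
class InMemorySharedCache implements SharedCache {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    public String get(String key) { return store.get(key); }
    public void set(String key, String value) { store.put(key, value); }
    public void delete(String key) { store.remove(key); }
}

// Web-tier helper: it keeps no results of its own, only a handle to the shared cache.
class ResultLookup {
    private final SharedCache cache;

    ResultLookup(SharedCache cache) { this.cache = cache; }

    // Return the precalculated value if any server has stored it; otherwise compute and publish it.
    String getOrCompute(String key, Supplier<String> heavyCalculation) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fresh = heavyCalculation.get();
        cache.set(key, fresh);
        return fresh;
    }

    // Call when the underlying data change so every server sees a miss next time.
    void invalidate(String key) { cache.delete(key); }
}
```

Because each web server holds only a `ResultLookup`, any number of them can be added without duplicating state; a result computed by one server is visible to the others, and an invalidation by one is seen by all.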

Another is to build your own 'calc server', which centralizes both the calculation task and the (partial) results. The web frontend processes just access this server. In Erlang it would be the natural way to do it. In other languages, you can still do it; it's just more work. Check ZeroMQ for inspiration, even if you don't use it in the end (but it's a damn good implementation).
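A calc server along those lines might look roughly like the sketch below. `CalcServer`, its `update` and `total` methods, and the "sum of all rows" calculation are all assumptions made for illustration; the transport (sockets, HTTP, ZeroMQ) is left out, so the front ends here would call the object in-process. The point it tries to show is the single owner: one worker thread applies updates and answers queries in order, which loosely mimics an Erlang process handling its mailbox.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical calc server: the only owner of the data set and of the cached result.
class CalcServer implements AutoCloseable {
    private final Map<String, Long> rows = new HashMap<>();
    private Long cachedTotal = null; // null means "needs recalculation"

    // A single worker thread, so requests are handled one at a time, in arrival order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Change one row and drop the cached result.
    Future<?> update(String rowId, long value) {
        return worker.submit(() -> {
            rows.put(rowId, value);
            cachedTotal = null;
        });
    }

    // Stand-in for the heavy calculation: recomputed only if something changed.
    Future<Long> total() {
        return worker.submit(() -> {
            if (cachedTotal == null) {
                long sum = 0;
                for (long v : rows.values()) {
                    sum += v;
                }
                cachedTotal = sum;
            }
            return cachedTotal;
        });
    }

    @Override
    public void close() { worker.shutdown(); }
}
```

With this shape the consistency question from the original post mostly disappears, since there is only one copy of the state; the trade-off is that the calc server itself becomes the component that has to be scaled or partitioned later.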

This may be a cliché, but data always expands to fill the space you put it in. Your data might all fit in memory today, but I guarantee you it won't at some point in the future. However far away that point is, that is the time frame you have to figure out a better architecture. The statefulness of your application is just a symptom of this bigger problem.

Does everyone do different calculations on the entire dataset? Is this something you can do in a batch overnight and have folks access during the day? How time-sensitive is it?

I think these are the questions you need to answer, because at some point you won't be able to buy enough memory to store the data you need. That might sound silly given where you are now, but you should plan on that being true. Many developers I've talked to don't think about what success looks like and what impact it has on their designs.

I agree with you - this sounds flawed, but I'd need more detail to know for sure.

You mention a large data set and heavy calculations, but you don't talk about how the data is updated, when the calculations are done, whether it's a day's worth of data or the entire data set, etc. It sounds a lot like a batch job that could be done daily off-line.

If that's the case, I'm not sure where the web ties into it. Are your web users just doing custom queries after the crunching is done? Is the data read-only or read-mostly for users? Or are they changing the data continuously on the fly?

I wonder if the persistence technology you've chosen affects things? Perhaps a NoSQL alternative could be better for your problem - like a distributed MongoDB cluster.

This is a data-engine question, I believe, as much as it is a web-server-distribution question. Why can't your (central) database engine do the calculation (quickly enough)?

You could store precalculated values which are flagged as stale when the underlying data are changed, requiring a recalc. There's no getting around the need to recalc when data change. You just need to manage when and how the change occurs as it will affect consumers of the data.
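The stale-flag idea can be sketched in a few lines. `StaleAwareValue` is a hypothetical helper, and the choice to recalculate lazily on the next read (rather than eagerly on every change) is an assumption; either policy fits the description above.

```java
import java.util.function.Supplier;

// Hypothetical holder for one precalculated value with a stale flag.
class StaleAwareValue<T> {
    private final Supplier<T> recalc;
    private T value;
    private boolean stale = true; // nothing has been computed yet

    StaleAwareValue(Supplier<T> recalc) { this.recalc = recalc; }

    // Call whenever the underlying data change.
    synchronized void markStale() { stale = true; }

    // Recalculates only on the first read after a change; otherwise returns the stored value.
    synchronized T get() {
        if (stale) {
            value = recalc.get();
            stale = false;
        }
        return value;
    }
}
```

Lazy recalculation means a burst of writes costs only one recalc, paid by the next reader; if readers cannot tolerate that latency, the recalc could instead be triggered in the background when `markStale()` is called.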

Tags: architecture, scalability, web-applications
