Reasons for NOT scaling-up vs. -out?

As a programmer I make revolutionary findings every few years. I'm either ahead of the curve or behind it by about π in phase. One hard lesson I learned was that scaling OUT is not always better; quite often the biggest performance gains came when we regrouped and scaled up.

What reasons do you have for scaling out vs. up? Price, performance, vision, projected usage? If so, how did this work for you?

We once scaled out to several hundred nodes that would serialize and cache the necessary data out to each node and run maths processes on the records. Many, many billions of records needed to be (cross-)analyzed. It was the perfect business and technical case to employ scale-out. We kept optimizing until we processed about 24 hours of data in 26 hours of wall-clock time. Really long story short, we leased a gigantic (for the time) IBM pSeries, put Oracle Enterprise on it, indexed our data, and ended up processing the same 24 hours of data in about 6 hours. A revolution for me.

So many enterprise systems are OLTP and the data are not sharded, yet many still want to cluster or scale out. Is this a reaction to new techniques or to perceived performance?

Do applications in general today, or our programming mantras, lend themselves better to scale-out? Should we always take this trend into account in the future?


Because scaling up

  • Is ultimately limited by the size of box you can actually buy.
  • Can become extremely cost-ineffective; for example, a machine with 128 cores and 128 GB of RAM is vastly more expensive than 16 machines with 8 cores and 8 GB of RAM each.
  • Some things don't scale up well, such as IO read operations.
  • By scaling out, if your architecture is right, you can also achieve high availability. A 128-core, 128 GB RAM machine is very expensive, but a second redundant one is extortionate.

And, to some extent, because that's what Google does.


Scaling out is best for embarrassingly parallel problems. It takes some work, but a number of web services fit that category (hence the current popularity). Otherwise you run into Amdahl's law, which means that to gain speed you have to scale up, not out. I suspect you ran into that problem. IO-bound operations also tend to do well with scaling out, largely because waiting for IO increases the percentage of the work that is parallelizable.
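To make the Amdahl's law point concrete, here is a minimal sketch (the 95% parallelizable figure is an assumption chosen purely for illustration): the serial fraction puts a hard cap on speedup no matter how many boxes you add.

    # Amdahl's law: speedup with n workers when a fraction p of the work
    # is parallelizable and (1 - p) must run serially.
    def amdahl_speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    print(amdahl_speedup(0.95, 1000))       # ~19.6x with 1,000 machines
    print(amdahl_speedup(0.95, 1_000_000))  # still under 20x -- the serial 5% caps it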


The blog post Scaling Up vs. Scaling Out: Hidden Costs by Jeff Atwood has some interesting points to consider, such as software licensing and power costs.


Not surprisingly, it all depends on your problem. If you can easily partition it into subproblems that don't communicate much, scaling out gives trivial speedups. For instance, searching for a word in 1B web pages can be done by one machine searching 1B pages, or by 1M machines doing 1,000 pages each, without a significant loss in efficiency (so with a 1,000,000x speedup). This is called "embarrassingly parallel".
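A toy sketch of that kind of partitioning, assuming the pages are already available as in-memory strings (a real system would shard them across machines rather than local processes), might look like this:

    # Embarrassingly parallel word count: each worker scans its own chunk of
    # pages, and nothing is communicated until the final sum.
    from multiprocessing import Pool

    def count_in_chunk(args):
        word, pages = args
        return sum(page.count(word) for page in pages)

    def parallel_count(word, pages, workers=8, chunk_size=1000):
        chunks = [(word, pages[i:i + chunk_size])
                  for i in range(0, len(pages), chunk_size)]
        with Pool(workers) as pool:
            return sum(pool.map(count_in_chunk, chunks))

    if __name__ == "__main__":
        pages = ["the quick brown fox"] * 10_000  # stand-in for real pages
        print(parallel_count("fox", pages))       # 10000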

Other algorithms, however, require much more intensive communication between the subparts. Your case requiring cross-analysis is a perfect example of where communication can drown out the performance gains of adding more boxes. In these cases, you'll want to keep communication inside a (bigger) box, going over high-speed interconnects, rather than over something as 'common' as (10-)Gig-E.
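As a back-of-the-envelope model (the constants here are invented purely to show the shape of the curve), the compute term shrinks with the number of boxes while an all-to-all communication term grows with it, so past some point adding boxes makes the job slower:

    # Toy cost model: compute is divided across n boxes, but all-to-all
    # communication grows roughly with n^2. The constants are made up.
    def wall_clock(n, compute=1000.0, comm_per_pair=0.05):
        return compute / n + comm_per_pair * n * (n - 1) / 2

    for n in (1, 4, 16, 64, 256):
        print(n, round(wall_clock(n), 1))
    # Time drops at first, then the communication term dominates.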

Of course, this is a fairly theoretical point of view. Other factors, such as I/O, reliability, and ease of programming (one big shared-memory machine usually gives a lot fewer headaches than a cluster), can also have a big influence.

Finally, due to the (often extreme) cost benefits of scaling out using cheap commodity hardware, the cluster/grid approach has recently attracted much more (algorithmic) research. As a result, new ways of parallelizing have been developed that minimize communication and thus do much better on a cluster, whereas conventional wisdom used to dictate that these types of algorithms could only run effectively on big-iron machines...
