Top reasons why an app server crashes

2023-02-02 00:22 Q&A author:

What are the most likely causes for application server failure?

For example: "out of disk space" is more likely than "2 of the drives in a RAID 4 setup die simultaneously".

My particular environment is Java, so Java-specific answers are welcome, but not required.

EDIT: just to clarify, I'm looking for downtime-related crashes (out of memory is a good example), not just one-time issues (like a temporary network glitch).

If you are trying to keep an application server up, start by monitoring it. Nagios, Big Sister, and other network monitoring tools can be very useful.

Watch memory availability and usage, disk availability and usage, CPU availability and usage, etc.
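As a rough illustration of what such checks look like from inside a Java process, here is a minimal sketch using only the standard library. The class name `ResourceCheck`, its method names, and the idea of polling from your own code are assumptions for the example; a tool like Nagios would normally collect these numbers from outside the JVM.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reads the three resources mentioned above.
class ResourceCheck {

    /** Fraction of the JVM's maximum heap that is currently in use. */
    static double heapUsedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /** Fraction of the partition holding the given path that this JVM can still write to. */
    static double diskFreeFraction(File path) {
        long total = path.getTotalSpace();
        // getTotalSpace() returns 0 if the path does not name a partition.
        return total == 0 ? 0.0 : (double) path.getUsableSpace() / total;
    }

    /** System load average over the last minute; negative if the platform does not report it. */
    static double loadAverage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", heapUsedFraction() * 100);
        System.out.printf("disk free: %.1f%%%n", diskFreeFraction(new File(".")) * 100);
        System.out.printf("load avg:  %.2f%n", loadAverage());
    }
}
```

What counts as an alarming value depends on the server, so the sketch only reports the numbers and leaves thresholds to whoever wires it into an alerting system.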

The most common reason why a server goes down is rarely the same reason twice. Someone "fixes" the last-most-common-reason, and a new-most-common-reason is born.

Edwin is right - you need monitoring to understand what the problem is. Or better - understand what the problem is AND prevent it from causing downtime.

You should not only track resource consumption but also demand. The difference between the two shows you if you have sized your server correctly.

There are a ton of open source tools like Nagios, collectd, etc. that can give you server-specific data - that's only monitoring though, not prevention. Librato Silverline (disclosure: I work there) allows you to monitor individual processes and then throttle the resources they use by placing them in application containers for which you define resource policies. If your server has 8 cores or fewer, you can use it for free.

"Out of Memory" exception due to memory leaks.

All sorts of things can cause a server to crash, ranging from busted hardware (e.g. disk failures) to faulty code (a memory leak resulting in an out of memory exception, a network failure that got rethrown as a runtime exception and was never caught, a segfault in servers that aren't Java servers, etc.).
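For the "rethrown as a runtime exception and never caught" case, Java lets you install a last-resort handler so that the failure at least leaves a trace. The following is a minimal sketch; the class `CrashLogging` and its methods are made up for the example, and a real server would hand the error to its own logging or alerting setup instead of printing to stderr.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: records exceptions that no code caught.
class CrashLogging {
    static final AtomicReference<Throwable> lastUncaught = new AtomicReference<>();

    /** Installs a JVM-wide handler that runs when a thread dies from an uncaught exception. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            lastUncaught.set(error);
            System.err.println("Uncaught in " + thread.getName() + ": " + error);
        });
    }

    /** Runs the task on its own thread and returns whatever the handler recorded last. */
    static Throwable runAndCapture(Runnable task) {
        Thread worker = new Thread(task, "worker");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return lastUncaught.get();
    }

    public static void main(String[] args) {
        install();
        // Simulates a network failure that was wrapped in a RuntimeException and never caught.
        Throwable seen = runAndCapture(() -> {
            throw new RuntimeException("connection reset");
        });
        System.err.println("recorded: " + seen);
    }
}
```

Note that the handler does not keep the dead thread alive. It only makes sure the failure is visible instead of silently taking a worker thread down.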

At first, it is usually because of memory leaks, disk space problems, or endless loops that eat up the CPU.

Once you monitor those issues and set up correct logging and warning mechanisms, they turn meta on you... and exploding error handling becomes a possible reason for a full lockup. An error (or more likely, two in an unhappy combination) occurs, but when the handler tries to write to the log files or send a warning (by mail or something), it hits another error, which it then tries to handle by writing to the log file or sending a warning, and so on. This continues until one of the resources gives out: it may lead to skyrocketing server load, memory problems, a full disk, or saturated network traffic, which means a remote user can't reach the server to correct the problem.
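One way to keep error handling from feeding on itself is a re-entrancy guard: if reporting an error triggers another report on the same thread, the nested one is dropped rather than handled. This is only a sketch under assumptions; `GuardedReporter` and its `sink` parameter (standing in for "write to the log file" or "send a mail") are hypothetical names, not part of any library.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical reporter that refuses to report errors caused by its own reporting.
class GuardedReporter {
    private final Consumer<String> sink; // assumed: writes to a log file, sends a mail, etc.
    private final ThreadLocal<Boolean> reporting = ThreadLocal.withInitial(() -> false);
    private final AtomicInteger dropped = new AtomicInteger();

    GuardedReporter(Consumer<String> sink) {
        this.sink = sink;
    }

    void report(String message) {
        if (reporting.get()) {
            // We are already inside report() on this thread: break the loop here.
            dropped.incrementAndGet();
            return;
        }
        reporting.set(true);
        try {
            sink.accept(message);
        } catch (RuntimeException e) {
            // The reporting channel itself failed. Fall back to stderr once
            // instead of trying to report the reporting failure.
            System.err.println("reporter failed: " + e + " (original: " + message + ")");
        } finally {
            reporting.set(false);
        }
    }

    int droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        // A sink that always fails, e.g. because the log disk is full.
        GuardedReporter reporter = new GuardedReporter(msg -> {
            throw new IllegalStateException("log disk full");
        });
        reporter.report("out of memory in request handler"); // does not throw or recurse
    }
}
```

The guard is per thread, so it does not stop many threads from failing at once; a rate limit on the sink would be a separate measure for that.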
