What is the best way to prevent out of memory (OOM) freezes on Linux?
Is there a way to make the OOM killer work and prevent Linux from freezing? I've been running Java and C# applications, where any memory allocated is usually used, and (if I'm understanding it right) memory overcommit is causing the machine to freeze. Right now, as a temporary solution, I added
vm.overcommit_memory = 2
vm.overcommit_ratio = 10
to /etc/sysctl.conf.
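For reference, these can be applied without a reboot, and the resulting commit limit is visible in /proc/meminfo; a quick check (run as root, assuming the lines are already in /etc/sysctl.conf) looks roughly like this:
# Reload settings from /etc/sysctl.conf and inspect the resulting commit limit
sysctl -p
grep -E 'CommitLimit|Committed_AS' /proc/meminfo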
Kudos to anyone who can explain why the existing OOM killer can't function correctly in a guaranteed manner, killing processes whenever the kernel runs out of "real" memory.
EDIT -- many responses are along the lines of Michael's "if you are experiencing OOM killer related problems, then you probably need to fix whatever is causing you to run out of memory". I don't think this is the correct solution. There will always be apps with bugs, and I'd like to adjust the kernel so my entire system doesn't freeze. Given my current technical understanding, this doesn't seem like it should be impossible.
Below is a really basic Perl script I wrote. With a bit of tweaking it could be useful. You just need to change the paths I have to the paths of any processes that use Java or C#. You could also change the kill commands I've used to restart commands. Of course, to avoid typing perl memusage.pl manually, you could put it into your crontab file to run automatically. You could also use perl memusage.pl > log.txt to save its output to a log file. Sorry if it doesn't really help, but I was bored while drinking a cup of coffee. :-D Cheers
#!/usr/bin/perl -w
# Checks available memory usage and calculates size in MB
# If free memory is below your minimum level specified, then
# the script will attempt to close the troublesome processes down
# that you specify. If it can't, it will issue a -9 KILL signal.
#
# Uses external commands (cat and pidof)
#
# Cheers, insertable
our $memmin = 50;
our @procs = qw(/usr/bin/firefox /usr/local/sbin/apache2);
sub killProcs
{
    use vars qw(@procs);
    my @pids = ();
    foreach $proc (@procs)
    {
        # Strip the directory part so the process can be looked up by name
        my $filename = substr($proc, rindex($proc, "/") + 1, length($proc) - rindex($proc, "/") - 1);
        my $pid = `pidof $filename`;
        chomp($pid);
        my @pid = split(/ /, $pid);
        next unless defined $pid[0];    # skip if the process isn't running
        push @pids, $pid[0];
    }
    foreach $pid (@pids)
    {
        # Try to kill the process normally (SIGTERM) first
        system("kill -15 " . $pid);
        print "Killing " . $pid . "\n";
        sleep 1;
        if (-e "/proc/$pid")
        {
            print $pid . " is still alive! Issuing a -9 KILL...\n";
            system("kill -9 " . $pid);
            print "Done.\n";
        } else {
            print "Looks like " . $pid . " is dead\n";
        }
    }
    print "Successfully finished destroying memory-hogging processes!\n";
    exit(0);
}
sub checkMem
{
    use vars qw($memmin);
    my ($free) = $_[0];
    if ($free > $memmin)
    {
        print "Memory usage is OK\n";
        exit(0);
    } else {
        killProcs();
    }
}
sub main
{
    my $meminfo = `cat /proc/meminfo`;
    chomp($meminfo);
    my @meminfo = split(/\n/, $meminfo);
    foreach my $line (@meminfo)
    {
        # MemFree is reported in kB; convert to MB before checking
        if ($line =~ /^MemFree:\s+(.+)\skB$/)
        {
            my $free = ($1 / 1024);
            &checkMem($free);
        }
    }
}
main();
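If you do run it from cron as suggested above, an entry along these lines (the path and the one-minute interval are just placeholders) would run it regularly and keep a log:
# Check memory every minute and append the output to a log file (example paths)
* * * * * perl /path/to/memusage.pl >> /var/log/memusage.log 2>&1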
If a process's oom_adj is set to -17 it won't be considered for killing, although I doubt that's the issue here.
cat /proc/<pid>/oom_adj
will show the current oom_adj value for a given process.
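For completeness, setting it is just a write to that file (root is required to lower the value; sshd here is only an example of a process you might want to protect):
# Exempt sshd (example) from the OOM killer by setting its oom_adj to -17
echo -17 > /proc/$(pidof -s sshd)/oom_adj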
I put together a simple script that'll set the OOM score on launch. All sub-processes will inherit this score.
#!/usr/bin/env sh
if [ -z "$1" ] || [ -z "$2" ]; then
echo "Usage: $(basename "$0") oom_score_adj command [args]..."
echo " oom_score_adj A score between -1000 and 1000, bigger gets killed first"
echo " command The command to run"
echo " [args] Optional args for the command to run"
exit 1
fi
set -eux
# Apply the requested score adjustment to this shell, then replace it with the command
echo "$1" > /proc/self/oom_score_adj
shift
exec "$@"
The script sets the score for the current process to the first argument provided. This can be anything from -1000 to 1000, where higher values are more likely to be killed first. The rest of the arguments are then executed as a command with args, replacing the current process.
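As a usage sketch (the script name and the Java command are placeholders), launching a memory-hungry app so it becomes the preferred OOM victim could look like:
# Run a Java app with a high oom_score_adj so it is killed before anything else
./oom-launch.sh 800 java -Xmx2g -jar myapp.jar
Note that raising a process's own score doesn't normally require root, while lowering it generally does.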
I'd have to say the best way of preventing OOM freezes is to not run out of virtual memory. If you are regularly running out of virtual memory, or getting close, then you have bigger problems.
Most tasks don't handle failed memory allocations very well, so they tend to crash or lose data. Running out of virtual memory (with or without overcommit) will cause some allocations to fail. This is usually bad.
Moreover, before your OS runs out of virtual memory, it will start doing bad things like discarding pages from commonly used shared libraries. Performance then suffers, as those pages have to be pulled back in frequently, which is very bad for throughput.
My suggestions:
- Get more ram
- Run fewer processes
- Make the processes you do run use less memory (This may include fixing memory leaks in them)
And possibly also:
- Set up more swap space, if that is helpful in your use case (a rough sketch follows this list)
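If adding swap is an option, a swap file is the quickest route; roughly (the 2 GiB size is arbitrary, run as root):
# Create and enable a 2 GiB swap file; add it to /etc/fstab to keep it across reboots
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile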
Most multi-process servers run a configurable (maximum) number of processes, so you can typically tune that downwards. Multithreaded servers typically let you configure how much memory to use internally for buffers and the like.
First off, how can you be sure the freezes are OOM killer related? I've got a network of systems in the field and I get not infrequent freezes, which don't seem to be OOM related (our app is pretty stable in memory usage). Could it be something else? Is there any interesting hardware involved? Any unstable drivers? High performance video?
Even if the OOM killer is involved, and worked, you'd still have problems, because stuff you thought was running is now dead, and who knows what sort of mess it's left behind.
Really, if you are experiencing OOM killer related problems, then you probably need to fix whatever is causing you to run out of memory.
I've found that fixing stability issues mostly relies on accurately identifying the root cause. Unfortunately, this requires being able to see what's happening when the issue happens, which is a really bad time to be trying to start various monitoring programs.
One thing I sometimes found helpful was to start a little monitoring script at boot time which would log various interesting numbers and snapshot the running processes. Then, in the event of a crash, I could look at the situation just before the crash. I sometimes found that intuition was quite wrong about the root cause. Unfortunately, that script is long out-of-date, or I'd give a link.
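Something in that spirit is easy to reconstruct, though; a rough approximation (not the original script -- the log path and 30-second interval are arbitrary) would be:
#!/bin/sh
# Periodically log free memory, commit charge, and the biggest memory consumers
while true; do
    {
        date
        grep -E 'MemFree|SwapFree|Committed_AS' /proc/meminfo
        ps -eo pid,rss,comm --sort=-rss | head -n 10
        echo
    } >> /var/log/mem-snapshot.log
    sleep 30
done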