
What is a good way to test the use of msync on recent Linux kernels?

I am using msync in my application on Linux 2.6 to ensure consistency in the event of a crash. I need to thoroughly test my usage of msync but the implementation seems to be flushing all the relevant pages for me. Is there a way to prevent automatic flushing of mmap'd pages onto the disk to expose erroneous usage of msync on my part?


With apologies to @samold, "swappiness" has nothing to do with this. Swappiness just affects how the kernel trades off swapping dirty anonymous pages versus evicting page cache pages when memory is low.

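You need to play with the Linux VM tunables controlling the pdflush task (later kernels replaced pdflush with per-device flusher threads, but the tunables kept their names). For starters, I would suggest: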

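sysctl -w vm.dirty_expire_centisecs=360000
sysctl -w vm.dirty_writeback_centisecs=360000

By default, vm.dirty_expire_centisecs is 3000, which means the kernel will consider any dirty page older than 30 seconds to be "too old" and try to flush it to disk the next time the flusher wakes up. vm.dirty_writeback_centisecs (500 by default, i.e. every 5 seconds) controls how often that wakeup happens. By cranking both up to 1 hour, you should be able to avoid flushing dirty pages to disk at all, at least during a short test. Except...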

sysctl -w vm.dirty_background_ratio=80

By default, vm.dirty_background_ratio is 10, as in 10 percent. That means when dirty pages exceed 10 percent of available memory (free plus reclaimable pages, which is somewhat less than total RAM), the kernel will think it needs to get busy flushing something to disk, even if it is younger than dirty_expire_centisecs. Crank this one up to 80 or 90 and the kernel should be willing to tolerate most of RAM being occupied by dirty pages. (I would not set this too high, though, since I bet nobody ever does that and it might trigger strange behavior.) Except...

sysctl -w vm.dirty_ratio=90

By default, vm.dirty_ratio is 40 on older 2.6 kernels (later kernels lowered the default, so check yours), which means once 40% of memory is dirty pages, processes attempting to create more dirty pages will block until enough of them have been written back. Always make this one bigger than dirty_background_ratio. Hm, come to think of it, set this one before that one, just to make sure this one is always larger.
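
Since these settings are system-wide, it may help to record the current values before changing them, so they can be restored after the test. Here is a minimal Python sketch, assuming the tunables are exposed as files under /proc/sys/vm. The helper names read_vm_tunable, snapshot, and centisecs_to_seconds are made up for illustration.

```python
import os

# Writeback-related tunables; each one is a file under /proc/sys/vm.
TUNABLES = (
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
    "dirty_background_ratio",
    "dirty_ratio",
)


def read_vm_tunable(name, root="/proc/sys/vm"):
    """Return the integer value of one VM tunable (hypothetical helper)."""
    with open(os.path.join(root, name)) as f:
        return int(f.read().strip())


def snapshot(root="/proc/sys/vm"):
    """Collect the tunables that exist on this kernel into a dict."""
    values = {}
    for name in TUNABLES:
        try:
            values[name] = read_vm_tunable(name, root)
        except (OSError, ValueError):
            pass  # tunable missing or unreadable on this system
    return values


def centisecs_to_seconds(cs):
    """Convert a *_centisecs value to seconds (100 centiseconds = 1 second)."""
    return cs / 100.0


if __name__ == "__main__":
    # Print commands that would restore the values seen right now.
    for name, value in sorted(snapshot().items()):
        print("sysctl -w vm.%s=%d" % (name, value))
```

Running it before the test and saving the output gives a list of sysctl commands to replay afterwards. Reading the files does not need root; writing them does.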

That's it for my initial suggestions. It is possible that your kernel will start writing pages back anyway; the Linux VM is a mysterious beast and seems to get tweaked on every release. Hopefully this provides a starting point.
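
One way to check whether the kernel is flushing anyway is to watch the Dirty and Writeback counters in /proc/meminfo while the test runs. The sketch below parses those two fields. parse_meminfo and dirty_now are made-up helper names, and the counters are system-wide, so writes from other processes show up too.

```python
def parse_meminfo(text, fields=("Dirty", "Writeback")):
    """Extract selected counters (in kB) from /proc/meminfo-style text."""
    result = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in fields:
            # Lines look like "Dirty:   1234 kB"; the first token is the number.
            result[key] = int(rest.split()[0])
    return result


def dirty_now(path="/proc/meminfo"):
    """Read the current counters from the running kernel."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    import time
    # Print the counters once per second. A falling Dirty value while the
    # application has not called msync suggests background writeback.
    for _ in range(10):
        print(dirty_now())
        time.sleep(1)
```

Keeping the system otherwise idle during the test should make the numbers easier to interpret.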

See Documentation/sysctl/vm.txt in the kernel sources (Documentation/admin-guide/sysctl/vm.rst in newer trees) for a complete list of VM tunables. (Preferably refer to the documentation for the kernel version you are actually using.)

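Finally, use the /proc/PID/pagemap interface to find the physical page frame behind each mapped page, then look that frame up in /proc/kpageflags to see which pages are actually dirty at any time. (pagemap on its own reports whether a page is present or swapped, not its page-cache dirty state.)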
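
A rough sketch of that lookup in Python follows. It assumes the documented layout of the two files: 64-bit entries, with the page frame number in bits 0-54 and the "present" flag in bit 63 of a pagemap entry, and the dirty flag in bit 4 of a kpageflags entry. The function names are made up for illustration. On recent kernels this needs root, since unprivileged readers get a zero frame number.

```python
import mmap
import struct

PAGEMAP_PRESENT = 1 << 63         # bit 63: page is present in RAM
PAGEMAP_PFN_MASK = (1 << 55) - 1  # bits 0-54: page frame number when present
KPF_DIRTY = 1 << 4                # bit 4 of a /proc/kpageflags entry


def decode_pagemap_entry(entry):
    """Return (present, pfn) for one 64-bit pagemap entry."""
    present = bool(entry & PAGEMAP_PRESENT)
    pfn = entry & PAGEMAP_PFN_MASK if present else 0
    return present, pfn


def read_u64(path, index):
    """Read the index-th native-endian 64-bit entry from a file."""
    with open(path, "rb") as f:
        f.seek(index * 8)
        data = f.read(8)
    if len(data) != 8:
        raise OSError("short read from %s" % path)
    return struct.unpack("=Q", data)[0]


def page_is_dirty(pid, vaddr, page_size=mmap.PAGESIZE):
    """Return True or False, or None if the page is not resident."""
    # pagemap holds one entry per virtual page of the process.
    entry = read_u64("/proc/%d/pagemap" % pid, vaddr // page_size)
    present, pfn = decode_pagemap_entry(entry)
    if not present:
        return None
    # kpageflags holds one entry per physical page frame.
    flags = read_u64("/proc/kpageflags", pfn)
    return bool(flags & KPF_DIRTY)
```

Calling page_is_dirty for each page of the mapping right before and right after an msync call should show whether msync, or the kernel on its own, cleaned the page.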


A few guesses:

You can fiddle with the swappiness of the system via the /proc/sys/vm/swappiness tunable:

   /proc/sys/vm/swappiness
          The value in this file controls how aggressively the
          kernel will swap memory pages.  Higher values increase
          agressiveness, lower values descrease aggressiveness.
          The default value is 60.

(Wow. proc(5) needs to be run through a spell-checker.)

If setting the swappiness to 0 doesn't do the trick, there are more tunable knobs; the Documentation/laptops/laptop-mode.txt file contains a good description of the laptop_mode script's behaviors:

To increase the effectiveness of the laptop_mode strategy, the laptop_mode
control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
/proc/sys/vm to about 10 minutes (by default), which means that pages that are
dirtied are not forced to be written to disk as often. The control script also
changes the dirty background ratio, so that background writeback of dirty pages
is not done anymore. Combined with a higher commit value (also 10 minutes) for
ext3 or ReiserFS filesystems (also done automatically by the control script),
this results in concentration of disk activity in a small time interval which
occurs only once every 10 minutes, or whenever the disk is forced to spin up by
a cache miss. The disk can then be spun down in the periods of inactivity.

You might wish to take these numbers to their extremes; if you're really curious about your application's behavior, it sounds reasonable to set these values quite high and see how long a sync(1) command takes when it's all done. But these are system-wide tunables -- other applications may not be so happy.
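
For the timing part, a small sketch that measures how long a system-wide sync takes, assuming Python 3.3 or later on a Unix system, where os.sync() is available. time_call and time_sync are made-up names.

```python
import os
import time


def time_call(fn):
    """Run fn() once and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start


def time_sync():
    """Time a system-wide sync, similar in effect to running sync(1)."""
    return time_call(os.sync)


if __name__ == "__main__":
    print("sync took %.3f s" % time_sync())
```

A long sync after the test run suggests a lot of dirty data had accumulated, which is the condition the tunables above are trying to create.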
