C program stuck on uninterruptible wait while performing disk I/O on Mac OS X Snow Leopard

2022-12-16 03:13 问答作者：

One line of background: I'm the developer of Redis, a NoSQL database. One of the new features I'm implementing is Virtual Memory, because Redis takes all the data in memory. Thanks to VM Redis is able to transfer rarely used objects from memory to disk, there are a number of reasons why this works much better than letting the OS do the work for us swapping (redis objects are built of many small objects allocated in non contiguous places, when serialized to disk by Redis they take 10 times less space compared to the memory pages where they live, and so forth).

Now I've an alpha implementation that's working perfectly on Linux, but not so 开发者_开发技巧well on Mac OS X Snow Leopard. From time to time, while Redis tries to move a page from memory to disk, the redis process enters the uninterruptible wait state for minutes. I was unable to debug this, but this happens either in a call to fseeko() or fwrite(). After minutes the call finally returns and redis continues working without problems at all: no crash.

The amount of data transfered is very small, something like 256 bytes. So it should not be a matter of a very big amount of I/O performed.

But there is an interesting detail about the swap file that's target of the write operation. It's a big file (26 Gigabytes) created opening a file with fopen() and then enlarged using ftruncate(). Finally the file is unlink()ed so that Redis continues to take a reference to it, but we are sure that when the Redis process will exit the OS will really free the swap file.

Ok that's all but I'm here for any further detail. And BTW you can even find the actual code in the Redis git, but it's not trivial to understand in five minutes given that's a fairly complex system.

Thank you very much for any help.

As I understand it, HFS+ has very poor support for sparse files. So it may be that your write is triggering a file expansion that is initializing/materializing a large fraction of the file.

For example, I know mmap'ing a new large empty file and then writing at a few random locations produces a very large file on disk with HFS+. It's quite annoying since mmap and sparse files are an extremely convenient way of working with data, and virtually every other platform/filesystem out there handles this gracefully.

Is the swap file written to linearly? Meaning we either replace an existing block or write a new block at the end and increment a free space pointer? If so, perhaps doing more frequent smaller ftruncate calls to expand the file would result in shorter pauses.

As an aside, I'm curious why redis VM doesn't use mmap and then just move blocks around in an attempt to concentrate hot blocks into hot pages.

antirez, I'm not sure I'll be much help since my Apple experience is limited to the Apple ][, but I'll give it a shot.

First thing is a question. I would have thought that, for virtual memory, speed of operation would be a more important measure than disk space (especially for a NoSQL DB where speed is the whole point, otherwise you'd be using SQL, no?). But, if your swap file is 26G, maybe not :-)

Some things to try (if possible).

Try to actually isolate the problem to the seek or write. I have a hard time believing a seek could take that long since, at worst, it should be a buffer pointer change. Still, I didn't write OSX so I can't be sure.
Try adjusting the size of the swap file to see if that's what is causing the problem.
Do you ever dynamically expand the swap file (as opposed to pre-allocation)? If you do, that may be what is causing the problem.
Do you always write as low in the file as you can? It may be that creating a 26G file may not actually fill it with data but, if you create it then write to the last byte, the OS may have to zero out the bytes before then (deferring the initialization, if any).
What happens if you just pre-allocate the entire file (write to every byte) and not unlink it? In other words, leave the file there between runs of your program (creating it if it doesn't already exist of course). Then in your startup code for Redis, just initialize the file (pointers and such). This may get rid of any problems like those in point 4 above.
Ask on the various BSD sites as well. I'm not sure how much Apple changed under the covers but OSX is just BSD at the lowest level (Pax ducks for cover).
Also consider asking on the Apple sites (if you haven't already done so).

Well, that's my small contribution, hopefully it'll help. Good luck with your project.

Have you turned off file caching for your file? i.e. fcntl(fd, F_GLOBAL_NOCACHE, 1)

Have you tried debugging with DTrace and or Instruments (Apple's experimental dtrace front-end)?

Exploring Leopard with DTrace

Debugging Chrome on OS X

As Linus said once on the Git mailing list:

"I realize that OS X people have a hard time accepting it, but OS X filesystems are generally total and utter crap - even more so than Windows."

继续阅读：c fwrite io macos redis

C program stuck on uninterruptible wait while performing disk I/O on Mac OS X Snow Leopard

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？