
File handle leak (maybe) in a C library causes trouble with NFS (+Python, but that's incidental)

Here is a rather interesting problem.

I have a Python script (main) that calls a Python module (foo.py), which in turn calls another Python module (barwrapper.py) that uses LoadLibrary to dynamically open and access a libbar.so library.

libbar and the rest of the chain open and create files to perform their tasks. The problem arises when we call rmtree in the main Python script to get rid of the temporary directory created by the imported modules. rmtree is invoked at the end of the script, just before exiting. The call fails because the directory contains hidden .nfs-whatever files, which I guess are the removed files. These files are apparently still held open somewhere in the code, which forces NFS to rename them to these .nfs-whatever files until the file descriptor is released. This does not happen on other filesystems, because a removed file whose descriptor is still held disappears from the directory immediately, while the kernel keeps its contents accessible until the descriptor is closed.
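To confirm that the failure really comes from these renamed files, a minimal sketch like the following could replace the bare rmtree call at the end of main. It assumes the leftover names start with `.nfs` (the prefix the Linux NFS client uses for such renamed files); `remove_tree_report` is a hypothetical helper, not part of any library.

```python
import os
import shutil
import tempfile


def remove_tree_report(path):
    """Try to delete `path`; return the .nfs* entries left behind (empty list on success)."""
    # Delete whatever can be deleted instead of stopping at the first error.
    shutil.rmtree(path, ignore_errors=True)
    leftovers = []
    if os.path.isdir(path):
        # Something survived: collect the NFS renamed files, which point at open descriptors.
        for root, _dirs, files in os.walk(path):
            for name in files:
                if name.startswith(".nfs"):
                    leftovers.append(os.path.join(root, name))
    return sorted(leftovers)


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()  # stands in for the modules' temporary directory
    blocked = remove_tree_report(tmpdir)
    if blocked:
        print("rmtree blocked by files still held open:", *blocked, sep="\n  ")
```

On a local filesystem the list is always empty; a non-empty result on NFS means some process still holds those files open.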

We strongly suspect that the .so library is leaking file descriptors, and these unclosed files ruin the rmtree party at cleanup time. I thought about unloading the .so file in barwrapper, but apparently there's no way to do that. I am also not sure whether the dynamic loader would actually remove the library from the process address space and close the descriptors, or just mark it as unloaded and leave it at that, with the descriptors still leaked.

I can't really think of other workarounds (apart from fixing the leaks, which we would rather not do, as it's a third-party library). Clearly, it happens only on NFS. Do you have any ideas we could try?


The kernel keeps track of file descriptors, so even if you got python to unload the .so and release the memory, it would not know to close the leaked file descriptors. The only thing that comes to mind is importing the .so after forking, and only cleaning up after the forked child process has exited (and the file handles implicitly closed on exit by the kernel).
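The fork approach could be sketched as below, assuming a POSIX system. `leaky_work` and `run_isolated` are hypothetical names: `leaky_work` stands in for the foo.py/barwrapper.py calls, and in real use the `import foo` (and therefore the LoadLibrary call) would happen inside the child, so the parent never holds the leaked descriptors.

```python
import os
import shutil
import tempfile


def leaky_work(workdir):
    """Hypothetical stand-in for the libbar calls: creates a file and never closes it."""
    path = os.path.join(workdir, "scratch.dat")
    fd = os.open(path, os.O_CREAT | os.O_WRONLY)
    os.write(fd, b"data")
    os.unlink(path)  # on NFS this leaves a .nfs* file behind while fd stays open
    # fd is deliberately leaked


def run_isolated(func, *args):
    """Run func(*args) in a forked child and return its exit code (-1 if killed)."""
    pid = os.fork()
    if pid == 0:
        # Child: import/use the leaky library here, then exit immediately.
        status = 0
        try:
            func(*args)
        except BaseException:
            status = 1
        finally:
            # _exit skips the parent's cleanup handlers; the kernel closes leaked fds.
            os._exit(status)
    # Parent: once the child has exited, its descriptors are gone.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    run_isolated(leaky_work, tmpdir)
    shutil.rmtree(tmpdir)  # no descriptors left open in any process
```

The cost is that results must be passed back from the child explicitly (files, a pipe, or `multiprocessing`), since the child's memory is discarded when it exits.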


The proper solution is to fix the handle leak, but if you're not sure who is leaking, running the script under strace may help you localize the leak and report the bug to the maintainers of the third-party library (or, better, if it is an open source library, submit a patch ;) ).
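Besides strace, on Linux you can list which of the process's own descriptors still point into the temporary directory right before calling rmtree. This is a sketch that assumes a Linux `/proc` filesystem; `open_fds_under` is a hypothetical helper.

```python
import os
import tempfile


def open_fds_under(directory):
    """Return {fd: target} for this process's open descriptors that point inside `directory`.

    Linux-only: relies on the /proc/self/fd symlinks. A deleted-but-open file on a
    local filesystem shows up as "<path> (deleted)"; on NFS it shows the .nfs* name.
    """
    directory = os.path.realpath(directory)
    result = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            target = os.readlink(os.path.join("/proc/self/fd", name))
        except OSError:
            # The descriptor listdir() used for /proc/self/fd is already closed.
            continue
        if target.startswith(directory + os.sep):
            result[int(name)] = target
    return result


if __name__ == "__main__":
    # Usage: call this in main just before shutil.rmtree(tmpdir).
    tmpdir = tempfile.mkdtemp()
    handle = open(os.path.join(tmpdir, "example.txt"), "w")
    for fd, target in open_fds_under(tmpdir).items():
        print(f"fd {fd} still open: {target}")
    handle.close()
```

Descriptors that show up here but were never opened from Python code are good candidates for the library's leak.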

Alternatively, unmounting and remounting the NFS partition might help force the handles to close.

