开发者

How should I poll a large number of files for changes?

I'd like to poll the file system for any changed, added or removed files or sub-directories. All changes should be detected quickly but without putting pressure on the machine. The OS is Windows >= Vista, the observed part is a local directory.

Typically, I would resort to a FileSystemWatcher, but this led to problems with other programs that tried to watch the same spot (prominently, Windows Explorer). Also, I heard that FSW is not really reliable even for local folders and with a large buffer.

The main issue I have is that the number of files and directories may be very large (guess 7-digits). Simply running a check for all files every second did noticeably affect my machine.

My next idea was to check different parts of the whole tree per second to reduc开发者_如何转开发e the overall impact, and possibly add a kind of heuristic, like checking files that get changed frequently in quicker succession.

I'm wondering if there are patterns for this kind of problem, or if anyone has experiences with this situation.


We have implemented a similar feature, using C#. The FileSystemWatcher was inefficient with large directory trees.

Our alternative, was using FSNodes, an struct created by us, using the following Windows API calls:

    [StructLayout(LayoutKind.Sequential)]
        private struct FILETIME
    {
        public uint dwLowDateTime;
        public uint dwHighDateTime;
    };

    [StructLayout(LayoutKind.Sequential, CharSet=CharSet.Unicode)]
        private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public int dwReserved0;
        public int dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst=MAX_PATH)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst=MAX_ALTERNATE)]
        public string cAlternate;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindClose(IntPtr hFindFile);

    [DllImport("kernel32", CharSet=CharSet.Unicode)]
    private static extern IntPtr FindFirstFile(
        string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32", CharSet=CharSet.Unicode)]
    private static extern bool FindNextFile(
        IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

What we do is a static processing. We save a metadata tree on disk and compare the stored directory tree vs the loaded one, searching modified (based on its timestamp (faster), or on the file hash). Also, we can manage deleted, added and moved, even moved-modified files (also based on the file hash).

This implementation mixed with a daemon that executed it each POLL_TIME, was valid for us. Hope it helps.


My best guess would be to use USN journal if it is a local machine, you have administrator privileges and partitions are NTFS. USN journal is extremely fast and reliable. It is a long topis and this link explains everything: http://www.microsoft.com/msj/0999/journal/journal.aspx


For *nix environments you can use inotify https://github.com/rvoicilas/inotify-tools/wiki/, which worked great in my limited research on it. There might be a version out there that work with windows which I have less experience with ... quick googling led me to a java clone called jnotify http://jnotify.sourceforge.net/ which is advertised to work on windows so it might be worth trying.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜