C# EnumerateFiles wildcard returning non-matches?

As a simplified example I am executing the following

IEnumerable<string> files = Directory.EnumerateFiles(path, @"2010*.xml", 
    SearchOption.TopDirectoryOnly).ToList();

In my result set I am getting a few files which do not match the file pattern. According to MSDN, the searchPattern wildcard means "Zero or more characters" and is not a regex. For example, I am getting back a file named "2004_someothername.xml".

For information there are in excess of 25,000 files in the folder.

Does anyone have any idea what is going on?


This is due to how Windows does wildcard matching - it includes the encoded 8.3 filenames in its wildcard search, resulting in some surprising matches!

One way to get around this behavior is to retest every result that comes back from the OS wildcard match by manually comparing the wildcard against each long file name. Another way is to turn off 8.3 filename creation altogether via the registry; note that this only affects files created afterwards, so existing files keep their short names. I have been burned by this on numerous occasions, including having important (non-matching) files deleted by a wildcard-based del command at the command prompt.
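The retest described above could be sketched roughly as follows. This is only a minimal illustration, not the framework's own matching logic: WildcardFilter, ToRegex, Matches and EnumerateFilesStrict are hypothetical names, and the sketch assumes that * means "zero or more characters" and ? means "exactly one character", which is simpler than what the OS actually does with ?.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the .NET API): re-checks OS wildcard
// results against the long file name only, so 8.3 short-name matches drop out.
public static class WildcardFilter
{
    // Translate a simple wildcard pattern into an anchored, case-insensitive regex.
    // Assumption: '*' = zero or more characters, '?' = exactly one character.
    public static Regex ToRegex(string pattern)
    {
        string body = Regex.Escape(pattern)
            .Replace(@"\*", ".*")
            .Replace(@"\?", ".");
        return new Regex(@"\A" + body + @"\z",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // True if the long file name (no directory part) matches the pattern.
    public static bool Matches(string fileName, string pattern)
    {
        return ToRegex(pattern).IsMatch(fileName);
    }

    // Let the OS do the coarse match, then keep only paths whose
    // long file name really matches the same pattern.
    public static IEnumerable<string> EnumerateFilesStrict(string directory, string pattern)
    {
        Regex regex = ToRegex(pattern);
        return Directory
            .EnumerateFiles(directory, pattern, SearchOption.TopDirectoryOnly)
            .Where(path => regex.IsMatch(Path.GetFileName(path)));
    }
}
```

With this in place, WildcardFilter.EnumerateFilesStrict(path, "2010*.xml") would still let the OS narrow down the candidates, but a file such as "2004_someothername.xml" that only matched through its short name would be filtered out by the second check.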

To summarize: be very careful, especially with directories that contain many files, about making critical production decisions or taking any action based on an OS file/wildcard match without a secondary verification of the results.

Here is an explanation of this bizarre behavior.

Another explanation from O'Reilly's site.


I can reproduce your problem with the following code (sorry, VB). It creates 55,000 zero-byte files named 2000_0001.xml through 2010_5000.xml. Then it looks for all of the files that start with 2010. On my machine (Windows 7 SP1 32-bit) it returns 5,174 files instead of just 5,000.

Option Explicit On
Option Strict On

Imports System.IO

Public Class Form1

    Private TempFolder As String = Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "Temp")

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        CreateFiles()

        Dim Files = Directory.EnumerateFiles(TempFolder, "2010*.xml", SearchOption.TopDirectoryOnly).ToList()
        Using FS As New FileStream(Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "Report.txt"), FileMode.Create, FileAccess.Write, FileShare.Read)
            Using SW As New StreamWriter(FS, System.Text.Encoding.ASCII)
                For Each F In Files
                    SW.WriteLine(F)
                Next
            End Using
        End Using


        DeleteFiles()
    End Sub

    Private Sub CreateFiles()
        If Not Directory.Exists(TempFolder) Then Directory.CreateDirectory(TempFolder)
        Dim Bytes() As Byte = {}
        Dim Name As String
        For Y = 2000 To 2010
            Trace.WriteLine(Y)
            For I = 1 To 5000
                Name = String.Format("{0}_{1}.xml", Y, I.ToString.PadLeft(4, "0"c))
                File.WriteAllBytes(Path.Combine(TempFolder, Name), Bytes)
            Next
        Next
    End Sub
    Private Sub DeleteFiles()
        Directory.Delete(TempFolder, True)
    End Sub
End Class


Not a solution for the MS bug (which possibly uses Windows File Search underneath, which would be terrible for your results...), but a solution as a workaround, which gives you some extra leverage and control over the results:

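var files = from file in
      Directory.EnumerateFiles(path, "*",
      SearchOption.TopDirectoryOnly)
      // Test the long file name only; note that Extension includes the leading dot.
      let info = new FileInfo(file)
      where info.Name.StartsWith("2010") &&
          string.Equals(info.Extension, ".xml", StringComparison.OrdinalIgnoreCase)
      select file;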


I just tried your example and I can't see it doing anything wrong, so I guess there's more to your environment and/or the "non simplified" code that isn't covered here.

I've used this code:

Console.WriteLine("Starting...");
IEnumerable<string> files = Directory.EnumerateFiles("C:\\temp\\test\\2010", @"2010*.xml", SearchOption.TopDirectoryOnly).ToList();

foreach (string file in files)
{
    Console.WriteLine("Found[{0}]", file);
}

Console.ReadLine();

In my folder structure I've created the following:

c:\temp\test\2010\2004_something.xml
c:\temp\test\2010\2010_abc.xml
c:\temp\test\2010\2010_def.xml

The output of the application is simply:

Starting...
Found[C:\temp\test\2010\2010_abc.xml]
Found[C:\temp\test\2010\2010_def.xml]

Can you provide some more detail about what is happening in your scenario, in the real app? Or can you reproduce the problem in a smaller app?


Having suffered the same problem and found this post, I thought I would post my solution:

IEnumerable<string> Files = Directory.EnumerateFiles(e.FileName, "*.xml").Where(File => File.EndsWith(".xml", StringComparison.InvariantCultureIgnoreCase));

This only tests the suffix, but it eliminates matches on my backup files that end in .xml~.
