Directory.GetFiles() but only include files with specific content (e.g. "my file contents") using C#

I want to find all XML files that contain a specific tag, e.g. 'field', in their contents. How can I achieve this with the Directory.GetFiles(...) method in C#?

string[] filePathsFields = Directory.GetFiles(@"E:\Code\", "*.xml", SearchOption.AllDirectories);


You can't.

If you want to filter by content, you need to open and read each file to see if it contains your content.


Well, you will have to get all XML files first, and then open them one after another and check the contents for your <tag>. There is no magic shortcut here.

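Btw, using Directory.EnumerateFiles() (available from .NET Framework 4) is more efficient for a large number of files, because it yields paths as they are found instead of building the whole array first. Parallelizing this probably won't help much, since the work is mostly disk-bound.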
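
As a rough illustration of the lazy approach, here is a minimal sketch that combines Directory.EnumerateFiles() with File.ReadLines(), so that both the directory walk and the file read stop as early as possible. The class and method names (LazyFileSearch, FilesContaining) are hypothetical and not from this thread, and the sketch assumes the search text never spans a line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyFileSearch
{
    // Hypothetical helper: lazily yields the paths of files under `root`
    // matching `pattern` in which at least one line contains `needle`.
    public static IEnumerable<string> FilesContaining(string root, string pattern, string needle)
    {
        return Directory
            // Paths are produced one at a time as the directory tree is walked.
            .EnumerateFiles(root, pattern, SearchOption.AllDirectories)
            // File.ReadLines streams the file; Any() stops at the first matching line.
            .Where(file => File.ReadLines(file).Any(line => line.Contains(needle)));
    }
}
```

Calling LazyFileSearch.FilesContaining(@"E:\Code", "*.xml", "<field") does no work until the result is enumerated, so a caller that only needs the first few hits never touches the remaining files.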


Suppose we have

string path= "E:\\Code";

then

IEnumerable<XDocument> q =
    Directory.EnumerateFiles(path, "*.xml") // iterate through each XML file in the directory
    .Select(x => XDocument.Load(x)) // load each file into memory
    .Where(d => d.Descendants("field").Any()); // keep documents that contain the tag

or

IEnumerable<XDocument> q = from file in Directory.EnumerateFiles(path, "*.xml")
                           let doc = XDocument.Load(file)
                           where doc.Descendants("field").Count() > 0
                           select doc;

Pre .NET 4.0 solution:

Just replace

Directory.EnumerateFiles(path, "*.xml")

with

Directory.GetFiles(path, "*.xml", SearchOption.AllDirectories)

(SearchOption.AllDirectories also makes the search recursive, as in the question.)

To select the matching nodes themselves:

IEnumerable<IEnumerable<XElement>> qq = from file in Directory.GetFiles(path, "*.xml", SearchOption.AllDirectories)
                                        let doc = XDocument.Load(file)
                                        let field = doc.Descendants("field")
                                        let ignore = doc.Descendants("ignore")
                                        where field.Count() > 0 && ignore.Count() == 0
                                        select field;

This returns a nested IEnumerable<> because each document can contain several matching nodes.


To select only file names:

IEnumerable<string> q = from file in Directory.GetFiles(path, "*.xml", SearchOption.AllDirectories)
                        let doc = XDocument.Load(file)
                        where doc.Descendants("field").Count() > 0
                        select file;

Or, if you don't want to load each file into an XDocument object:

IEnumerable<string> q = from file in Directory.GetFiles(path, "*.xml", SearchOption.AllDirectories)
                        let data = File.ReadAllText(file)
                        where data.Contains("field")
                        select file;

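But the last one is a plain substring match, so I guess it will also return a lot of junk: files where "field" appears only in text, attribute values, comments or other tag names.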
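
A possible middle ground between loading a full XDocument and matching raw text is a forward-only XmlReader that stops at the first matching element. The following is only a sketch: the class and method names (XmlElementSearch, ContainsElement, FilesWithElement) are hypothetical, and treating files the reader cannot parse as "no match" is an assumption, not something stated in this thread. Note that ReadToFollowing matches on the element name as written in the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;

public static class XmlElementSearch
{
    // Hypothetical helper: true if the file contains at least one element
    // whose qualified name equals elementName.
    public static bool ContainsElement(string file, string elementName)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(file))
            {
                // Reads forward and returns as soon as a matching element is found,
                // so the rest of the file is never parsed.
                return reader.ReadToFollowing(elementName);
            }
        }
        catch (XmlException)
        {
            // Assumption: files the reader cannot parse count as "no match".
            return false;
        }
    }

    // Hypothetical helper: paths of all *.xml files under root that contain the element.
    public static IEnumerable<string> FilesWithElement(string root, string elementName)
    {
        return Directory.GetFiles(root, "*.xml", SearchOption.AllDirectories)
            .Where(file => ContainsElement(file, elementName));
    }
}
```

Because only real elements are matched, a file that merely mentions "field" in its text or in an attribute value is not returned.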


File and folder exclusion:

var q = from file in Directory.GetFiles(path, "*.xml", SearchOption.AllDirectories)
        where !new[] { "file1", "file2" }.Contains(Path.GetFileNameWithoutExtension(file)) // file is a full path, so compare by name
        let doc = XDocument.Load(file)
        where doc.Descendants("field").Count() > 0
        select file;

var q = from file in Directory.GetFiles(path, "*.xml", SearchOption.AllDirectories)
        where !new[] { "c:\\foo\\bar", "c:\\blah\\blah" }.Contains(Path.GetDirectoryName(file))
        let doc = XDocument.Load(file)
        where doc.Descendants("field").Count() > 0
        select file;

Condition on attributes (a file is returned once per matching element):

var q = from file in Directory.GetFiles(path, "*.xml", SearchOption.AllDirectories)
        let doc = XDocument.Load(file)
        from f  in doc.Descendants("field")
        where f.Attributes("id").Count() > 0 && f.Attributes("name").Count() > 0
        select file;


What do you think about this LINQ statement?

var files = Directory.GetFiles(@"E:\Code", "*.xml", SearchOption.AllDirectories)
    .Where(s => File.ReadAllText(s).Contains("<Field ")); // s is a path, so read the file before searching


Depending on the target OS you can use Windows Search to do the job. One limitation (IMHO) is that it looks like Windows Search only works if the folders you are interested in are indexed.

A sample that I ran on my machine:

using System;
using System.Data.OleDb;

namespace TestWindowsSearch
{
    public class Test
    {
        public static void Main()
        {
            var conn = new OleDbConnection("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';");
            conn.Open();
            OleDbCommand cmd = new OleDbCommand("SELECT Top 10 System.ItemUrl FROM SystemIndex WHERE SCOPE='file:h:/projects/db4o/trunk/db4o.net' AND CONTAINS('IActivatable') AND CONTAINS(System.ItemUrl, '.cs')", conn);

            var result = cmd.ExecuteReader();

            while (result.Read())
            {
                Console.WriteLine(result[0]);
            }
        }
    }
}

Produced:

file:H:/Projects/db4o/trunk/db4o.net/Db4objects.Db4o.CS.Optional/bin/Release/Db4objects.Db4o.xml
file:H:/Projects/db4o/trunk/db4o.net/Db4objects.Db4o/Db4objects.Db4o/TA/IActivatable.cs
file:H:/Projects/db4o/trunk/db4o.net/Db4objects.Db4o/Db4objects.Db4o/Internal/Activation/TPUnspecifiedUpdateDepth.cs
file:H:/Projects/db4o/trunk/db4o.net/Db4objects.Db4o/Db4objects.Db4o/Internal/Activation/TPFixedUpdateDepth.cs
file:H:/Projects/db4o/trunk/db4o.net/Db4objects.Db4o/Db4objects.Db4o/Internal/Activation/ActivatableBase.cs
file:H:/Projects/db4o/trunk/db4o.net/Db4oTutorial/Db4odoc.Tutorial.Chapters/F1/Chapter9/Car.cs
file:H:/Projects/db4o/trunk/db4o.net/Db4oTutorial/Db4odoc.Tutorial.Chapters/F1/Chapter9/SensorReadout.cs
file:H:/Projects/db4o/trunk/db4o.net/Db4oTutorial/Db4odoc.Tutorial.Chapters/F1/Chapter8/Car.cs
file:H:/Projects/db4o/trunk/db4o.net/Db4oTutorial/Db4odoc.Tutorial.Chapters/F1/Chapter8/Pilot.cs
file:H:/Projects/db4o/trunk/db4o.net/Db4oTutorial/Db4odoc.Tutorial.Chapters/F1/Chapter8/SensorReadout.cs
Press any key to continue . . .

Hope this helps

Adriano
