
LINQ - using a "NOT IN" operator to select non-duplicated lines from a text file

I am using LINQ to select and process lines from a text file. My text file has two columns delimited by the pipe character "|". The file contains the following:







You will notice that line 3 and line 6 have a duplicated ID (column 1). I want to use LINQ to read the posted text file, find the duplicates (and report on them), and then select from the LINQ query only the lines that are not duplicated. The following is what I have:

 StreamReader srReader = new StreamReader(fUpload.PostedFile.InputStream);

                // Parse each "ID|Number" line into an UploadVars object
                var query1 =
                       from line in srReader.Lines()
                       let items = line.Split('|')
                       select new UploadVars()
                       {
                           ID = items[0],
                           Number = items[1]
                       };
                // Keep only the IDs that occur more than once
                var GroupedQuery = from line in query1
                                   group line by line.ID into grouped
                                   where grouped.Count() > 1
                                   select new {
                                       ID = grouped.Key,
                                       MCount = grouped.Count()
                                   };

                StringBuilder sb = new StringBuilder();
                foreach (var item in GroupedQuery)
                {
                    sb.AppendFormat("The following external ID's occur more than once and have not been processed:<br> {0}. Duplicated {1} times.", item.ID, item.MCount);
                }

This all works and gives me the correct results. I am now looking to select all the lines except the two duplicated lines from the text file. I have composed the following LINQ statement, but for some reason I am having no luck:

                // Let's start at the beginning of the posted file stream
                fUpload.PostedFile.InputStream.Position = 0;
                srReader = new StreamReader(fUpload.PostedFile.InputStream);
                var query2 = from line in srReader.Lines()
                             let items = line.Split('|')
                             select new UploadVars()
                             {
                                 ID = items[0],
                                 Number = items[1]
                             };

                var qryNoDupedMems = from Memb in query2
                                      where !(from duped in GroupedQuery
                                              select duped.ID)
                                      select Memb; 

The result of qryNoDupedMems is the complete list from the text file. Could someone explain what I'm doing wrong here? Thanks in advance.

In a group query, the grouped variable is itself an IEnumerable containing the items in the group.

Therefore, you can write the following:

var nonDuplicates = from line in query1
    group line by line.ID into grouped
    where grouped.Count() == 1
    select grouped.First();
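
If you would rather keep the "NOT IN" shape from the question, the where clause needs a boolean test, for example a Contains check against the duplicated IDs. The following is only a minimal sketch under some assumptions: the sample lines are made up (the real file contents are not shown), the names Demo, NonDuplicates and dupedIds are illustrative, and the lines are read into memory instead of going through the question's Lines() helper. Materializing the parsed rows once with ToList() also means the stream would not have to be rewound and read a second time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class UploadVars
{
    public string ID { get; set; }
    public string Number { get; set; }
}

public static class Demo
{
    // Returns only the rows whose ID occurs exactly once in the input.
    public static List<UploadVars> NonDuplicates(IEnumerable<string> lines)
    {
        // Parse once and materialize, so the source is enumerated a single time.
        var rows = (from line in lines
                    let items = line.Split('|')
                    select new UploadVars { ID = items[0], Number = items[1] }).ToList();

        // Collect the IDs that occur more than once.
        var dupedIds = new HashSet<string>(
            from row in rows
            group row by row.ID into grouped
            where grouped.Count() > 1
            select grouped.Key);

        // "NOT IN": keep the rows whose ID is not in the duplicate set.
        return rows.Where(row => !dupedIds.Contains(row.ID)).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data; ID 3 appears twice.
        var lines = new[] { "1|100", "2|200", "3|300", "4|400", "3|500" };

        foreach (var row in NonDuplicates(lines))
        {
            Console.WriteLine(row.ID + "|" + row.Number);
        }
    }
}
```

With this sample input, both rows with ID 3 are dropped and the rows with IDs 1, 2 and 4 are kept. The grouping query above gives the same rows; the Contains form is mainly useful when you already have the list of duplicated IDs for the report.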





