
How do I find duplicate sequential entries in a Generic List in C#?

I have some tagged files that I want to process.

Each line in a file has the following format (formatted for clarity):

Name1     Tag1     Origin1  
Name2     Tag2     Origin2  

I need a C# solution that does the following:

  1. Obtain the tag, origin, and line number where each name occurs.
  2. See if two or more sequential names have the same tag. If so, combine them.

In order to do so I tried the following code:

var line_token = new List<object_tag>();
line_token.Add(new object_tag 
{ file_name = filename, 
  line_num = line_number, 
  string_name = name, 
  string_tag = tag, 
  string_origin = origin
});

The List gets its input values from an ArrayList.
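The post never shows how object_tag is declared. Here is a minimal sketch, assuming it is a plain class with public auto-properties named after the fields in the initializer above (the property types, in particular int for line_num, are guesses):

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the asker's object_tag type: the post only shows
// the property names, so the types here are a guess.
public class object_tag
{
    public string file_name { get; set; }
    public int line_num { get; set; }
    public string string_name { get; set; }
    public string string_tag { get; set; }
    public string string_origin { get; set; }

    // Handy when printing results while debugging.
    public override string ToString() =>
        $"{file_name}:{line_num} {string_name} {string_tag} {string_origin}";
}

public static class Program
{
    public static void Main()
    {
        var line_token = new List<object_tag>();
        line_token.Add(new object_tag
        {
            file_name = "test1.txt",
            line_num = 1,
            string_name = "Asia",
            string_tag = "NP",
            string_origin = "<unknown>"
        });
        // prints: test1.txt:1 Asia NP <unknown>
        Console.WriteLine(line_token[0]);
    }
}
```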

Sample Input:

item[0]:  
file_name:"test1.txt"  
line_num:1  
string_name:Asia  
string_tag:NP  
string_origin:<unknown>

Is there a way to search this list based on the string_tag and find if two or more items in a row have the same string_tag, and if so combine them into a new item?


Update: Let me post some of my code to make the problem clearer.

With this I create the list from the files:

private static List<object_tag> tagged_line_list()
    {
        // needs: using System; using System.Collections.Generic; using System.IO;
        string input = "C:Desktop\\_tagged\\";

        //create new list with type object_tag
        var line_token = new List<object_tag>();

        if (!Directory.Exists(input))
        {
            Console.WriteLine("The directory doesn't exist");
            // stop here: Directory.GetFiles would throw for a missing directory
            return line_token;
        }

        //take the folder's files
        string[] files = Directory.GetFiles(input);

        foreach (string file in files)
        {
            string filename = Path.GetFileNameWithoutExtension(file);
            int line_number = 1;

            // "using" closes the reader even if an exception is thrown
            using (var sr = new StreamReader(file))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    //split the line into its tab-separated fields
                    string[] words = line.Split('\t');

                    // read name, tag and origin straight from this line's fields,
                    // skipping lines that do not have all three
                    if (words.Length >= 3)
                    {
                        line_token.Add(new object_tag
                        {
                            file_name = filename,
                            line_num = line_number,
                            string_name = words[0],
                            string_tag = words[1],
                            string_origin = words[2]
                        });
                    }
                    // count every line so line_num matches the file's line numbers
                    line_number++;
                }
            }
        }
        //returns the line_token list
        return line_token;
    }

Next I want to search the list. The code for doing that is:

private static List<object_tag> search_list()
    {
        // needs: using System.Linq;
        //calls the tagged_line_list method for retrieving the line-token list
        var line_token = tagged_line_list();
        var du_np = new List<object_tag>();

        //create new list which contains instances with string_tag NP or NPS
        List<object_tag> list_np_query =
            (from i in line_token
             where i.string_tag == "NP" || i.string_tag == "NPS"
             select i).ToList();

        object_tag last = null;      // combined copy for the current run
        object_tag previous = null;  // last original item of the run
        bool combined = false;       // true once last holds 2+ names

        foreach (object_tag item in list_np_query)
        {
            bool consecutive = previous != null
                //the objects are in the same file
                && previous.file_name == item.file_name
                //the objects are consecutive
                && item.line_num - previous.line_num == 1;

            if (consecutive)
            {
                // extend the run; last keeps the first item's file, line and tag
                last.string_name = last.string_name + " " + item.string_name;
                last.string_origin = "<unknown>";
                combined = true;
            }
            else
            {
                // a run of two or more items has ended: keep it
                if (combined)
                    du_np.Add(last);
                // start a new run with a copy, so line_token is not modified
                last = new object_tag
                {
                    file_name = item.file_name,
                    line_num = item.line_num,
                    string_name = item.string_name,
                    string_tag = item.string_tag,
                    string_origin = item.string_origin
                };
                combined = false;
            }
            previous = item;
        }
        // keep a run that ends at the last item
        if (combined)
            du_np.Add(last);

        return du_np;
    }

      

 

Now I have a list called list_np_query which contains only the objects with string_tag NP or NPS. If the objects are on consecutive lines and have the same file name, I store them in a new list called du_np. The solution was right in front of me but I couldn't see it... Anyway, thank you all for your help and your time!
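To try the combining rule described above on its own, without reading any files, here is a sketch that applies it to an in-memory list: NP or NPS items on consecutive lines of the same file become one item whose name joins the words. CombineConsecutiveNps is a hypothetical name, object_tag is redeclared with assumed property types, and the sample words are made up. As design choices of this sketch, a run of three or more lines ends up in a single item, single items are left out, and the input items are copied rather than modified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of object_tag (only the property names come from the post).
public class object_tag
{
    public string file_name { get; set; }
    public int line_num { get; set; }
    public string string_name { get; set; }
    public string string_tag { get; set; }
    public string string_origin { get; set; }
}

public static class Program
{
    // Hypothetical helper: merges runs of NP/NPS items that sit on consecutive
    // lines of the same file. Only runs of two or more items are returned.
    public static List<object_tag> CombineConsecutiveNps(List<object_tag> line_token)
    {
        var result = new List<object_tag>();
        object_tag current = null;  // merged copy being built
        object_tag previous = null; // last original item in the run
        int count = 0;              // number of items in the current run

        foreach (var item in line_token.Where(t => t.string_tag == "NP" || t.string_tag == "NPS"))
        {
            bool consecutive = previous != null
                && previous.file_name == item.file_name
                && item.line_num - previous.line_num == 1;

            if (consecutive)
            {
                current.string_name += " " + item.string_name;
                current.string_origin = "<unknown>";
                count++;
            }
            else
            {
                if (count >= 2) result.Add(current);
                // copy so the caller's list is not modified
                current = new object_tag
                {
                    file_name = item.file_name,
                    line_num = item.line_num,
                    string_name = item.string_name,
                    string_tag = item.string_tag,
                    string_origin = item.string_origin
                };
                count = 1;
            }
            previous = item;
        }
        if (count >= 2) result.Add(current);
        return result;
    }

    public static void Main()
    {
        var tokens = new List<object_tag>
        {
            new object_tag { file_name = "test1", line_num = 1, string_name = "New", string_tag = "NP" },
            new object_tag { file_name = "test1", line_num = 2, string_name = "York", string_tag = "NP" },
            new object_tag { file_name = "test1", line_num = 3, string_name = "City", string_tag = "NP" },
            new object_tag { file_name = "test1", line_num = 4, string_name = "is", string_tag = "VBZ" },
        };
        // prints: 1 New York City
        foreach (var t in CombineConsecutiveNps(tokens))
            Console.WriteLine($"{t.line_num} {t.string_name}");
    }
}
```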


Could you use a Dictionary to represent this? A dictionary lets you keep track of information based on a non-numeric value. I'm not sure if this is appropriate to your application though.

var items = new Dictionary<string, object_tag>();

foreach (object_tag item in itemArray)
{
    if (items.ContainsKey(item.string_tag))
    {
        //do your combining stuff and store in items[item.string_tag]
    }
    else
    {
        // first item seen with this tag
        items.Add(item.string_tag, item);
    }
}
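A runnable sketch of the Dictionary idea, assuming "combining" means joining the names with a space. Keep in mind that a Dictionary keyed on string_tag merges every item with that tag wherever it occurs, not only items that follow each other, as the Main below shows with two non-adjacent NP items. TryGetValue does the lookup and the fetch in one call. CombineByTag is a hypothetical name and object_tag's property types are assumed:

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of object_tag (only the property names come from the post).
public class object_tag
{
    public string file_name { get; set; }
    public int line_num { get; set; }
    public string string_name { get; set; }
    public string string_tag { get; set; }
    public string string_origin { get; set; }
}

public static class Program
{
    // Joins the names of all items sharing a tag, in input order.
    public static Dictionary<string, object_tag> CombineByTag(IEnumerable<object_tag> itemArray)
    {
        var items = new Dictionary<string, object_tag>();
        foreach (object_tag item in itemArray)
        {
            if (items.TryGetValue(item.string_tag, out object_tag existing))
            {
                // combining step: append this name to the stored item
                existing.string_name += " " + item.string_name;
            }
            else
            {
                // store a copy so the input items are left untouched
                items.Add(item.string_tag, new object_tag
                {
                    file_name = item.file_name,
                    line_num = item.line_num,
                    string_name = item.string_name,
                    string_tag = item.string_tag,
                    string_origin = item.string_origin
                });
            }
        }
        return items;
    }

    public static void Main()
    {
        var input = new List<object_tag>
        {
            new object_tag { line_num = 1, string_name = "Asia", string_tag = "NP" },
            new object_tag { line_num = 2, string_name = "is", string_tag = "VBZ" },
            new object_tag { line_num = 3, string_name = "Europe", string_tag = "NP" },
        };
        var combined = CombineByTag(input);
        // prints: Asia Europe (lines 1 and 3 are merged although not adjacent)
        Console.WriteLine(combined["NP"].string_name);
    }
}
```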


You could also write a for loop that remembers the last item and yield returns it as soon as an item with a different tag comes along, like:

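IEnumerable<object_tag> CombineDuplicates(ArrayList source)
{
    object_tag last = null;
    for (int i = 0; i < source.Count; i++)
    {
        // ArrayList stores object, so each element has to be cast
        var current = (object_tag)source[i];
        if (last == null)
        {
            last = current;
        }
        else if (last.string_tag == current.string_tag)
        {
            // Combine is not built in: you write it on object_tag yourself
            last.Combine(current);
        }
        else
        {
            yield return last;
            last = current;
        }
    }
    // emit the final run (nothing if source was empty)
    if (last != null)
        yield return last;
}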

Then you can call

foreach (var item in CombineDuplicates(input))
{
   //do whatever you want
}
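Combine is not a library method; you would have to add it to object_tag yourself. A self-contained sketch, assuming Combine simply appends the other item's name (that body and object_tag's property types are assumptions). It takes a List<object_tag>, so the casts needed for an ArrayList go away. Note that Combine modifies the first item of each run in place, so the source list changes too:

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of object_tag (only the property names come from the post).
public class object_tag
{
    public string file_name { get; set; }
    public int line_num { get; set; }
    public string string_name { get; set; }
    public string string_tag { get; set; }
    public string string_origin { get; set; }

    // Hypothetical merge rule: append the other item's name to this one.
    public void Combine(object_tag other)
    {
        string_name += " " + other.string_name;
    }
}

public static class Program
{
    // Yields one item per run of equal string_tag values, in list order.
    public static IEnumerable<object_tag> CombineDuplicates(List<object_tag> source)
    {
        object_tag last = null;
        foreach (var current in source)
        {
            if (last == null)
            {
                last = current;
            }
            else if (last.string_tag == current.string_tag)
            {
                last.Combine(current); // mutates the first item of the run
            }
            else
            {
                yield return last;
                last = current;
            }
        }
        if (last != null)
            yield return last;
    }

    public static void Main()
    {
        var input = new List<object_tag>
        {
            new object_tag { string_name = "New", string_tag = "NP" },
            new object_tag { string_name = "York", string_tag = "NP" },
            new object_tag { string_name = "is", string_tag = "VBZ" },
        };
        // prints "New York / NP" then "is / VBZ"
        foreach (var item in CombineDuplicates(input))
            Console.WriteLine(item.string_name + " / " + item.string_tag);
    }
}
```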

I'm not saying it's the only solution, but C# has many flavours... :) (You could also replace the IEnumerable with a List: create a new List at the beginning of the function, add each item to it instead of yielding it, and return the list at the end. Choose whichever suits your needs better.)
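The List-returning variant from the parenthetical might look like this sketch. CombineDuplicatesToList is a made-up name, names are assumed to be joined with a space, and object_tag's property types are assumed. Adding each run's first item to the result as soon as the run starts means no extra step is needed after the loop; that item is a copy, so the input list is left unchanged:

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of object_tag (only the property names come from the post).
public class object_tag
{
    public string file_name { get; set; }
    public int line_num { get; set; }
    public string string_name { get; set; }
    public string string_tag { get; set; }
    public string string_origin { get; set; }
}

public static class Program
{
    // Builds the whole result eagerly instead of yielding items one by one.
    public static List<object_tag> CombineDuplicatesToList(List<object_tag> source)
    {
        var result = new List<object_tag>();
        object_tag last = null;
        foreach (var current in source)
        {
            if (last != null && last.string_tag == current.string_tag)
            {
                // same tag as the previous item: extend the current run
                last.string_name += " " + current.string_name;
            }
            else
            {
                // new run: start it with a copy of the current item
                last = new object_tag
                {
                    file_name = current.file_name,
                    line_num = current.line_num,
                    string_name = current.string_name,
                    string_tag = current.string_tag,
                    string_origin = current.string_origin
                };
                result.Add(last);
            }
        }
        return result;
    }

    public static void Main()
    {
        var input = new List<object_tag>
        {
            new object_tag { string_name = "New", string_tag = "NP" },
            new object_tag { string_name = "York", string_tag = "NP" },
            new object_tag { string_name = "is", string_tag = "VBZ" },
        };
        List<object_tag> combined = CombineDuplicatesToList(input);
        Console.WriteLine(combined.Count);          // 2
        Console.WriteLine(combined[0].string_name); // New York
        Console.WriteLine(input[0].string_name);    // New (input unchanged)
    }
}
```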


If by "combine" you mean dropping the duplicate records (keeping only the first item of each run of same-tag items on consecutive lines), then I have a LINQ solution for you.

var results =
    (from lt in line_token
    orderby lt.line_num
    group lt by lt.string_tag into glts
    let dups = glts
        .Skip(1)
        .Zip(glts, (lt1, lt0) => new
            {
                lt1,
                delta = lt1.line_num - lt0.line_num
            })
        .Where(x => x.delta == 1)
        .Select(x => x.lt1)
    select glts.Except(dups))
        .SelectMany(x => x)
        .OrderBy(x => x.line_num);

It's not exactly pretty, but it does work.
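To see what the query does, here it is wrapped in a method and run on made-up sample data (DropSequentialDuplicates is a hypothetical name; object_tag's property types are assumed). Enumerable.Zip needs .NET Framework 4 or later. Because only line_num is compared, the query assumes all items come from one file; with several files, file_name would also have to be part of the grouping key and the ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of object_tag (only the property names come from the post).
public class object_tag
{
    public string file_name { get; set; }
    public int line_num { get; set; }
    public string string_name { get; set; }
    public string string_tag { get; set; }
    public string string_origin { get; set; }
}

public static class Program
{
    // The answer's query in a method: keeps the first item of each run of
    // same-tag items on consecutive lines and drops the later ones.
    public static List<object_tag> DropSequentialDuplicates(List<object_tag> line_token)
    {
        var results =
            (from lt in line_token
             orderby lt.line_num
             group lt by lt.string_tag into glts
             let dups = glts
                 .Skip(1)
                 .Zip(glts, (lt1, lt0) => new
                 {
                     lt1,
                     delta = lt1.line_num - lt0.line_num
                 })
                 .Where(x => x.delta == 1)
                 .Select(x => x.lt1)
             select glts.Except(dups))
                .SelectMany(x => x)
                .OrderBy(x => x.line_num);
        return results.ToList();
    }

    public static void Main()
    {
        var line_token = new List<object_tag>
        {
            new object_tag { line_num = 1, string_name = "New", string_tag = "NP" },
            new object_tag { line_num = 2, string_name = "York", string_tag = "NP" },
            new object_tag { line_num = 3, string_name = "is", string_tag = "VBZ" },
            new object_tag { line_num = 4, string_name = "Asia", string_tag = "NP" },
        };
        // prints "1 New", "3 is", "4 Asia" (York on line 2 is dropped)
        foreach (var lt in DropSequentialDuplicates(line_token))
            Console.WriteLine($"{lt.line_num} {lt.string_name}");
    }
}
```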


I would use a List<>. List<T> takes only one type argument, so to keep several values per item you can store a tuple (or your own class) in it. For example:

 var item = new List<Tuple<string, int>>();

then you can add items using the

  item.Add(Tuple.Create("Asia", 1));

method. And it supports methods such as

 if (item.Contains(Tuple.Create("Asia", 1)))

If this is not really what you are looking for, let me know. Also, just a note for future posts: your code should be better formatted when posting. I had a hard time reading it and had to copy and paste it into Notepad to reformat it.
