
Remove duplicates from array of struct

I can't figure out how to remove duplicate entries from an array of structs.

I have this struct:

public struct stAppInfo
{
    public string sTitle;
    public string sRelativePath;
    public string sCmdLine;
    public bool bFindInstalled;
    public string sFindTitle;
    public string sFindVersion;
    public bool bChecked;
}

I have since changed the stAppInfo struct to a class, thanks to Jon Skeet.

The code is like this (short version):

stAppInfo[] appInfo = new stAppInfo[listView1.Items.Count];

int i = 0;
foreach (ListViewItem item in listView1.Items)
{
    appInfo[i].sTitle = item.Text;
    appInfo[i].sRelativePath = item.SubItems[1].Text;
    appInfo[i].sCmdLine = item.SubItems[2].Text;
    appInfo[i].bFindInstalled = item.SubItems[3].Text.Equals("Sí");
    appInfo[i].sFindTitle = item.SubItems[4].Text;
    appInfo[i].sFindVersion = item.SubItems[5].Text;
    appInfo[i].bChecked = item.SubItems[6].Text.Equals("Sí");
    i++;
}

I need the appInfo array to be unique in the sTitle and sRelativePath members; the other members can contain duplicates.

EDIT:

Thanks to all for the answers, but this application is "portable": I just need the .exe file, and I don't want to ship extra files such as referenced *.dll files. So please, no external references; this app is intended to run from a pen drive.

All data comes from a *.ini file. What I do is (pseudocode):

ReadFile()
FillDataFromFileInAppInfoArray()
DeleteDuplicates()
FillListViewControl()

When I want to save that data into a file I have these options:

  1. Using the ListView data
  2. Using the appInfo array (is this faster?)
  3. Any other?
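If saving from the appInfo array, a minimal sketch using only System.IO (no external DLLs, so it stays pen-drive friendly) could look like the following. The section/key names here are assumptions for illustration, not the actual layout of the original *.ini file:

```csharp
// Hypothetical sketch: write each entry back out in an INI-like layout.
// Only framework types are used, so no extra files are needed beside the .exe.
using (var writer = new System.IO.StreamWriter("apps.ini"))
{
    foreach (var info in appInfo)
    {
        writer.WriteLine("[" + info.sTitle + "]");
        writer.WriteLine("RelativePath=" + info.sRelativePath);
        writer.WriteLine("CmdLine=" + info.sCmdLine);
        writer.WriteLine("FindInstalled=" + (info.bFindInstalled ? "Sí" : "No"));
        writer.WriteLine("FindTitle=" + info.sFindTitle);
        writer.WriteLine("FindVersion=" + info.sFindVersion);
        writer.WriteLine("Checked=" + (info.bChecked ? "Sí" : "No"));
    }
}
```

Writing from the in-memory array rather than walking the ListView again avoids touching the UI control during the save.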

EDIT2:

Big thanks to Jon Skeet and Michael Hays. Thanks for your time, guys!!


Firstly, please don't use mutable structs. They're a bad idea in all kinds of ways.

Secondly, please don't use public fields. Fields should be an implementation detail - use properties.

Thirdly, it's not at all clear to me that this should be a struct. It looks rather large, and not particularly "a single value".

Fourthly, please follow the .NET naming conventions so your code fits in with all the rest of the code written in .NET.

Fifthly, you can't remove items from an array, as arrays are created with a fixed size... but you can create a new array with only unique elements.

LINQ to Objects will let you do that already using GroupBy as shown by Albin, but a slightly neater (in my view) approach is to use DistinctBy from MoreLINQ:

var unique = appInfo.DistinctBy(x => new { x.sTitle, x.sRelativePath })
                    .ToArray();

This is generally more efficient than GroupBy, and also more elegant in my view.

Personally I generally prefer using List<T> over arrays, but the above will create an array for you.

Note that with this code there can still be two items with the same title, and there can still be two items with the same relative path - there just can't be two items with the same relative path and title. If there are duplicate items, DistinctBy will always yield the first such item from the input sequence.

EDIT: Just to satisfy Michael, you don't actually need to create an array to start with, or create an array afterwards if you don't need it:

var query = listView1.Items
                     .Cast<ListViewItem>()
                     .Select(item => new stAppInfo
                             {
                                 sTitle = item.Text,
                                 sRelativePath = item.SubItems[1].Text,
                                 sCmdLine = item.SubItems[2].Text,
                                 bFindInstalled = item.SubItems[3].Text == "Sí",
                                 sFindTitle = item.SubItems[4].Text,
                                 sFindVersion = item.SubItems[5].Text,
                                 bChecked = item.SubItems[6].Text == "Sí"
                             })
                     .DistinctBy(x => new { x.sTitle, x.sRelativePath });

That will give you an IEnumerable<stAppInfo> which is lazily streamed. Note that if you iterate over it more than once, however, it will iterate over listView1.Items the same number of times, performing the same uniqueness comparisons each time.
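If the results will be consumed more than once, it's usually worth materializing the lazy sequence a single time; a minimal sketch:

```csharp
// Materialize the query once so repeated enumeration doesn't re-read
// listView1.Items and redo the uniqueness comparisons on every pass.
var uniqueItems = query.ToList();   // or query.ToArray() if an array is needed

foreach (var info in uniqueItems)
{
    // first pass over the cached results
}

foreach (var info in uniqueItems)
{
    // second pass: no re-query, no repeated DistinctBy work
}
```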

I prefer this approach over Michael's as it makes the "distinct by" columns very clear in semantic meaning, and removes the repetition of the code used to extract those columns from a ListViewItem. Yes, it involves building more objects, but I prefer clarity over efficiency until benchmarking has proved that the more efficient code is actually required.


What you need is a Set. It ensures that the items entered into it are unique (based on some qualifier which you will set up). Here is how it is done:

  • First, change your struct to a class. There is really no getting around that.

  • Second, provide an implementation of IEqualityComparer<stAppInfo>. It may be a hassle, but it is the thing that makes your set work (which we'll see in a moment):

    public class AppInfoComparer : IEqualityComparer<stAppInfo>
    {
        public bool Equals(stAppInfo x, stAppInfo y) {
            if (ReferenceEquals(x, y)) return true;
            if (x == null || y == null) return false;
            return Equals(x.sTitle, y.sTitle) && Equals(x.sRelativePath,
               y.sRelativePath);
        }
    
        // this part is a pain, but this one is already written 
        // specifically for your question.
        public int GetHashCode(stAppInfo obj) {
            unchecked {
                return ((obj.sTitle != null 
                    ? obj.sTitle.GetHashCode() : 0) * 397)
                    ^ (obj.sRelativePath != null 
                    ? obj.sRelativePath.GetHashCode() : 0);
            }
        }
    }
    
  • Then, when it is time to make your set, do this:

    var appInfoSet = new HashSet<stAppInfo>(new AppInfoComparer());
    foreach (ListViewItem item in listView1.Items)
    {
        var newItem = new stAppInfo { 
            sTitle = item.Text,
            sRelativePath = item.SubItems[1].Text,
            sCmdLine = item.SubItems[2].Text,
            bFindInstalled = item.SubItems[3].Text.Equals("Sí"),
            sFindTitle = item.SubItems[4].Text,
            sFindVersion = item.SubItems[5].Text,
            bChecked = item.SubItems[6].Text.Equals("Sí") };
        appInfoSet.Add(newItem);
    }
    
  • appInfoSet now contains a collection of stAppInfo objects with unique Title/Path combinations, as per your requirement. If you must have an array, do this:

    stAppInfo[] appInfo = appInfoSet.ToArray();
    

Note: I chose this implementation because it looks like the way you are already doing things. It has an easy-to-read foreach loop (though I do not need the counter variable). It does not involve LINQ (which can be troublesome if you aren't familiar with it). It requires no external libraries outside of what the .NET Framework provides. And finally, it provides an array, just as you've asked. As for reading the data in from an INI file, hopefully you can see that the only thing that changes is your foreach loop.

Update

Hash codes can be a pain. You might have been wondering why you need to compute them at all. After all, couldn't you just compare the values of the title and relative path after each insert? Well sure, of course you could, and that's exactly how another kind of set, SortedSet, works. SortedSet makes you implement IComparer<T> in the same way that I implemented IEqualityComparer<T> above.

So, in this case, AppInfoComparer would look like this:

private class AppInfoComparer : IComparer<stAppInfo>
{
   // return -1 if x < y, 1 if x > y, or 0 if they are equal
   public int Compare(stAppInfo x, stAppInfo y)
   {
      var comparison = x.sTitle.CompareTo(y.sTitle);
      if (comparison != 0) return comparison;
      return x.sRelativePath.CompareTo(y.sRelativePath);
   }
}

And then the only other change you need to make is to use SortedSet instead of HashSet:

var appInfoSet = new SortedSet<stAppInfo>(new AppInfoComparer());

It's so much easier, in fact, that you are probably wondering: what gives? The reason most people choose HashSet over SortedSet is performance. But you should balance that against how much you actually care, since you'll be maintaining that code. I personally use a tool called ReSharper, available for Visual Studio, which computes these hash functions for me, because I think computing them is a pain, too.
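As an aside: if you are able to target a newer runtime (System.HashCode shipped with .NET Core 2.1 / .NET Standard 2.1, so this won't apply to an older .NET Framework pen-drive build), the framework can combine the fields for you, null-safely:

```csharp
// Modern alternative to the hand-rolled hash: HashCode.Combine mixes the
// fields and handles nulls, replacing the unchecked 397-multiplier pattern.
public int GetHashCode(stAppInfo obj)
{
    return HashCode.Combine(obj.sTitle, obj.sRelativePath);
}
```

The Equals implementation stays exactly as written above; only the hash computation gets simpler.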

(I'll talk about the complexity of the two approaches, but if you already know it, or are not interested, feel free to skip it.)

SortedSet has a complexity of O(log n): each time you enter a new item, it will effectively go to the halfway point of your set and compare. If it doesn't find your entry, it will go to the halfway point between its last guess and the group to the left or right of that guess, quickly whittling down the places your element could hide. For a million entries, this takes about 20 attempts. Not bad at all.

But if you've chosen a good hashing function, then HashSet can do the same job, on average, in one comparison, which is O(1). And before you decide 20 is not really that big a deal compared to 1 (after all, computers are pretty quick), remember that you had to insert those million items, so while HashSet took about a million attempts to build that set up, SortedSet took several million.

There is a price, though: HashSet breaks down (very badly) if you choose a poor hashing function. If the hash codes for lots of items are not unique, those items will collide in the HashSet, which will then have to try again and again. If lots of items collide with the exact same hash code, they will retrace each other's steps, and you will be waiting a long time: the millionth entry alone will take about a million attempts, and building the whole set takes on the order of a million times a million attempts. HashSet has devolved into O(n^2).

What's important with big-O notation (which is what O(1), O(log n), and O(n^2) are) is how quickly the number in parentheses grows as you increase n. Slow growth or no growth is best; quick growth is sometimes unavoidable. For a dozen or even a hundred items the difference may be negligible, but if you can get into the habit of writing efficient code as easily as the alternatives, it's worth conditioning yourself to do so, since problems are cheapest to correct closest to the point where you created them.
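To make the "poor hashing function" failure mode concrete, here is a deliberately degenerate comparer (hypothetical, for illustration only) that forces every item into the same bucket:

```csharp
// Worst case on purpose: every object hashes to the same value, so the
// HashSet must fall back to calling Equals against every entry already in
// that bucket. Inserts degrade from O(1) toward O(n), and building the
// whole set toward O(n^2).
public class WorstCaseComparer : IEqualityComparer<stAppInfo>
{
    public bool Equals(stAppInfo x, stAppInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.sTitle == y.sTitle && x.sRelativePath == y.sRelativePath;
    }

    public int GetHashCode(stAppInfo obj)
    {
        return 0; // constant hash: everything collides
    }
}
```

Swap this comparer into the HashSet example above and the set still produces correct results; it just silently loses all of its speed advantage, which is why a well-distributed GetHashCode matters.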


Use LINQ2Objects, group by the things that should be unique and then select the first item in each group.

var noDupes = appInfo.GroupBy(
    x => new { x.sTitle, x.sRelativePath })
    .Select(g => g.First()).ToArray();


A note of caution: an array of structs (value types) plus sorting or any kind of search through non-generic comparers means a lot of boxing/unboxing operations.

  1. I would suggest sticking with the recommendations of Jon and Henk: make it a class and use a generic List<T>.
  2. Use LINQ GroupBy or DistinctBy; to me the built-in GroupBy is simpler to use, but it is also interesting to take a look at the other popular library (MoreLINQ), as it may give you some insights.

BTW, also take a look at the LambdaComparer; it will make your life easier each time you need this kind of in-place sorting/searching, etc.
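Since the asker wants no external DLLs, a minimal home-grown equivalent takes only a few lines. The class below is a sketch in the spirit of a lambda-based comparer, not the actual LambdaComparer library's API:

```csharp
using System;
using System.Collections.Generic;

// Self-contained key-based comparer: equality and hashing are both
// delegated to a key extracted by the supplied lambda.
public class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;

    public KeyComparer(Func<T, TKey> keySelector)
    {
        _keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        return EqualityComparer<TKey>.Default.Equals(
            _keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        TKey key = _keySelector(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

// Usage: the anonymous-type key supplies Equals/GetHashCode over both
// fields for free, so the HashSet dedupes on Title + RelativePath.
// var appInfoSet = new HashSet<stAppInfo>(
//     new KeyComparer<stAppInfo, object>(
//         x => new { x.sTitle, x.sRelativePath }));
```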
