How can I make my function run as fast as "Contains" on an ArrayList?

2022-12-21 02:48 问答作者：

I can't figure out a discrepancy between the time it takes for the Contains method to find an element in an ArrayList and the time it takes for a small function that I wrote to do the same thing. The documentation states that Contains performs a linear search, so it's supposed to be in O(n) and not any other faster method. However, while the exact values may not be relevant, the Contains method returns in 00:00:00.1087087 seconds while my function takes 00:00:00.1876165. It might not be much, but this difference becomes more evident when dealing with even larger arrays. What am I missing and how should I write my function to match Contains's performances?

I'm using C# on .NET 3.5.

public partial class Window1 : Window
{
    public bool DoesContain(ArrayList list, object element)
    {
        for (int i = 0; i < list.Count; i++)
            if (list[i].Equals(element)) return true;

        return false;
    }

    public Window1()
    {
        InitializeComponent();

        ArrayList list = new ArrayList();
        for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);

        Stopwatch sw = new Stopwatch();
        sw.Start();

        //Console.Out.WriteLine(list.Contains("zzz 9000000") + " " + sw.Elapsed);
        Console.Out.WriteLine(DoesContain(list, "zzz 9000000") + " " + sw.Elapsed);
    }
}

EDIT:

Okay, now, lads, look:

public partial class Window1 : Window
{
    public bool DoesContain(ArrayList list, object element)
    {
        int count = list.Count;
        for (int i = count - 1; i >= 0; i--)
            if (element.Equals(list[i])) return true;

        return false;
    }


    public bool DoesContain1(ArrayList list, object element)
    {
        int count = list.Count;
        for (int i = 0; i < count; i++)
            if (element.Equals(list[i])) return true;

        return false;
    }

    public Window1()
    {
        InitializeComponent();

        ArrayList list = new ArrayList();
        for (int i = 0; i < 10000000; i++) list.Add("zzz " + i);

        Stopwatch sw = new Stopwatch();
        long total = 0;
        int nr = 100;

        for (int i = 0; i < nr; i++)
        {
            sw.Reset();
            sw.Start();
            DoesContain(list,"zzz");
            total += sw.ElapsedMilliseconds;
        }
        Console.Out.WriteLine(total / nr);


        total = 0;
        for (int i = 0; i < nr; i++)
        {
            sw.Reset();
            sw.Start();
            DoesContain1(list, "zzz");
            total += sw.ElapsedMilliseconds;
        }
        Console.Out.WriteLine(total / nr);


        total = 0;
        for (int i = 0; i < nr; i++)
        {
            sw.Reset();
            sw.Start();
       开发者_StackOverflow中文版     list.Contains("zzz");
            total += sw.ElapsedMilliseconds;
        }
        Console.Out.WriteLine(total / nr);
    }
  }

I made an average of 100 running times for two versions of my function(forward and backward loop) and for the default Contains function. The times I've got are 136 and 133 milliseconds for my functions and a distant winner of 87 for the Contains version. Well now, if before you could argue that the data was scarce and I based my conclusions on a first, isolated run, what do you say about this test? Not does only on average Contains perform better, but it achieves consistently better results in each run. So, is there some kind of disadvantage in here for 3rd party functions, or what?

First, you're not running it many times and comparing averages.

Second, your method isn't being jitted until it actually runs. So the just in time compile time is added into its execution time.

A true test would run each multiple times and average the results (any number of things could cause one or the other to be slower for run X out of a total of Y), and your assemblies should be pre-jitted using ngen.exe.

As you're using .NET 3.5, why are you using ArrayList to start with, rather than List<string>?

A few things to try:

You could see whether using foreach instead of a for loop helps

You could cache the count:

public bool DoesContain(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
    {
        if (list[i].Equals(element))
        {
            return true;
        }
        return false;
    }
}

You could reverse the comparison:
```
if (element.Equals(list[i]))
```

While I don't expect any of these to make a significant (positive) difference, they're the next things I'd try.

Do you need to do this containment test more than once? If so, you might want to build a HashSet<T> and use that repeatedly.

I'm not sure if you're allowed to post Reflector code, but if you open the method using Reflector, you can see that's it's essentially the same (there are some optimizations for null values, but your test harness doesn't include nulls).

The only difference that I can see is that calling list[i] does bounds checking on i whereas the Contains method does not.

Using the code below I was able to get the following timings relatively consitently (within a few ms):
1: 190ms DoesContainRev
2: 198ms DoesContainRev1
3: 188ms DoesContainFwd
4: 203ms DoesContainFwd1
5: 199ms Contains

Several things to notice here.

This is run with release compiled code from the commandline. Many people make the mistake of benchmarking code inside the Visual Studio debugging environment, not to say anyone here did but something to be careful of.

The list[i].Equals(element) appears to be just a bit slower than element.Equals(list[i]).

using System;
using System.Diagnostics;
using System.Collections;


namespace ArrayListBenchmark
{
class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        const int arrayCount = 10000000;
        ArrayList list = new ArrayList(arrayCount);
        for (int i = 0; i < arrayCount; i++) list.Add("zzz " + i);
    sw.Start();
    DoesContainRev(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("1: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    sw.Start();
    DoesContainRev1(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("2: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    sw.Start();
    DoesContainFwd(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("3: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    sw.Start();
    DoesContainFwd1(list, "zzz");
    sw.Stop();
    Console.WriteLine(String.Format("4: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    sw.Start();
    list.Contains("zzz");
    sw.Stop();
    Console.WriteLine(String.Format("5: {0}", sw.ElapsedMilliseconds));
    sw.Reset();

    Console.ReadKey();
}
public static bool DoesContainRev(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = count - 1; i >= 0; i--)
        if (element.Equals(list[i])) return true;

    return false;
}
public static bool DoesContainFwd(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
        if (element.Equals(list[i])) return true;

    return false;
}
public static bool DoesContainRev1(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = count - 1; i >= 0; i--)
        if (list[i].Equals(element)) return true;

    return false;
}
public static bool DoesContainFwd1(ArrayList list, object element)
{
    int count = list.Count;
    for (int i = 0; i < count; i++)
        if (list[i].Equals(element)) return true;

    return false;
}
         }
        }

With a really good optimizer there should not be difference at all, because the semantics seems to be the same. However the existing optimizer can optimize your function not so good as the hardcoded Contains is optimized. Some of the points for optimization:

comparing to a property each time can be slower than counting downwards and comparing against 0
function call itself has its performance penalty
using iterators instead of explicit indexing can be faster (foreach loop instead of plain for)

First, if you are using types you know ahead of time, I'd suggest using generics. So List instead of ArrayList. Underneath the hood, ArrayList.Contains actually does a bit more than what you are doing. The following is from reflector:

public virtual bool Contains(object item)
{
    if (item == null)
    {
        for (int j = 0; j < this._size; j++)
        {
            if (this._items[j] == null)
            {
                return true;
            }
        }
        return false;
    }
    for (int i = 0; i < this._size; i++)
    {
        if ((this._items[i] != null) && this._items[i].Equals(item))
        {
            return true;
        }
    }
    return false;
}

Notice that it forks itself on being passed a null value for item. However, since all the values in your example are not null, the additional check on null at the beginning and in the second loop should in theory take longer.

Are you positive you are dealing with fully compiled code? I.e., when your code runs the first time it gets JIT compiled where as the framework is obviously already compiled.

After your Edit, I copied the code and made a few improvements to it.
The difference was not reproducable, it turns out to be a measuring/rounding issue.

To see that, change your runs to this form:

    sw.Reset();
    sw.Start();
    for (int i = 0; i < nr; i++)
    {          
        DoesContain(list,"zzz");            
    }
    total += sw.ElapsedMilliseconds;
    Console.WriteLine(total / nr);

I just moved some lines. The JIT issue was insignificant with this numbr of repetitions.

My guess would be that ArrayList is written in C++ and could be taking advantage of some micro-optimizations (note: this is a guess).

For instance, in C++ you can use pointer arithmetic (specifically incrementing a pointer to iterate an array) to be faster than using an index.

using an array structure, you can't search faster than O(n) whithout any additional information. if you know that the array is sorted, then you can use binary search algorithm and spent only o(log(n)) otherwise you should use a set.

Revised after reading comments:

It does not use some Hash-alogorithm to enable fast lookup.

Use SortedList<TKey,TValue>, Dictionary<TKey, TValue> or System.Collections.ObjectModel.KeyedCollection<TKey, TValue> for fast access based on a key.

var list = new List<myObject>(); // Search is sequential
var dictionary = new Dictionary<myObject, myObject>(); // key based lookup, but no sequential lookup, Contains fast
var sortedList = new SortedList<myObject, myObject>(); // key based and sequential lookup, Contains fast

KeyedCollection<TKey, TValue> is also fast and allows indexed lookup, however, it needs to be inherited as it is abstract. Therefore, you need a specific collection. However, with the following you can create a generic KeyedCollection.

public class GenericKeyedCollection<TKey, TValue> : KeyedCollection<TKey, TValue> {
   public GenericKeyedCollection(Func<TValue, TKey> keyExtractor) {
      this.keyExtractor = keyExtractor;
   }

   private Func<TValue, TKey> keyExtractor;

   protected override TKey GetKeyForItem(TValue value) {
      return this.keyExtractor(value);
   }
}

The advantage of using the KeyedCollection is that the Add method does not require that a key is specified.

继续阅读：arraylist contains

How can I make my function run as fast as "Contains" on an ArrayList?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？