C# List<> Add() method performance
I am working with a List<> collection, adding new objects to the collection inside 2 nested loops. There are some 500000 items added to the collection, after the loops finish to execute.
At first, the addition operation runs well, but soon after there can be noticed a decrease in performance, for the last thousands of elements, the delay time is unbearable.
I have tried various tricks (initializing the collection with a certain size - 500000), replacing the List<> with a LinkedList<> collection, but it didn't help too much.
Can you recommend me a tip to solve the problem? I am interesting in changing the structure with a more optimized one - LinkedList<> for instance performs better than List<> with operations such as addition.
Method which updates the list
private void UpdateForecastList(ConcurrentDictionary<Int32, RegistroSalidaProductoPrevision开发者_JS百科> prediccion, bool soloMejoresMetodos = true)
{
foreach (KeyValuePair<int, RegistroSalidaProductoPrevision> kvp in prediccion)
{
KeyValuePair<int, RegistroSalidaProductoPrevision> localKvp = kvp;
IList<Prediccion> pExistente = prediccionList.Where(p => p.Id == localKvp.Key).ToList();
Articulo articulo = (articuloList.Where(a => a.Id == localKvp.Key)).First();
if (pExistente.Count > 0)
{
foreach (var p in pExistente)
{
prediccionList.Remove(p);
}
}
if (kvp.Value.Previsiones.Count > 0)
{
var previsiones = kvp.Value.Previsiones.Where(prevision => prevision.Value.LPrevision[1] != null).ToList();
int previsionesCount = previsiones.Count;
for (int a = 0; a < previsionesCount; a++)
{
var registros = previsiones[a].Value.LPrevision[1].Serie;
int c = registros.Count;
if (soloMejoresMetodos)
{
if (localKvp.Value.MejorMetodo != previsiones[a].Key) continue;
for (int i = 0; i < c; i++)
{
var p = new Prediccion()
{
Id = articulo.Id,
Nombre = articulo.Codigo,
Descripcion = articulo.Descripcion,
NombreMetodo =
Utils.SplitStringByCapitals(previsiones[a].Value.NombreMetodo),
Fecha = registros[i].Fecha,
PrediccionArticulo = Math.Round(registros[i].Cantidad, 2),
EsMejorMetodo =
(previsiones[a].Value.NombreMetodo == localKvp.Value.MejorMetodo)
? true
: false
};
// This line experiences performance loss
prediccionList.Add(p);
}
}
else
{
for (int i = 0; i < c; i++)
{
prediccionList.Add(new Prediccion()
{
Id = articulo.Id,
Nombre = articulo.Codigo,
Descripcion = articulo.Descripcion,
NombreMetodo = previsiones[a].Value.NombreMetodo,
Fecha = registros[i].Fecha,
PrediccionArticulo =
Math.Round(registros[i].Cantidad, 2),
EsMejorMetodo =
(previsiones[a].Value.NombreMetodo ==
localKvp.Value.MejorMetodo)
? true
: false
});
}
}
}
}
else
{
prediccionList.Add(new Prediccion()
{
Id = articulo.Id,
Nombre = articulo.Codigo,
Descripcion = articulo.Descripcion,
NombreMetodo = kvp.Value.ErroresDatos[0].Texto,
});
}
}
}
Small description of the method: - the method reads an object (a concurrent dictionary) and updates a list (in this case a LinkedList) with the forecasts corresponding to a certain article.
The concurrent dictionary object is constantly updated from various threads that access it concurrently.
The list is initialized with null predictions corresponding to all the articles; thus, for instance, if you have 700 articles, in the beginning the list will be populated with 700 blank forecasts.
As the concurrent dictionary is updated by one of the computing threads, it raises an event which calls the method mentioned above, which at its turn, updates the list (prediccionList).
The maximum number of records which could be hold in the prediccionList (in this case) is about 500000 records, but the loss in performance could be noticed after adding some 40000 records in the list.
The code might seem a bit rusty, as I have tried various optimizations tricks (replace the foreach'es with for's, calculate the count's outside the loops, replace the List<> object with a LinkedList<> etc.). Finally I reached the conclusion that the part that slows down the execution time is the line "prediccionList.Add(p);".
The objects that are added to the list are instances of the Prediccion class; this object I consider not to be very heavy, it only contains 7 fields.
Memory usage
I attach the result from a memory profiling. The memory used doesn't surpass 256 MB, thus I don't believe the memory should be a problem here.
The problem has nothing to do with the performance of List
or any other .NET data structure. Your problem is purely algorithmic. For example, you have this code fragment:
foreach (KeyValuePair<int, RegistroSalidaProductoPrevision> kvp in prediccion)
{
KeyValuePair<int, RegistroSalidaProductoPrevision> localKvp = kvp;
IList<Prediccion> pExistente = prediccionList.Where(p => p.Id == localKvp.Key).ToList();
Articulo articulo = (articuloList.Where(a => a.Id == localKvp.Key)).First();
So for every item in the dictionary (prediccion
), you're iterating over the entire prediccionList
. You've implemented an n^2 algorithm. The amount of time it takes to execute that method is proportional to prediccion.Count * prediccionList.Count
.
You need a better algorithm; not a faster collection data structure.
In my experience, the List<T>
performance is memory-dependant. It always follows the same pattern, the inserting is fast up to a point, and then the performance sharply drops.
On my machine that usually happens when I hit the 1.2G memory mark. I've had the same issue with almost all collections I've tried, so I think it's more of a .net underlying issue than a List<T>
issue.
I would recommend to try to reduce the object size of the things you are using 500.000 of (replace long
s with int
s, etc...) and try then.
But beware, even if you manage it to work fast on your machine, it might be over the threshold of the machine where the app is deployed.
If the objects that you're adding to the list are of any significant size, you could be suffering from memory constraints.
If your process is 32-bit, you're going to be constrained to 2GB in total before running out of address space, but if it's 64-bit, you could easily exceed the physical memory in the machine and start paging to disk.
How big are your objects?
As your list grows larger, every time when it is expanding collecting garbage, framework is copying its contents to a new list location, because of the way how garbage collector works. That is why it becomes slower and slower as it is becoming larger. (GC on MSDN)
Possible solutions (that I can think of) are using a list or array with predefined size that you are sure it wont fill up, or if that is not an option, then using System.Collections.Generic.LinkedList, but as you have already tried that, you may have to implement custom list, single-linked if applicable (LinkedList is double-linked).
To increase chance of getting good answer, you should post code of object you keep in collection, and part where you add items, so we can better understand what is all about.
Also, please take a look at http://www.simple-talk.com/dotnet/performance/the-top-5-.net-memory-management-misconceptions/, I think that it will be of help to you.
UPDATE: Indexing should be cheap operation, but nevertheless, you may try to read previsiones[a] (and registros[i] in nested loop) into local variable on beginning of loop, you will save couple of indexings (x 100000 iterations, could make some difference, if clr is not optimizing this?).
Using a Struct instead of a Class can improve your performance significantly.
You can also gain performance by losing your String Properties from the Prediccion Class/Struct.
I was interested in the actual impact for a long time, so here is my Benchmark:
I took different data structures and put 20 Million objects/structs in them. Here is the result:
List: Adding 20000000 TestClass to a List`1 took 3563,2068 ms Accessing 20000000 TestClass from a List took 103,0203 ms Adding 20000000 TestStruct to a List`1 took 2239,9639 ms Accessing 20000000 TestStruct from a List took 254,3245 ms Initialized List: Adding 20000000 TestClass to a List`1 took 3774,772 ms Accessing 20000000 TestClass from a List took 99,0548 ms Adding 20000000 TestStruct to a List`1 took 1520,7765 ms Accessing 20000000 TestStruct from a List took 257,5064 ms LinkedList: Adding 20000000 TestClass to a LinkedList`1 took 6085,6478 ms Adding 20000000 TestStruct to a LinkedList`1 took 7771,2243 ms HashSet: Adding 20000000 TestClass to a HashSet`1 took 10816,8488 ms Adding 20000000 TestStruct to a HashSet`1 took 3694,5187 ms Now I added a string to the class/struct: List: Adding 20000000 TestClassWithString to a List`1 took 4925,1215 ms Accessing 20000000 TestClassWithString from a List took 120,0348 ms Adding 20000000 TestStructWithString to a List`1 took 3554,7463 ms Accessing 20000000 TestStructWithString from a List took 456,3299 ms
This is my test program:
static void Main(string[] args)
{
const int noObjects = 20*1000*1000;
Console.WriteLine("List:");
RunTest(new List<TestClass>(), noObjects);
RunTest(new List<TestStruct>(), noObjects);
Console.WriteLine();
Console.WriteLine("Initialized List:");
RunTest(new List<TestClass>(noObjects), noObjects);
RunTest(new List<TestStruct>(noObjects), noObjects);
Console.WriteLine();
Console.WriteLine("LinkedList:");
RunTest(new LinkedList<TestClass>(), noObjects);
RunTest(new LinkedList<TestStruct>(), noObjects);
Console.WriteLine();
Console.WriteLine("HashSet:");
RunTest(new HashSet<TestClass>(), noObjects);
RunTest(new HashSet<TestStruct>(), noObjects);
Console.WriteLine();
Console.WriteLine("Now I added a string to the class/struct:");
Console.WriteLine("List:");
RunTest(new List<TestClassWithString>(), noObjects);
RunTest(new List<TestStructWithString>(), noObjects);
Console.WriteLine();
Console.ReadLine();
}
private static void RunTest<T>(ICollection<T> collection, int noObjects) where T : ITestThing
{
Stopwatch sw = new Stopwatch();
sw.Restart();
for (int i = 0; i < noObjects; i++)
{
var obj = Activator.CreateInstance<T>();
obj.Initialize();
collection.Add(obj);
}
sw.Stop();
Console.WriteLine("Adding " + noObjects + " " + typeof(T).Name + " to a " + collection.GetType().Name + " took " + sw.Elapsed.TotalMilliseconds + " ms");
if (collection is IList)
{
IList list = (IList) collection;
// access all list objects
sw.Restart();
for (int i = 0; i < noObjects; i++)
{
var obj = list[i];
}
sw.Stop();
Console.WriteLine("Accessing " + noObjects + " " + typeof (T).Name + " from a List took " + sw.Elapsed.TotalMilliseconds + " ms");
}
}
TestClass and TestStruct both look like this (one with 'class', one with 'struct'):
public class TestClass : ITestThing
{
public int I1;
public int I2;
public double D1;
public double D2;
public long L1;
public long L2;
public void Initialize()
{
D1 = 1;
D2 = 2;
I1 = 3;
I2 = 4;
L1 = 5;
L2 = 6;
}
}
Only TestStruct is public struct
instead of public class
and TestClassWithString and TestStructWithString public string S1
which is initialized with "abc".
ITestThing is just there because structs cannot have a Constructor, so I needed some way to call the Initialize() method in a generic way, but it turns out it doesn't make much of a difference if I call Initialize() or not.
Please note that the differences in the durations would be even more extreme if I had written the code plain for every test case without any Interface or Activator.CreateInstance, but that code grew too big as soon as I added the 2nd test case...
SUMMARY
You can greatly improve your performance by using a List with initial size and put Structs in it, not class instances (Objects). Also try to avoid having Strings in your Structs, because every String instance is again an object which you tried to avoid by using a Struct instead of an Object.
How about using an array instead of a List
? You could initialize it to have an initial size (let's say 500000 elements) and if that's not enough, use Array.Resize
to add another 100000. You just have to keep track of the actual number of elements, as the Length
property will only give you the number of elements.
Note, however, that the Array.Resize
call may also be time consuming, as basically, a new array of the new size will be generated and all elements from the original array will be copied into the new array. You should not call this too often.
Have you tried giving a capacity on initialization. So it won't need to re-allocate memory and carry old contents to new memory space.
List<long> thelist = new List<long>(500000);
You can use an array which will be faster (but not queryable). I don't know the specifics of your code but you may want to refractor and use a database. 500000 items will never be fast
精彩评论