开发者

Large static arrays are slowing down class load, need a better/faster lookup method

I have a class with a couple static arrays:

an int[] with 17,720 elements

a string[] with 17,720 elements

I noticed when I first access this class it takes almost 2 seconds to initialize, which causes a pause in the GUI that's accessing it.

Specifically, it's a lookup for Unicode character names. The first array is an index into the second array.

static readonly int[] NAME_INDEX = {

0x0000, 0x0001, 0x0005, 0x002C, 0x003B, ...

static readonly string[] NAMES = {

"Exclamation Mark", "Digit Three", "Semicolon", "Question Mark", ... 开发者_如何转开发

The following code is how the arrays are used (given a character code). [Note: This code isn't a performance problem]

int nameIndex = Array.BinarySearch<int>(NAME_INDEX, code);

if (nameIndex > 0) { return NAMES[nameIndex]; }

I guess I'm looking at other options on how to structure the data so that 1) The class is quickly loaded, and 2) I can quickly get the "name" for a given character code.

Should I not be storing all these thousands of elements in static arrays?

Update

Thanks for all the suggestions. I've tested out a Dictionary approach and the performance of adding all the entries seems to be really poor.

Here is some code with the Unicode data to test out Arrays vs Dictionaries http://drop.io/fontspace/asset/fontspace-unicodesupport-zip

Solution Update

I tested out my original dual arrays (which are faster than both dictionary options) with a background thread to initialize and that helped performance a bit.

However, the real surprise is how well the binary files in resource streams works. It is the fastest solution discussed in this thread. Thanks everyone for your answers!


So a couple of observations. Binary Search is only going to work if your array is sorted, and from your above code snippet, it doesn't look to be sorted.

Since your primary goal is to find a specific name, your code is begging for a hash table. I would suggest using a Dictionary, it will give you O(1) (on average) lookup, without much more overhead than just having the arrays.

As for the load time, I agree with Andrey that the best way is going to be by using a separate thread. You are going to have some initialization overhead when using the amount of data you are using. Normal practice with GUIs is to use a separate thread for these activites so you don't lock up the UI.


First

A Dictionary<int, string> is going to perform far better than your duelling arrays will. Putting aside how this data gets into the arrays/Dictionary (hardcoded vs. read in from another location, like a resource file), this is still a better and more intuitive storage mechanism

Second

As others have suggested, do your loading in another thread. I'd use a helper function to help you deal with this. You could use an approach like this:

public class YourClass
{
    private static Dictionary<int, string> characterLookup;
    private static ManualResetEvent lookupCreated;

    static YourClass()
    {
        lookupCreated = new ManualResetEvent(false);

        ThreadPool.QueueUserWorkItem(LoadLookup);
    }

    static void LoadLookup(object garbage)
    {
        // add your pairs by calling characterLookup.Add(...)

        lookupCreated.Set();
    }

    public static string GetDescription(int code)
    {
        if (lookupCreated != null)
        {
            lookupCreated.WaitOne();
            lookupCreated.Close();
            lookupCreated = null;
        }

        string output;

        if(!characterLookup.TryGetValue(code, out output)) output = null;

        return output;
    }
}

In your code, call GetDescription in order to translate your integer into the corresponding string. If the UI doesn't call this until later, then you should see a marked decrease in startup time. To be safe, though, I've included a ManualResetEvent that will cause any calls to GetDescription to block until the dictionary has been fully loaded.


"Should I not be storing all these thousands of elements in static arrays?"

A much better way would be to store your data as binary stream in resources in the assembly and then load from the resources. Will be some more programming overhead but therefore doesn't need any object initialization.

Basic idea would be (no real code):

// Load data (two streams):
indices = ResourceManager.GetStream ("indexData");
strings = ResourceManager.GetStream ("stringData");

// Retrieving an entry:
stringIndex = indices.GetIndexAtPosition (char);
string = strings.GetStringFromPosition (stringIndex);

If you want a really good solution (for even some more work) look into using memmapped data files.


  1. Initialize your arrays in separate thread that will not lock the UI
  2. http://msdn.microsoft.com/en-us/library/hz49h034.aspx


if you store the arrays in a file you could do a lazy load

public class Class1
    {
        const int CountOfEntries = 17700; //or what ever the count is
        IEnumerable<KeyValuePair<int, string>> load()
        {
            using (var reader = File.OpenText("somefile"))
            {
                while (!reader.EndOfStream)
                {
                    var line = reader.ReadLine();
                    var pair = line.Split(',');
                    yield return new KeyValuePair<int, string>(int.Parse(pair[0]), pair[1]);
                }
            }
        }

        private static Dictionary<int, string> _lookup = new Dictionary<int, string>();
        private static IEnumerator<KeyValuePair<int, string>> _loader = null;
        private string LookUp(int index)
        {

            if (_lookup.Count < CountOfEntries && !_lookup.ContainsKey(index))
            {
                if(_loader == null)
                {
                    _loader = load().GetEnumerator();
                }
                while(_loader.MoveNext())
                {
                    var pair = _loader.Current;
                    _lookup.Add(pair.Key,pair.Value);
                    if (pair.Key == index)
                    {
                        return index;
                    }
                }
            }
            string name;
            if (_lookup.TryGetValue(index,out name))
            {
                return return name;
            }
            throw new KeyNotFoundException("The given index was not found");
        }
    }

the code expectes the file to have one pair on each line like so: index0,name0 index1,name1

If the first index sought is at the end this will perform slower probably (due to IO mainly) but if the access is random the average case woul be reading half of the values the first time if the access is not random make sure to keep the most used in the top of the file

there are a few more issues to considere. The above code is not threadsafe for the load operation and to increase responsiveness of the rest of the code keep the loading in a background thread

hope this helps


What about using a dictionary instead of two arrays? You could initialize the dictionary asynchronously using a thread or thread pool. The lookup would be O(1) instead of O(log(n)) as well.

public static class Lookup
{
  private static readonly ManualResetEvent m_Initialized = new ManualResetEvent(false);
  private static readonly Dictionary<int, string> m_Dictionary = new Dictionary<int, string>();

  public static Lookup()
  {
    // Start an asynchronous operation to intialize the dictionary.
    // You could use ThreadPool.QueueUserWorkItem instead of creating a new thread.
    Thread thread = new Thread(() => { Initialize(); });
    thread.Start();
  }

  public static string Lookup(int code)
  {
    m_Initialized.WaitOne();
    lock (m_Dictionary)
    {
      return m_Dictionary[code];
    }
  }

  private static void Initialize()
  {
    lock (m_Dictionary)
    {
      m_Dictionary.Add(0x0000, "Exclamation Point");
      // Keep adding items to the dictionary here.
    }
    m_Initialized.Set();
  }
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜