开发者

Using lists and dictionaries to store temporary information

I will have alot of similar objects with similar parameters. Example of an object parameters would be something like :

name, boolean, number and list.

The name must be unique value among all the objects while values for boolean, number and list parameters must not.

I could store the data as list of dictionaries i guess. Like that:

list = [
  {'name':'a', 'bool':true, 'number':123, 'list':[1,开发者_如何学Python 2, 3]},
  {'name':'b', 'bool':false, 'number':143, 'list':[1, 3, 5]},
  {'name':'c', 'bool':false, 'number':123, 'list':[1, 4, 5, 18]},
]

What would be the fastest way to check if the unique name exists in the list of dictionaries, before i create another dictionary in that list? Do i have to loop through the list and check what is the value of list[i][name]? What would be fastest and least memory conserving to hold and process that information, assuming, that different similar lists might be simultanously processed in different threads/tasks and that their size could be anywhere between 100 to 100 000 dictionaries per list. Should i store those lists in database instead of memory?

I understand that perhaps i should not be thinking about optimizing (storing the info and threads) before the project is working, so please, answer the unique name lookup question first :)

Thanks, Alan


If the name is the actual (unique) identifier of each inner data, you could just use a dictionary for the outer data as well:

data = {
  'a' : { 'bool':true, 'number':123, 'list':[1, 2, 3] },
  'b' : { 'bool':false, 'number':143, 'list':[1, 3, 5] },
  'c' : { 'bool':false, 'number':123, 'list':[1, 4, 5, 18] },
}

Then you could easily check if the key exists or not.

Btw. don't name your variables list or dict as that will overwrite the built-in objects.


once you come around to using a dict instead of a list, the fastest way to perform the check that you want is:

if 'newkey' not in items:
    # create a new record

since you want to be able to access these records from multiple threads, I would keep a collection of locks. BTW, this is the sort of thing that you design in the beginning as it's part of the application design and not an optimization.

class DictLock(dict):
    def __init__(self):
        self._lock = threading.Lock()

    def __getitem__(self, key):
        # lock to prevent two threads trying to create the same
        # entry at the same time. Then they would get different locks and
        # both think that they could access the key guarded by that lock
        with self._lock:
            if key not in self.iterkeys():
                self[key] = threading.Lock()
            return super(DictLock, self).__getitem__(key)

now if you want to modify your items, you can use the locks to keep it safe.

locks = DictLock()

with locks['a']:
    # modify a.

or to insert a new element

with locks['z']:
    #we are now the only ones (playing by the rules) accessing the 'z' key
    items['z'] = create_new_item()


What you want is an "intrusive" dictionary - something that looks for keys inside values. Unfortunately, I don't know of any implementation in Python. Boost's multi_index comes close.


If you don't want to change the data structure you have, then you can use the following. Otherwise, poke's answer is the way to go.

>>> my_list = [
...   {'name':'a', 'bool':True, 'number':123, 'list':[1, 2, 3]},
...   {'name':'b', 'bool':False, 'number':143, 'list':[1, 3, 5]},
...   {'name':'c', 'bool':False, 'number':123, 'list':[1, 4, 5, 18]},
... ]
>>> def is_present(data, name):
...     return any(name == d["name"] for d in data)
... 
>>> is_present(my_list, "a")
True
>>> is_present(my_list, "b")
True
>>> is_present(my_list, "c")
True
>>> is_present(my_list, "d")
False

If you pass any an iterable, it returns True if any one of its elements are True.

(name == d["name"] for d in data) creates a generator. Each time somebody (in this case, any) requests the next element, it does so by getting the next element, d, from data and transforms it by the expression name == d["name"]. Since generators are lazy i.e. the transformation is done when the next element is requested, this should use relatively little memory (and should use the same amount of memory regardless of the size of the list).


Store the objects in a dictionary with the name as the key:

objects = {'a' : {'bool':true, 'number':123, 'list':[1, 2, 3]},
           'b' : {'bool':false, 'number':143, 'list':[1, 3, 5]},
           'c' : {'bool':false, 'number':123, 'list':[1, 4, 5, 18]}}

This way you ensure that the names are unique since all the keys in the dictionary are unique. Checking is a name is in the dictionary is also easy:

name in objects
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜