开发者

Optimised Python dictionary / negative index storage

Raised by this question's comments (I can see that this is irrelevant), I am now aware that using dictionaries for data that needs to be queried/accessed regularly is not good, speedwise.

I have a situation of something like th开发者_如何学JAVAis:

someDict = {}
someDict[(-2, -2)] = something
somedict[(3, -10)] = something else

I am storing keys of coordinates to objects that act as arrays of tiles in a game. These are going to be negative at some point, so I can't use a list or some kind of sparse array (I think that's the term?).

Can I either:

  • Speed up dictionary lookups, so this would not be an issue
  • Find some kind of container that will support sparse, negative indices?

I would use a list, but then the querying would go from O(log n) to O(n) to find the area at (x, y). (I think my timings are off here too).


Python dictionaries are very very fast, and using a tuple of integers is not going to be a problem. However your use case seems that sometimes you need to do a single-coordinate check and doing that traversing all the dict is of course slow.

Instead of doing a linear search you can however speed up the data structure for the access you need using three dictionaries:

class Grid(object):
    def __init__(self):
        self.data = {}  # (i, j) -> data
        self.cols = {}  # i -> set of j
        self.rows = {}  # j -> set of i

    def __getitem__(self, ij):
        return self.data[ij]

    def __setitem__(self, ij, value):
        i, j = ij
        self.data[ij] = value
        try:
            self.cols[i].add(j)
        except KeyError:
            self.cols[i] = set([j])
        try:
            self.rows[j].add(i)
        except KeyError:
            self.rows[j] = add([i])

    def getRow(self, i):
        return [(i, j, data[(i, j)])
                for j in self.cols.get(i, [])]

    def getCol(self, j):
        return [(i, j, data[(i, j)])
                for i in self.rows.get(j, [])]

Note that there are many other possible data structures depending on exactly what you are trying to do, how frequent is reading, how frequent is updating, if you query by rectangles, if you look for nearest non-empty cell and so on.


To start off with

Speed up dictionary lookups, so this would not be an issue

Dictionary lookups are pretty fast O(1), but (from your other question) you're not relying on the hash-table lookup of the dictionary, your relying on a linear search of the dictionary's keys.

Find some kind of container that will support sparse, negative indices?

This isn't indexing into the dictionary. A tuple is an immutable object, and you are hashing the tuple as a whole. The dictionary really has no idea of the contents of the keys, just their hash.

I'm going to suggest, as others did, that you restructure your data.

For example, you could create objects that encapsulate the data you need, and arrange them in a binary tree for O(n lg n) searches. You can even go so far as to wrap the entire thing in a class that will give you the nice if foo in Bar: syntax your looking for.

You probably need a couple coordinated structures to accomplish what you want. Here's a simplified example using dicts and sets (tweaking user 6502's suggestion a bit).

# this will be your dict that holds all the data
matrix = {}
# and each of these will be a dict of sets, pointing to coordinates
cols = {}
rows = {}

def add_data(coord, data)
    matrix[coord] = data
    try:
        cols[coord[0]].add(coord)
    except KeyError:
        # wrap coords in a list to prevent set() from iterating over it
        cols[coord[0]] = set([coord])
    try:
        rows[coord[1]].add(coord)
    except KeyError:
        rows[coord[1]] = set([coord])

# now you can find all coordinates from a row or column quickly
>>> add_data((2, 7), "foo4")
>>> add_data((2, 5), "foo3")
>>> 2 in cols
True
>>> 5 in rows
True
>>> [matrix[coord] for coord in cols[2]]
['foo4', 'foo3']

Now just wrap that in a class or a module, and you'll be off, and as always, if it's not fast enough profile and test before you guess.


Dictionary lookups are very fast. Searching for part of the key (e.g. all tiles in row x) is what's not fast. You could use a dict of dicts. Rather than a single dict indexed by a 2-tuple, use nested dicts like this:

somedict = {0: {}, 1:{}}
somedict[0][-5] = "thingy"
somedict[1][4] = "bing"

Then if you want all the tiles in a given "row" it's just somedict[0].

You will need some logic to add the secondary dictionaries where necessary and so on. Hint: check out getitem() and setdefault() on the standard dict type, or possibly the collections.defaultdict type.

This approach gives you quick access to all tiles in a given row. It's still slow-ish if you want all the tiles in a given column (though at least you won't need to look through every single cell, just every row). However, if needed, you could get around that by having two dicts of dicts (one in column, row order and the other in row, column order). Updating then becomes twice as much work, which may not matter for a game where most of the tiles are static, but access is very easy in either direction.

If you only need to store numbers and most of your cells will be 0, check out scipy's sparse matrix classes.


One alternative would be to simply shift the index so it's positive.

E.g. if your indices are contiguous like this:

...
-2 -> a
-1 -> c
0 -> d
1 -> e
2 -> f
...

Just do something like LookupArray[Index + MinimumIndex], where MinimumIndex is the absolute value of the smallest index you would use.

That way, if your minimum was say, -50, it would map to 0. -20 would map to 30, and so forth.

Edit:

An alternative would be to use a trick with how you use the indices. Define the following key function

Key(n) = 2 * n (n >= 0)
Key(n) = -2 * n - 1. (n < 0)

This maps all positive keys to the positive even indices, and all negative elements to the positive odd indices. This may not be practical though, since if you add 100 negative keys, you'd have to expand your array by 200.

One other thing to note: If you plan on doing look ups and the number of keys is constant (or very slowly changing), stick with an array. Otherwise, dictionaries aren't bad at all.


Use multi-dimensional lists -- usually implemented as nested objects. You can easily make this handle negative indices with a little arithmetic. It might use a more memory than a dictionary since something has to be put in every possible slot (usually None for empty ones), but access will be done via simple indexing lookup rather than hashing as it would with a dictionary.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜