Python In-memory table
What is the right way to forming in-memory table in python with direct lookups for rows and columns.
I thought of using dict of dicts this way,class Table(dict):
def __getitem__(self, key):
if key not in self:
self[key]={}
return dict.__getitem__(self, key)
table = Table()
table['row1']['column1'] = 'value11'
table['row1']['column2'] = 'value12'
table['row2']['column1'] = 'value21'
table['row2']['column2'] = 'value22'
>>>table
{'row1':{'column1':'value11','column2':'value12'},'row2':{'column1':'value21','column2'开发者_运维知识库:'value22'}}
I had difficulty in looking up for values in columns.
>>>'row1' in table
True
>>>'value11' in table['row1'].values()
True
Now how do I do lookup if 'column1'
has 'value11'
Now how do I do lookup if 'column1' has 'value11'
any(arow['column1'] == 'value11' for arow in table.iteritems())
Is this method of forming tables wrong?
No, it's just very "exposed", perhaps too much -- it could usefully be encapsulated in a class which exposes the methods you need, then the issue of how best to implement them does not affect all the rest of your application.
Is there a better way to implement such tables with easier lookups?
Once you have designed a class whose interface you'd like to use, you can experiment with very different implementation approaches and benchmark them on a workload that's representative of your usage pattern, so you can find out what's best for you (assuming table manipulation and lookup are a big part of your application's runtime, of course -- to find out, profile your app).
I had similar but not identical needs in a large internal app I maintain at work, except that the row indices are integer (only the column names are strings), the column order is important, and the workload is more about "editing" the table (adding, removing, reordering rows or columns, renaming columns, etc). I started with a table exposing the functionality I needed, with the simplest rough-and-ready implementation internally (a list of dicts, plus a list of column names for the column ordering); and by now I have evolved it (independently of the actual "application-level" parts, but based on profiling and benchmarking thereof) to completely different implementations (currently based on numpy
).
I think you should proceed along similar lines: "clothe" your current implementation into a nice "interface" with all the methods you need, profile your app -- unless this table object is a performance bottleneck, you're done; if it is a bottleneck, you can optimize the implementation (experiment, measure, repeat;-) without disturbing any of the rest of your application.
Inheriting from dict
is not a good idea because you probably don't want to expose all of dict
's rich functionality; plus, what you're doing is, roughly, an inefficient implementation of collections.defaultdict(dict)
. So, encapsulate the latter:
import collections
class Table(object):
def __init__(self):
self.d = collections.defaultdict(dict)
def add(self, row, col, val):
self.d[row][col] = val
def get(self, row, col, default=None):
return self.d[row].get(col, default)
def inrow(self, row, col):
return col in self.d[row]
def incol(self, col, val):
return any(x[col]==val for x in self.d.iteritems())
etc, etc -- write all the methods your app needs, with useful, short names, then maybe see if you can alias some of them as special methods if they're often used that way, e.g maybe (assuming Python 2.* -- requires slightly different syntax in 3.*):
def __setitem__(self, (row, col), val):
self.add(row, col, val)
and so forth. Once you have the code working, then comes the right time for profiling, benchmarking, and -- just perhaps -- internal optimization of the implementation.
I'd use an in-memory database with SQLite for this. The sqlite module is even in the standard library since Python 2.5, which means this doesn't even add much to your requirements.
A nested list should be able to do the job here. I would only use nested dictionaries if elements are spread thin across the grid.
grid = []
for row in height:
grid.append([])
for cell in width:
grid[-1].append(value)
Checking rows is easy:
def valueInRow(value, row):
return value in grid[row]
Checking collumns takes a little more work, because the grid is a list of rows, not a list of collumns:
def collumnIterator(collumn):
height = len(grid)
for row in xrange(height):
yield grid[row][collumn]
def valueInCollumn(value, collumn):
return value in collumnIterator(collumn)
Now how do I do lookup if 'column1' has 'value11'
Are you asking about this?
found= False
for r in table:
if table[r]['column1'] == 'value11'
found= True
break
Is this what you're trying to do?
精彩评论