Performance Bottleneck with python mapping data structure
I am facing a little performance problem with one of my data structures used for a bigger project in python.
Basically, I am importing a tabular delimited file. Using the normal python open(...) file iterator I am splitting the lines with line.split("\t"). Now I want the actual value of a column be inserted in some sort of dictionary returning an ID for the value. And there it is getting slow:
In general - the dictionary class looks like this:
class Dictionary(list):
def getBitLength(self):
if(len(self) == 0):
return 0
else:
return math.log(len(self), 2)
def insertValue(self, value):
self.append(value)
return l开发者_运维问答en(self) - 1
def getValueForValueId(self, valueId):
return self[valueId]
def getValueIdForValue(self, value):
if(value in self):
return self.index(value)
else:
return self.insertValue(value)
The basic idea was, that the valueId is the index of the value in the dictionary list.
Profiling the program tells me that more than 50% are spend on getValueIdForValue(...).
1566562 function calls in 23.218 seconds
Ordered by: cumulative time
List reduced from 93 to 10 due to restriction <10>
240000 13.341 0.000 16.953 0.000 Dictionary.py:22(getValueIdForValue)
206997 3.196 0.000 3.196 0.000 :0(index)
The problem is, that this is just a small test. In real application environment this function would be called several million times which would increase the runtime for this to a large extend.
Of course I could inherit from python dict, but than the performance problem is quite similar since I need to get key of a given value (in case that the value already has been inserted to the dictionary).
Since I am not the Python Pro until now, can you guys give me any tips how to make this a bit more efficient?
Best & thanks for the help,
n3otec
===
Thanks guys!
Performance of bidict is much better:
240000 2.458 0.000 8.546 0.000 Dictionary.py:34(getValueIdForValue)
230990 1.678 0.000 5.134 0.000 Dictionary.py:27(insertValue)
Best, n3otec
If keys and values are unique, you can use a bidirectional dictionary. There is one python package here
精彩评论