Most appropriate data structure (Python)

2023-01-24 13:55 问答作者：

I'm new to Python and have what is probably a very basic question about the 'be开发者_Go百科st' way to store data in my code. Any advice much appreciated!

I have a long .csv file in the following format:

Scenario,Year,Month,Value
1,1961,1,0.5
1,1961,2,0.7
1,1961,3,0.2
etc.

My scenario values run from 1 to 100, year goes from 1961 to 1990 and month goes from 1 to 12. My file therefore has 100*29*12 = 34800 rows, each with an associated value.

I'd like to read this file into some kind of Python data structure so that I can access a 'Value' by specifying the 'Scenario', 'Year' and 'Month'. What's the best way to do this please (or what are the various options)?

In my head I think of this data as a kind of 'number cuboid' with axes for Scenario, Year and Month, so that each Value is located at co-ordinates (Scenario, Year, Month). For this reason, I'm tempted to try to read these values into a 3D numpy array and use Scenario, Year and Month as indices. Is this a sensible thing to do?

I guess I could also make a dictionary where the keys are something like

str(Scenario)+str(Year)+str(Month)

Would this be better? Are there other options?

(By 'better' I suppose I mean 'faster to access', although if one method is much less memory intensive than another it'd be good to know about that too).

Thanks very much!

I'd use a dict of tuples. Simple, fast, and a hash-table look-up to retrieve a single value:

import csv

reader = csv.reader(open('data.csv', 'rb'))
header = reader.next()
data = {}

for row in reader:
    key = tuple([int(v) for v in row[:-1]])
    val = row[-1]
    data[key] = float(val)

# Retrieve a value
print data[1, 1961, 3]

I would use sqlite3 for storing the data to disk. You'll be able to read in the full data set or subsets through SQL queries. You can then load that data into a numpy array or other Python data structure -- whatever is most convenient for the task.

If you do choose to use sqlite, also note that sqlite has a TIMESTAMP data type. It may be a good idea to combine the year and month into one TIMESTAMP. When you read TIMESTAMPs into Python, sqlite3 can be told to automatically convert the TIMESTAMPs into datetime.datetime objects, which would reduce some of the boilerplate code you'd otherwise have to write. It will also make it easier to form SQL queries which ask for all the rows between two dates.

sqlite is a nice option if you're going to access your values by different parameters each time.

If that's not the case, and you'll always access by this triplet (scenario, year, month), you can use a Tuple (immutable list) as your key, and the value as your value.

In code it would look like:

d = {}
d[1, 1961, 12] = 0.5

or in more generic loop code:

d[scenario, year, month] = value

later on you can just access it with:

print d[scenario, year, month]

Python will automatically create the Tuple for you.

Make a dictionary of dictionaries of dictionaries like you described. If you need data as numbers, convert them to numbers once when your read them and stores numbers in the dicts. It will be faster then using strings as keys. Let me know if need a help with the code.

继续阅读：arrays data-structures python

Most appropriate data structure (Python)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？