How can I read into dictionary keys in a way that makes sense?

2023-02-10 22:31 Q&A author:

I have about a thousand files that are named in a semi-sensible way like the following:

aaa.ba.ca.01
aaa.ba.ca.02
aaa.ba.ca.03

aaa.ba.da.01
aaa.ba.da.02
aaa.ba.da.03

and so on. Let's say each file contains two columns of numbers, which I need to read into the dictionaries wavelength and flux. The reading-in part is easy for me. The hard part is that I need to load these dictionaries so that they store the information like this:

wavelength['aaa.ba.ca.01'] (the wavelengths of that one file)

wavelength['aaa.ba.ca'] (the wavelengths of all its subfiles, i.e. ...ca.01, ...ca.02 and ...ca.03, in order)

wavelength['aaa.ba'] (the wavelengths of all its "subfiles" as well, again in order)

and so on. The filenames are well-behaved (the sections are separated by periods, the grouping hierarchy always runs in the same direction, etc.), but a name can be anywhere from 4 to 8 sections long.
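The sections are separated by periods and always nest in the same direction, so every leading run of sections names one grouping level. The minimal sketch below shows that split and which keys a single file would feed into. The helper name `prefixes` is hypothetical, and it assumes the name has no directory part or extra extension.

```python
def prefixes(filename):
    """Return every grouping level of a dot-separated filename, shortest first.

    Hypothetical helper for illustration; assumes 'filename' is just the
    dot-separated sections (no directory path, no extra extension).
    """
    parts = filename.split(".")
    # Join the first i sections for i = 1 .. len(parts).
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(prefixes("aaa.ba.ca.01"))
# ['aaa', 'aaa.ba', 'aaa.ba.ca', 'aaa.ba.ca.01']
```

This handles the 4-section and 8-section names the same way, because it never assumes a fixed depth.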

My question: is there some sensible way to have Python glob the names of the files and, by parsing strings or some other magic, get the data into these dictionaries? I've hit a brick wall.

A simple but inefficient way to do this is to subclass Python's dict so that, when it is given an incomplete key, it concatenates the contents of all matching keys in alphabetical order.

More efficient designs are possible. This one's major drawback is that every partial-key lookup sorts and scans all existing keys. Otherwise, it is so simple to implement that it is worth a try:

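class MultiDict(dict):
    """dict that also accepts a partial, dot-separated key."""

    def __getitem__(self, key):
        # An exact key behaves like a normal dict lookup.
        if key in self:
            return dict.__getitem__(self, key)
        # A partial key concatenates the values of every key below it,
        # in alphabetical order. Matching on key + "." keeps 'a.b.c' from
        # also picking up 'a.b.cd.01'.
        prefix = key + "."
        result = []
        for complete_key in sorted(self.keys()):
            if complete_key.startswith(prefix):
                result.extend(dict.__getitem__(self, complete_key))
        return result

# example
a = MultiDict()
a["a.0"] = [1]
a["a.1"] = [2]
print(a["a"])  # prints [1, 2]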

To get the data into the dictionary, iterate over all the files with glob or os.listdir. Read the desired contents of each file as a list and store it in a MultiDict item, using the filename as the key.
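The sketch below fills the MultiDict from files found with glob. It repeats the MultiDict class from this answer, matching on `key + "."`, so that it runs on its own. Several things are assumptions: the helper `load_spectra`, the temporary demo files and their contents are hypothetical, and every file is assumed to hold whitespace-separated "wavelength flux" pairs, one per line.

```python
import glob
import os
import tempfile


class MultiDict(dict):
    """Partial dot-separated keys concatenate all matching values in order."""

    def __getitem__(self, key):
        if key in self:
            return dict.__getitem__(self, key)
        prefix = key + "."
        result = []
        for complete_key in sorted(self.keys()):
            if complete_key.startswith(prefix):
                result.extend(dict.__getitem__(self, complete_key))
        return result


def load_spectra(directory, pattern="*"):
    """Hypothetical helper: read two-column files into (wavelength, flux) MultiDicts."""
    wavelength, flux = MultiDict(), MultiDict()
    for path in glob.glob(os.path.join(directory, pattern)):
        name = os.path.basename(path)  # e.g. 'aaa.ba.ca.01'
        wlist, flist = [], []
        with open(path) as fh:
            for line in fh:
                if line.strip():  # skip blank lines
                    w, f = map(float, line.split())
                    wlist.append(w)
                    flist.append(f)
        wavelength[name] = wlist
        flux[name] = flist
    return wavelength, flux


# Tiny demo with three made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("aaa.ba.ca.01", "1 10\n2 20\n"),
                       ("aaa.ba.ca.02", "3 30\n"),
                       ("aaa.ba.da.01", "4 40\n")]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(rows)
    wavelength, flux = load_spectra(tmp)

print(wavelength["aaa.ba.ca"])  # [1.0, 2.0, 3.0]
print(wavelength["aaa.ba"])     # [1.0, 2.0, 3.0, 4.0]
```

The loader stores only full filenames, so nothing is duplicated in memory. Every grouping level is assembled on demand at lookup time.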

What you want does not sound like a dictionary at all. In many ways it is a data structure comparable to a tree, so instead of a flat dictionary you want a structure that starts from a root node:

                                Root
     'ba'             'ca'               'cd'             'fg'
   /   |   \         /    \             /    \              |
  /    |    \       /      \           /      \             |
'aa' 'di'  '30'    '34'   '45'       'ac'     'ty'        '01'

Then perform a depth-first search in which the name tells you which subtree to search: 'ba.aa' would only return things from the 'ba'->'aa' leaf, while 'ba' would return 'ba'->'aa', 'ba'->'di' and 'ba'->'30'.

If you want, make each "level" of nesting its own dictionary. That way you can map quickly to the wavelengths and such.
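Here is a minimal sketch of that nested-dictionary tree with a depth-first walk. The names `tree_insert`, `tree_collect` and `_leaves` are hypothetical. It also assumes that no filename is itself a prefix of another filename, because a node here is either a branch (dict) or a leaf (list), never both.

```python
def tree_insert(tree, filename, values):
    """Store 'values' at the leaf named by a dot-separated filename."""
    *levels, leaf = filename.split(".")
    node = tree
    for level in levels:
        # Each section of the name becomes one nested dict.
        node = node.setdefault(level, {})
    node[leaf] = values


def _leaves(node):
    """Depth-first walk: concatenate leaf lists, visiting children in key order."""
    if isinstance(node, list):
        return list(node)
    result = []
    for key in sorted(node):
        result.extend(_leaves(node[key]))
    return result


def tree_collect(tree, prefix):
    """Descend to the node named by 'prefix', then gather everything below it."""
    node = tree
    for level in prefix.split("."):
        node = node[level]  # raises KeyError if that branch does not exist
    return _leaves(node)


tree = {}
tree_insert(tree, "aaa.ba.ca.01", [1.0, 2.0])
tree_insert(tree, "aaa.ba.ca.02", [3.0])
tree_insert(tree, "aaa.ba.da.01", [4.0])
print(tree_collect(tree, "aaa.ba.ca"))     # [1.0, 2.0, 3.0]
print(tree_collect(tree, "aaa.ba"))        # [1.0, 2.0, 3.0, 4.0]
print(tree_collect(tree, "aaa.ba.ca.02"))  # [3.0]
```

A lookup first descends one dict per section of the prefix and then visits only that subtree. It never scans every key, unlike a flat prefix search.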

If you only have 1000 files, a linear search to look them up is probably fine. On my machine one lookup took 250 µs. You can then use itertools.chain to combine the data from multiple files.

import itertools

class DataGlob(object):

    def __init__(self):
        self.files = []
        self.wavedata = dict()
        self.fluxdata = dict()

    def add(self, filename):
        # Read the two whitespace-separated columns: wavelength, flux.
        wlist = []
        flist = []
        with open(filename) as fh:
            for line in fh:
                if not line.strip():
                    continue  # skip blank lines
                wlen, flux = map(float, line.split())
                wlist.append(wlen)
                flist.append(flux)
        if filename not in self.wavedata:
            self.files.append(filename)  # register the key for find_keys
        self.wavedata[filename] = wlist
        self.fluxdata[filename] = flist

    def find_keys(self, prefix):
        # Linear scan; sorted so that sub-files come back in order.
        return sorted(f for f in self.files if f.startswith(prefix))

    def wavelength(self, fileprefix):
        validkeys = self.find_keys(fileprefix)
        return itertools.chain.from_iterable(self.wavedata[k] for k in validkeys)

    def flux(self, fileprefix):
        validkeys = self.find_keys(fileprefix)
        return itertools.chain.from_iterable(self.fluxdata[k] for k in validkeys)
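The 250 µs figure above comes from one machine. The sketch below checks the cost of the linear scan with `timeit` on 1000 names shaped like the question's. The name pattern and the standalone `find_keys` function are hypothetical stand-ins for the real filenames and the method above.

```python
import timeit

# 1000 made-up names shaped like the question's (aaa.bX.cY.NN); hypothetical data.
names = ["aaa.b%d.c%d.%02d" % (i, j, k)
         for i in range(10) for j in range(10) for k in range(10)]


def find_keys(files, prefix):
    # Same linear prefix scan as DataGlob.find_keys, on a plain list.
    return sorted(f for f in files if f.startswith(prefix))


# Average seconds per call over 1000 calls; the result depends on the machine.
per_lookup = timeit.timeit(lambda: find_keys(names, "aaa.b3.c4"), number=1000) / 1000
print("%d matches, %.1f us per lookup"
      % (len(find_keys(names, "aaa.b3.c4")), per_lookup * 1e6))
```

Note also that `itertools.chain.from_iterable` returns a one-shot iterator. Wrap the result of `wavelength()` or `flux()` in `list()` if you need to traverse it more than once.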
