Reducing a list based on fuzzy values in Python

2023-01-31 03:14 Q&A author:

I have a list that contains groups of nearly identical numeric values, e.g. (1004.523, 1004.575, 1004.475, 791.385, 791.298, 791.301, 791.305, 791.299)

What I am trying to do is read through the list, find all the 1004.5 ± values, aggregate them, and compute their average, then continue on and do the same for all the 791.0 ± values.

I do not know how many individual values there will be in each "group" nor do I know how many groups there will be.

The result I am looking for is another list containing the average value of each group. So in the example my result would be (1004.524, 791.3176).

The code I'm currently using is very kludgy, and it seems there should be a much better way to do it.

As you can see, I have to repeat code twice: once in the else and once after the loop ends, since the last set of numbers does not trigger the else. Plus, once the loop completes I need to add the last value.

If I use len(tones) rather than len(tones) - 1, I get an out-of-range error.

Any thoughts or suggestions would be appreciated. Thanks for your help.

    toneLen = len(tones) -1
    for i in range(0, toneLen):
        if abs(tones[i]-tones[i+1]) <= 2.0:
            tmpTones.append(tones[i])
        else:
            freq = mean(tmpTones)
            newTones.append(freq)
            tmpTones = []
    tmpTones.append(tones[i+1])
    freq = mean(tmpTones)
    newTones.append(freq)                
    tones = newTones

UPDATE: First, I want to thank everyone who submitted suggestions. The responses were very quick and helpful. I probably should have included some more information, which I am adding below. Thanks so much for your help.

Second, a quick explanation of what I am trying to do. Our local fire department is looking for a way to track dispatches for departments close to them. For the most part they use two-tone sequential paging, e.g. 1000 Hz followed by 500 Hz.

So I am using numpy's fft to find the tone frequency. Since the accuracy of the tone appears to be about ±2 Hz, I compare the calculated frequency to a list of known paging tones and pick the closest match. After all the tones have been matched to the paging tones, I look for matches to departments of interest.
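The matching step described above could be sketched roughly as follows. The `KNOWN_TONES` list and the `closest_tone` helper are hypothetical names used only for illustration. The tone values are just the example frequencies from this post, not a real paging plan, and rejecting matches that are more than 2 Hz away is an assumption based on the accuracy quoted above.

```python
# Hypothetical table of known paging tones in Hz; replace with the real list.
KNOWN_TONES = [339.6, 447.2, 569.1, 707.3]

def closest_tone(freq, known=KNOWN_TONES, tolerance=2.0):
    """Return the known tone nearest to freq, or None if it is not within tolerance."""
    best = min(known, key=lambda k: abs(k - freq))
    return best if abs(best - freq) <= tolerance else None

print(closest_tone(706.1))   # 707.3
print(closest_tone(650.0))   # None (nearest known tone is too far away)
```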

One thing I did not know when I started this was that in any given dispatch the same tone can be repeated several times, so the order of the tones is important. An example: 707.3, 339.6, 707.3, 569.1, 447.2, 569.1 would be a typical dispatch. I then look to see if any of the tone pairs are ones I'm interested in, and if so I display a message.
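Because order matters, the pair lookup can walk over consecutive tones. This is only a sketch: it assumes the tones have already been snapped to known values, that a "pair" means any two adjacent tones, and the `PAIRS_OF_INTEREST` table and department names are made up for illustration.

```python
# Hypothetical mapping of (first tone, second tone) -> department name.
PAIRS_OF_INTEREST = {
    (707.3, 339.6): "Department A",
    (569.1, 447.2): "Department B",
}

def find_dispatches(matched, pairs=PAIRS_OF_INTEREST):
    """Yield the department for every adjacent tone pair that is of interest."""
    for first, second in zip(matched, matched[1:]):
        if (first, second) in pairs:
            yield pairs[(first, second)]

dispatch = [707.3, 339.6, 707.3, 569.1, 447.2, 569.1]
print(list(find_dispatches(dispatch)))   # ['Department A', 'Department B']
```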

Thanks again for all your help.

Perhaps you are looking for kmeans clustering.

In the code below, I use scipy.cluster.vq.kmeans to cluster the data into k groups. If the distortion is greater than some set threshold amount, then we increase k by one, and redo the kmeans clustering. We repeat until we find groups whose total distortion is less than the threshold amount.

import collections

import numpy as np
import scipy.cluster.vq as scv


def auto_cluster(data, threshold=0.1):
    # There are more sophisticated ways of determining k
    # See http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
    k = 1
    distortion = 1e20
    # Keep adding clusters until the distortion drops below the threshold
    while distortion > threshold:
        codebook, distortion = scv.kmeans(data, k)
        k += 1
    # Assign each datum to the nearest centroid in the final codebook
    code, dist = scv.vq(data, codebook)
    groups = collections.defaultdict(list)
    for index, datum in zip(code, data):
        groups[index].append(datum)
    return groups

data=np.array((1004.523, 1004.575, 1004.475, 791.385, 791.298, 791.301, 791.305, 791.299))
groups=auto_cluster(data)    
for index in groups:
    print('{index}: ave({d}) = {ave}'.format(
        index=index,
        d=','.join(map('{0:g}'.format,groups[index])),
        ave=np.mean(groups[index]))
        )

yields

0: ave(791.385,791.298,791.301,791.305,791.299) = 791.3176
1: ave(1004.52,1004.58,1004.48) = 1004.52433333

This finds the borders between groups of nearly identical values and then computes the mean using slices on the original list.

from statistics import mean  # assumption: the question does not say which mean() it uses

tones = (1004.523, 1004.575, 1004.475, 791.385, 791.298, 791.301, 791.305, 791.299)
splits = [i for i in range(1, len(tones)) if abs(tones[i-1] - tones[i]) > 2]
splits = [0] + splits + [len(tones)]
tones = [mean(tones[splits[i-1]:splits[i]]) for i in range(1, len(splits))]
# [1004.5243333333333, 791.31759999999997]
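The same border-finding idea can be wrapped in a small function that also copes with an empty input. This is a sketch rather than the answer's own code: the name `average_runs` and the `max_gap` parameter are made up here, and `statistics.mean` is assumed as the averaging function.

```python
from statistics import mean

def average_runs(tones, max_gap=2.0):
    """Average each run of consecutive values whose neighbours differ by at most max_gap."""
    if not tones:
        return []
    runs = [[tones[0]]]
    for prev, cur in zip(tones, tones[1:]):
        if abs(cur - prev) <= max_gap:
            runs[-1].append(cur)   # neighbour is close enough: same run
        else:
            runs.append([cur])     # gap too large: start a new run
    return [mean(run) for run in runs]

tones = (1004.523, 1004.575, 1004.475, 791.385, 791.298, 791.301, 791.305, 791.299)
print([round(x, 4) for x in average_runs(tones)])   # [1004.5243, 791.3176]
```

Since only neighbouring values are compared, the order of the groups is kept and a tone that comes back later in the sequence forms its own group.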

This does without the intermediate temp list:

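DELTA = 2.0  # largest gap between neighbours in one group (value taken from the question)

assert tones  # the loop below needs at least one value
total = prev = tones[0]
count = 1
newlist = []
for i in range(1, len(tones)):  # range replaces the Python 2 xrange
    t = tones[i]
    if abs(t - prev) <= DELTA:
        # still in the same group: extend the running sum
        total += t
        count += 1
        prev = t
    else:
        # gap too large: close the current group and start a new one
        newlist.append(total / count)
        total = prev = t
        count = 1
newlist.append(total / count)  # close the final group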

If you know which values may appear in the sequence, you can use this (exacttones is the list of expected values):

tones = (1004.523, 1004.575, 1004.475, 791.385, 791.298, 791.301, 791.305, 791.299)
exacttones = (1004.5, 791.3)
limit = 0.2
[sum(x)/len(x) for x in [[y for y in tones if abs((y-e))<=limit] for e in exacttones]]
# [1004.5243333333333, 791.31759999999997]

To analyze the sequence without knowing the exacttones, something like this will work:

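from functools import reduce  # reduce is no longer a builtin in Python 3

# tones and limit are the ones defined in the previous snippet
def calc(d, value):
    # d maps the first value seen in each group to the list of its members
    for k in d:
        if abs(k - value) <= limit:
            d[k].append(value)
            return d
    d[value] = [value]
    return d

[sum(x)/len(x) for x in reduce(calc, tones, {}).values()]
# [1004.5243333333333, 791.31759999999997]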

Assuming that this is a list of audio tones, you probably want to use a ratio such as 1.059 (about one semitone, 2^(1/12)) to determine the range to assign to a group, rather than hard-coding an absolute difference like 2.0.

def average_tones(tones):
    threshold = 1.059
    average = 0
    count = 0
    for tone in sorted(tones):
        if count != 0 and tone >= average*threshold:
            yield average
            count = 0
        average = (average * count + tone) / (count + 1)
        count += 1
    if count != 0:
        yield average
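Note that `sorted` in the generator above merges repeated tones that are separated in time, which matters if the order of the tones is significant. A possible order-preserving variant is sketched below. The name `average_tones_in_order` is made up here, and the variant assumes positive frequencies and compares each tone only with its immediate predecessor.

```python
def average_tones_in_order(tones, ratio=1.059):
    """Group neighbouring tones whose frequency ratio is below `ratio`, keeping order."""
    group = []
    for tone in tones:
        if group:
            lo, hi = sorted((group[-1], tone))
            if hi >= lo * ratio:              # roughly a semitone or more apart
                yield sum(group) / len(group)
                group = []
        group.append(tone)
    if group:
        yield sum(group) / len(group)

readings = [707.2, 707.4, 339.5, 339.7, 707.3]
print([round(x, 1) for x in average_tones_in_order(readings)])   # [707.3, 339.6, 707.3]
```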
