How do I store data with a query that's a approximated?

2023-03-04 16:45 问答作者：

I'm trying开发者_StackOverflow社区 to find a way to store my data with fast access (better than O(n)).

My database consists of data (4096 byte strings) that represents some information about some items.

The problem is, that the query is never exact. I get one Item, and then need to find the closest match using a function F(a,b).

just an example:

1234
3456
6466
F(a,b) = return % of similar digits  

GetClosest(1233,F) = 1234

The problem is that F(a,b) is a complicated algorithm, (not a proper metric).

What I have now is just go over the whole database to search for the best match.

Is there a kind of tree or other cluster database type that can give me faster finding complexity ?

More information:

F gives back a similarity value in %percentage. where 100% is a perfect match.

Sorry, the answer is "probably not" unless there is some more structure to your problem that you haven't described. With 4096 byte strings you're suffering from the curse of dimensionality.

If you had shorter strings and enough data that there was a high likelihood of the nearest match being identical over a large chunk of the string, then you could store your data with multiple tree-like structures indexed over different chunks of the string. With high likelihood the nearest would be close enough that you could prove it was nearest based only on close elements in those trees. However with the size of your strings and the limited data that can be stored in a computer, there is no way this is possibly going to work.

That said, do you need the exact closest, or only a somewhat close one? If only a likely close one, then you could index it by several random sparse samples of bits. In your search you can only check elements that match exactly in one of the elements. This will greatly reduce the search space, while rejecting fewer of the close neighbors, and may produce reasonable (even though frequently wrong) answers.

Is there some way you could assign a 'score' to each datum.

You could index/sequence the data by your score.

When you search you assign a score to your search criteria, and look for the item with the closest score.

Depends very much on your data and your definition of "difference" whether this will work.

继续阅读：algorithm approximation database search

How do I store data with a query that's a approximated?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？