开发者

Top k problem - finding usage for my academic work

Top k problem - searching BEST k (3 or 1000) elements in DB

There is fundamental problem with relational DB, that to find top k elems, there is a need to process ALL rows in table. Which make it useless on big data.

I'm making application (for university research, not really my invention, I'm implementing and trying to improve original idea) that allows you to effectively find top k elements by visiting only 3-5% of stored data. Which make it really fast.

There are even user preferences, so on some domain, you can specify function that specify best value for user and aggregation function that specify most significant attributes.


For example DB of cars: attributes:(price, mileage, age of car, ccm, fuel/mile, type of car...) and user values for example 10*price + 5*fuel/mile + 4*mileage + age of car, (s)he doesn't care about type of car and other. - this is aggregation specification

Then for each attribute (price, mileage, ...), there can be totally different "value-function" that sp开发者_如何转开发ecifies best value for user. So for example (price: lower, the better, then value go down, up to $50k, where value is 0 (user don't want car more expensive than 50k). Mileage: other function based on his/hers criteria, ans so on...


You can see that there is quite freedom to specify your preferences and acording to it, best k elements in DB will be found quickly.

I've spent many sleepless night thinking about real-life usability. Who can benefit from that query db? But I failed to whomp up anything and sticking to only academic write-only stance. :-( I hope there can be some real usage for that, but I don't see any....

.... do YOU have any idea how to use that in real-life, real problem, etc...


I'd love to hear from You.


Have a database of people's CVs and establish hiring criteria for different jobs, allowing for a dynamic display of the top k candidates.

Also, considering the fast nature of your solution, you can think of exploiting it in rendering near real-time graphs of highly dynamic data, like stock market quotes or even applications in molecular or DNA-related studies.

New idea: perhaps your research might have applications in clustering, where you would use it to implement a fast k - Nearest Neighbor clustering by complex criteria without having to scan the whole data set each time. This would lead to faster clustering of larger data sets in respect with more complex criteria in picking the K-NN for each data node.


There are unlimited possible real-use scenarios. Getting the top-n values is used all the time.

But I highly doubt that it's possible to get top-n objects without having an index. An index can only be built if the properties that will be searched are known ahead of searching. And if that's the case, a simple index in a relational database is able to provide the same functionality.


It's used in financial organizations all the time, you need to see the most profitable assets / least profitable, etc.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜