Appropriate minimum support for itemsets?

Please suggest any material about choosing an appropriate minimum support and confidence for itemsets.

I use the Apriori algorithm to find frequent itemsets. I still don't know what support and confidence values are appropriate. I would like to know which considerations determine how large the support should be.


The answer is that the appropriate values depend on the data.

For some datasets, the best value may be 0.5. But for some other datasets it may be 0.05. It depends on the data.

But if you set minsup = 0 and minconf = 0, some algorithms will run out of memory before terminating, or you may run out of disk space because there are too many patterns.

In my experience, the best way to choose minsup and minconf is to start with high values and then lower them gradually until you find enough patterns.
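To make the "start high, then lower" idea concrete, here is a minimal Python sketch. It is only an illustration: the function names (`frequent_itemsets`, `lower_until_enough`), the halving factor, the floor of 0.01 and the toy `baskets` data are all assumptions of mine, and the miner is a small level-wise search meant for toy data, not a tuned Apriori implementation.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Return {itemset: relative support} for itemsets with support >= minsup.

    Level-wise search with Apriori-style pruning; suitable for toy data only.
    """
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]
    items = sorted({i for t in tsets for i in t})
    result = {}
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    while current:
        level = {}
        for cand in current:
            sup = sum(1 for t in tsets if cand <= t) / n
            if sup >= minsup:
                level[cand] = sup
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, keeping only those
        # whose k-subsets are all frequent.
        k = len(current[0])
        nxt = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k + 1 and all(frozenset(s) in level for s in combinations(u, k)):
                nxt.add(u)
        current = list(nxt)
    return result

def lower_until_enough(transactions, target, start=0.9, factor=0.5, floor=0.01):
    """Lower minsup step by step until at least `target` itemsets are found.

    `start`, `factor` and `floor` are arbitrary illustrative defaults.
    """
    minsup = start
    while True:
        found = frequent_itemsets(transactions, minsup)
        if len(found) >= target or minsup <= floor:
            return minsup, found
        minsup = max(floor, minsup * factor)

# Hypothetical toy data, not from any real dataset.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

minsup, found = lower_until_enough(baskets, target=5)
print(minsup, len(found))
```

On real data you would probably lower minsup in smaller steps and also watch the running time, since each step can be much more expensive than the previous one.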

Alternatively, if you don't want to set minsup, you can use a top-k algorithm: instead of specifying minsup, you specify that you want the k most frequent rules, for example k = 1000.
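As a rough illustration of replacing minsup with k, the following Python sketch simply counts every itemset up to a small size and keeps the k with the highest support. This is a naive stand-in and not the TopKRules algorithm; it ranks itemsets rather than rules, and the name `top_k_itemsets`, the `max_size` cap and the toy `baskets` data are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
import heapq

def top_k_itemsets(transactions, k, max_size=3):
    """Return the k itemsets with the highest relative support.

    Brute-force enumeration up to `max_size` items; toy data only.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(transactions)
    # Rank by count; on ties prefer smaller itemsets, then tuple order.
    best = heapq.nlargest(
        k, counts.items(), key=lambda kv: (kv[1], -len(kv[0]), kv[0])
    )
    return [(itemset, count / n) for itemset, count in best]

# Hypothetical toy data.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

for itemset, sup in top_k_itemsets(baskets, k=3):
    print(itemset, sup)
```

The point is only that the user picks k and the effective support threshold falls out of the data; a real top-k miner avoids enumerating everything.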

If you are interested in top-k association rule mining, you can check my Java code here:

http://www.philippe-fournier-viger.com/spmf/

The algorithm is called TopKRules and the article describing it will be published next month.

Besides that, you should know that there are many other interestingness measures besides support and confidence: lift, all-confidence, ... To learn more about this, you can read these articles: "On selecting interestingness measures for association rules" and "A Survey of Interestingness Measures for Association Rules". Basically, all measures have problems in some cases; no measure is perfect.
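For reference, here is a small Python sketch of how support, confidence, lift and all-confidence can be computed for one rule, using their usual textbook definitions. The helper names (`support`, `rule_measures`) and the toy `baskets` data are made up for the example, and the code assumes every item in the rule occurs at least once (otherwise it divides by zero).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)

def rule_measures(antecedent, consequent, transactions):
    """Compute common interestingness measures for the rule A -> B."""
    a, b = frozenset(antecedent), frozenset(consequent)
    sup_ab = support(a | b, transactions)
    sup_a = support(a, transactions)
    sup_b = support(b, transactions)
    return {
        "support": sup_ab,
        # confidence(A -> B) = sup(A u B) / sup(A)
        "confidence": sup_ab / sup_a,
        # lift(A -> B) = sup(A u B) / (sup(A) * sup(B)); 1.0 means independence
        "lift": sup_ab / (sup_a * sup_b),
        # all-confidence(X) = sup(X) / max single-item support in X
        "all_confidence": sup_ab / max(support([i], transactions) for i in a | b),
    }

# Hypothetical toy data.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

print(rule_measures(["diapers"], ["beer"], baskets))
```

A rule can have high confidence simply because its consequent is very common, which is one reason to look at lift as well.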

Hope this helps!


In any association rule mining algorithm, including Apriori, it is up to the user to decide what support and confidence values they want to provide. Depending on your dataset and your objectives you decide the minSup and minConf. Obviously, if you set these values lower, then your algorithm will take longer to execute and you will get a lot of results.


The minimum support and minimum confidence parameters are a user preference. If you want a larger quantity of results (with lower statistical confidence), choose lower values. In theory you can set them to 0. The algorithm will run, but it will take a long time, and the result will not be particularly useful, as it contains just about anything.

So choose them so that the results suit your needs. Mathematically, any value is "correct".
