Using frequent itemset mining to build association rules?

I am new to this area as well as the terminology, so please feel free to correct me if I go wrong somewhere. I have two datasets like this:

Dataset 1:

A B C 0 E
A 0 C 0 0
A 0 C D E
A 0 C 0 E

The way I interpret this is that, at some point in time, (A, B, C, E) occurred together, and so did (A, C), (A, C, D, E), etc.

Dataset 2:

5A 1B 5C  0 2E
4A  0 5C  0  0
2A  0 1C 4D 4E
3A  0 4C  0 3E

The way I interpret this is that, at some point in time, 5 occurrences of A, 1 occurrence of B, 5 occurrences of C and 2 occurrences of E happened together, and so on.
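Concretely, here is how I would represent the two datasets in code (Python/pandas is just what I would reach for; the variable names are mine). As far as I can tell, dataset 1 is simply dataset 2 reduced to presence/absence:

import pandas as pd

# Dataset 2: the number of occurrences of each item at each point in time.
counts = pd.DataFrame(
    [[5, 1, 5, 0, 2],
     [4, 0, 5, 0, 0],
     [2, 0, 1, 4, 4],
     [3, 0, 4, 0, 3]],
    columns=list("ABCDE"),
)

# Dataset 1: the same rows reduced to presence/absence (1 if the count is > 0).
presence = (counts > 0).astype(int)
print(presence)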

I am trying to find which items occur together and, if possible, also find out the cause and effect behind this. What I don't understand is how to go about using both datasets (or whether one is enough). A good tutorial on this would help, but my primary question is which dataset to use and how to proceed in (i) building frequent itemsets and (ii) building association rules from them.

Can someone point me to practical tutorials/examples (preferably in Python), or at least briefly explain how to approach this problem?


Some theoretical facts about association rules:

  • Association rule mining is a type of undirected data mining that finds patterns in the data where the target is not specified beforehand. Whether the patterns make sense is left to human interpretation.
  • The goal of association rules is to detect relationships or associations between specific values of categorical variables in large data sets.
  • A rule can be interpreted as "70% of the customers who buy wine and cheese also buy grapes" (a small example of how such numbers are computed follows below).
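As a small illustration of those two numbers, here is how the support and confidence of one candidate rule, {A, C} → {E} (a rule I picked arbitrarily from the first dataset), can be computed by plain counting:

# Transactions from dataset 1, each as a set of items.
transactions = [
    {"A", "B", "C", "E"},
    {"A", "C"},
    {"A", "C", "D", "E"},
    {"A", "C", "E"},
]

antecedent = {"A", "C"}
consequent = {"E"}

n = len(transactions)
n_antecedent = sum(antecedent <= t for t in transactions)            # rows containing {A, C}
n_both = sum((antecedent | consequent) <= t for t in transactions)   # rows containing {A, C, E}

support = n_both / n                 # 3/4 = 0.75: {A, C, E} appears in 75% of all transactions
confidence = n_both / n_antecedent   # 3/4 = 0.75: of the rows with {A, C}, 75% also contain E

print("support = %.2f, confidence = %.2f" % (support, confidence))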

To find association rules, you can use the Apriori algorithm. There already exist many Python implementations, although most of them are not efficient for practical use:

  • source1: http://code.google.com/p/autoflash/source/browse/trunk/python/apriori.py?r=31
  • source2: http://www.nullege.com/codes/show/src%40l%40i%40libbyr-HEAD%40test_freq_item_algos.py/5/apriori/python

or use the Orange data mining library, which has good support for association rules.

Usage example:

# Save the first dataset as item.basket, one transaction per line:
#   A, B, C, E
#   A, C
#   A, C, D, E
#   A, C, E
# Then start IPython in the same directory as the saved file, or change into it
# with the os module:
#   >>> import os
#   >>> os.chdir("c:/orange")
import orange  # Orange 2.x (Python 2)

# Load the basket file; Orange finds "item.basket" from the base name.
items = orange.ExampleTable("item")

# Play with the support argument to filter out rules.
rules = orange.AssociationRulesSparseInducer(items, support=0.1)

for r in rules:
    print "%5.3f %5.3f %s" % (r.support, r.confidence, r)

To learn more about association rules/frequent itemset mining, my selection of books is:

  • "Introduction to Data mining" - Vipin Kumar, best book for beginner
  • "Data mining and knowledge discovery handbook", for advanced user
  • "Mining massive data" - tips how to use in reallife and how build efficient solutions, free book, http://i.stanford.edu/~ullman/mmds.html
  • Ofcourse there are many fantastic scientific papers to read: by example do some search on MS Acedemic about Frequent Pattern mining

There is no shortcut.


A neat way to handle this type of problem is to use a Bayesian network, in particular by treating it as a Bayesian network structure learning problem. Once you have the network, you will be able to efficiently answer questions like p(A=1 | B=0, C=1) and so on.
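A minimal sketch of that idea, assuming the pgmpy library (my choice of tool, not something this answer specifies, and its API varies a bit between versions), on the tiny presence/absence table from the question. With only four rows the result is not meaningful; it just shows the shape of the workflow:

import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore, MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination

# Presence/absence matrix from dataset 1 (a real run needs many more rows).
data = pd.DataFrame(
    [[1, 1, 1, 0, 1],
     [1, 0, 1, 0, 0],
     [1, 0, 1, 1, 1],
     [1, 0, 1, 0, 1]],
    columns=list("ABCDE"),
)

# Learn a network structure with hill-climbing search scored by BIC.
dag = HillClimbSearch(data).estimate(scoring_method=BicScore(data))

# Fit conditional probability tables for the learned structure.
model = BayesianNetwork(dag.edges())
model.add_nodes_from(data.columns)  # keep variables that ended up with no edges
model.fit(data, estimator=MaximumLikelihoodEstimator)

# Answer queries such as p(E = 1 | B = 0, C = 1).
print(VariableElimination(model).query(variables=["E"], evidence={"B": 0, "C": 1}))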


If you have quantities for each item, then you could consider "high utility itemset mining". It is the itemset mining problem adapted to the case where items can have quantities in each transaction and where each item can also have a weight.

If you just use basic Apriori, you would lose the information about quantities.
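As a tiny illustration of what gets lost, here is the "utility" of the itemset {A, C} in dataset 2 under some made-up per-item weights (the unit profits are purely hypothetical), next to the only thing basic Apriori would see, its support count:

# Dataset 2: item quantities per transaction.
transactions = [
    {"A": 5, "B": 1, "C": 5, "E": 2},
    {"A": 4, "C": 5},
    {"A": 2, "C": 1, "D": 4, "E": 4},
    {"A": 3, "C": 4, "E": 3},
]

# Hypothetical per-item weights (e.g. unit profits), made up for this example.
unit_profit = {"A": 1.0, "B": 5.0, "C": 0.5, "D": 2.0, "E": 3.0}

itemset = {"A", "C"}

# Utility of the itemset: sum of quantity * weight over the transactions containing it.
utility = sum(
    sum(t[item] * unit_profit[item] for item in itemset)
    for t in transactions
    if itemset.issubset(t)
)

# What basic Apriori sees: only whether the itemset is present (its support count).
support_count = sum(itemset.issubset(t) for t in transactions)

print("utility of {A, C}:", utility)              # 21.5
print("support count of {A, C}:", support_count)  # 4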
