We need two pieces of software: the R package arules, and the Python package rpy2, which connects Python to R. Start by creating a new conda environment.
To install arules, open R and run install.packages("arules").
To install rpy2 and pandas, use:
conda install -c conda-forge rpy2
conda install -c conda-forge pandas
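Before going further, it can help to confirm that the Python side of the installation is visible from the active environment. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this illustration, and the check says nothing about whether arules is installed in R.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Look the packages up without importing them.
missing = missing_modules(["rpy2", "pandas"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("rpy2 and pandas are both available.")
```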
The data need to be prepared as a pandas DataFrame. Here we have 10 transactions with three items called A, B and C. True means that a transaction contains the item.
import pandas as pd
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, True],
        [True, True, True],
        [False, False, True],
        [False, True, True],
        [True, False, True],
    ],
    columns=list('ABC'))
df
| | A | B | C |
|---|---|---|---|
| 0 | True | True | True |
| 1 | True | False | False |
| 2 | True | True | True |
| 3 | True | False | False |
| 4 | True | True | True |
| 5 | True | False | True |
| 6 | True | True | True |
| 7 | False | False | True |
| 8 | False | True | True |
| 9 | True | False | True |
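Transaction data often arrive as lists of items rather than as boolean columns. As a minimal sketch of the encoding step, the snippet below turns a few item lists (they correspond to rows 0, 1, 5 and 7 of the table above) into boolean rows. The helper name `one_hot_rows` is an assumption made for this illustration and is not part of arules or rpy2.

```python
# Four transactions written as item lists (rows 0, 1, 5 and 7 of the table).
transactions = [
    ["A", "B", "C"],
    ["A"],
    ["A", "C"],
    ["C"],
]


def one_hot_rows(transactions):
    """Return (sorted item labels, boolean rows), one row per transaction."""
    items = sorted({item for t in transactions for item in t})
    rows = [[item in t for item in items] for t in transactions]
    return items, rows


items, rows = one_hot_rows(transactions)
print(items)
for row in rows:
    print(row)
```

The result can then be wrapped with `pd.DataFrame(rows, columns=items)` to obtain the layout used in this example.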
from rpy2.robjects import pandas2ri
pandas2ri.activate()
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
arules = importr("arules")
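# Some helper functions
def arules_as_matrix(x, what = "items"):
    # Convert the items (or the lhs/rhs of rules) to a binary matrix
    return ro.r('function(x) as(' + what + '(x), "matrix")')(x)

def arules_as_dict(x, what = "items"):
    # Convert the items (or the lhs/rhs of rules) to a dict of item lists
    l = ro.r('function(x) as(' + what + '(x), "list")')(x)
    l.names = [*range(0, len(l))]
    return dict(zip(l.names, map(list,list(l))))

def arules_quality(x):
    # Return the quality slot of the R object
    return x.slots["quality"]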
itsets = arules.apriori(df,
    parameter = ro.ListVector({"supp": 0.1, "target": "frequent itemsets"}))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
NA 0.1 1 none FALSE TRUE 5 0.1 1
maxlen target ext
10 frequent itemsets TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 1
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[3 item(s), 10 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [7 set(s)] done [0.00s].
creating S4 object ... done [0.00s].
print(arules.DATAFRAME(itsets))
items support transIdenticalToItemsets count
1 {B} 0.5 0.0 5
2 {A} 0.8 0.2 8
3 {C} 0.8 0.1 8
4 {A,B} 0.4 0.0 4
5 {B,C} 0.5 0.1 5
6 {A,C} 0.6 0.2 6
7 {A,B,C} 0.4 0.4 4
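To make the columns of this table concrete, here is a small sketch that recomputes them by brute-force counting over the ten transactions. It is not the Apriori algorithm, just an enumeration of all item combinations. The values it produces are consistent with reading `transIdenticalToItemsets` as the fraction of transactions that consist of exactly the itemset, which is an interpretation inferred from the numbers above rather than something the output states.

```python
from itertools import combinations

ITEMS = ["A", "B", "C"]
# The ten transactions from the data frame above, as boolean rows.
ROWS = [
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [False, False, True],
    [False, True, True],
    [True, False, True],
]
TRANSACTIONS = [
    frozenset(item for item, flag in zip(ITEMS, row) if flag) for row in ROWS
]


def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def identical_count(itemset):
    """Number of transactions that consist of exactly `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset == t)


n = len(TRANSACTIONS)
for size in range(1, len(ITEMS) + 1):
    for combo in combinations(ITEMS, size):
        s = frozenset(combo)
        count = support_count(s)
        if count / n >= 0.1:  # same minimum support as in the apriori call
            print(sorted(s), count / n, identical_count(s) / n, count)
```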
The frequent itemsets can be accessed as a binary matrix.
its = arules_as_matrix(itsets)
print(its)
[[0 1 0] [1 0 0] [0 0 1] [1 1 0] [0 1 1] [1 0 1] [1 1 1]]
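Each row of this matrix is one itemset and each column is one item. As a sketch of how to read it, the snippet below maps the rows back to item labels, assuming the columns follow the order A, B, C of the data frame. The matrix is copied from the printed output as a plain nested list, and `decode_rows` is a name invented for this illustration.

```python
ITEMS = ["A", "B", "C"]  # assumed column order, taken from the data frame

# The binary matrix printed above, written as a nested list.
matrix = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]


def decode_rows(matrix, items):
    """Return, for each row, the labels of the columns that are set to 1."""
    return [[item for item, flag in zip(items, row) if flag] for row in matrix]


itemsets = decode_rows(matrix, ITEMS)
print(itemsets)
```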
The itemsets can also be accessed as a dictionary.
its = arules_as_dict(itsets)
print(its)
{'0': ['B'], '1': ['A'], '2': ['C'], '3': ['A', 'B'], '4': ['B', 'C'], '5': ['A', 'C'], '6': ['A', 'B', 'C']}
Access the quality measures.
arules_quality(itsets)
| | support | transIdenticalToItemsets | count |
|---|---|---|---|
| 1 | 0.5 | 0.0 | 5 |
| 2 | 0.8 | 0.2 | 8 |
| 3 | 0.8 | 0.1 | 8 |
| 4 | 0.4 | 0.0 | 4 |
| 5 | 0.5 | 0.1 | 5 |
| 6 | 0.6 | 0.2 | 6 |
| 7 | 0.4 | 0.4 | 4 |
rules = arules.apriori(df,
    parameter = ro.ListVector({"supp": 0.1, "conf": 0.8}))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.8 0.1 1 none FALSE TRUE 5 0.1 1
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 1
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[3 item(s), 10 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [6 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
print(arules.DATAFRAME(rules))
LHS RHS support confidence coverage lift count
1 {} {A} 0.8 0.8 1.0 1.00 8
2 {} {C} 0.8 0.8 1.0 1.00 8
3 {B} {A} 0.4 0.8 0.5 1.00 4
4 {B} {C} 0.5 1.0 0.5 1.25 5
5 {A,B} {C} 0.4 1.0 0.4 1.25 4
6 {B,C} {A} 0.4 0.8 0.5 1.00 4
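The measures in this table follow from simple counts: coverage is the support of the left-hand side, confidence is the support of the whole rule divided by the coverage, and lift is the confidence divided by the support of the right-hand side. The sketch below recomputes them for the example data in plain Python. The function names are chosen for this illustration and do not come from arules.

```python
ITEMS = ["A", "B", "C"]
# The ten transactions from the data frame above, as boolean rows.
ROWS = [
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, False],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [False, False, True],
    [False, True, True],
    [True, False, True],
]
TRANSACTIONS = [
    frozenset(item for item, flag in zip(ITEMS, row) if flag) for row in ROWS
]


def count(itemset):
    """Number of transactions containing all items (all of them if empty)."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def rule_measures(lhs, rhs):
    """Support, confidence, coverage, lift and count for the rule lhs => rhs."""
    n = len(TRANSACTIONS)
    both = count(set(lhs) | set(rhs))
    confidence = both / count(lhs)
    return {
        "support": both / n,
        "confidence": confidence,
        "coverage": count(lhs) / n,
        "lift": confidence / (count(rhs) / n),
        "count": both,
    }


for lhs, rhs in [([], ["A"]), (["B"], ["C"]), (["B", "C"], ["A"])]:
    m = rule_measures(lhs, rhs)
    print(lhs, "=>", rhs, {k: round(v, 2) for k, v in m.items()})
```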
Get the left-hand side, the right-hand side and the rule quality.
lhs = arules_as_matrix(rules, what = "lhs")
print(lhs)
[[0 0 0] [0 0 0] [0 1 0] [0 1 0] [1 1 0] [0 1 1]]
rhs = arules_as_matrix(rules, what = "rhs")
print(rhs)
[[1 0 0] [0 0 1] [1 0 0] [0 0 1] [0 0 1] [1 0 0]]
lhs = arules_as_dict(rules, what = "lhs")
print(lhs)
{'0': [], '1': [], '2': ['B'], '3': ['B'], '4': ['A', 'B'], '5': ['B', 'C']}
rhs = arules_as_dict(rules, what = "rhs")
print(rhs)
{'0': ['A'], '1': ['C'], '2': ['A'], '3': ['C'], '4': ['C'], '5': ['A']}
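Because the two dictionaries share the same keys, they can be joined into readable rule strings. The following is a minimal sketch that uses the dictionaries exactly as printed above; `format_rules` is a hypothetical helper written for this illustration.

```python
# The two dictionaries as printed above.
lhs = {'0': [], '1': [], '2': ['B'], '3': ['B'], '4': ['A', 'B'], '5': ['B', 'C']}
rhs = {'0': ['A'], '1': ['C'], '2': ['A'], '3': ['C'], '4': ['C'], '5': ['A']}


def format_rules(lhs, rhs):
    """Pair up entries with the same key and render them as '{X} => {Y}'."""
    return [
        "{" + ",".join(lhs[key]) + "} => {" + ",".join(rhs[key]) + "}"
        for key in lhs
    ]


rule_strings = format_rules(lhs, rhs)
for rule in rule_strings:
    print(rule)
```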
arules_quality(rules)
| | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| 1 | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
| 2 | 0.8 | 0.8 | 1.0 | 1.00 | 8 |
| 3 | 0.4 | 0.8 | 0.5 | 1.00 | 4 |
| 4 | 0.5 | 1.0 | 0.5 | 1.25 | 5 |
| 5 | 0.4 | 1.0 | 0.4 | 1.25 | 4 |
| 6 | 0.4 | 0.8 | 0.5 | 1.00 | 4 |