Measuring the interest of a rule in Association Rules
The importance of knowing the correlation between events in marketing strategies
Reading Time: 4 minutes
Post published on 10/12/2020 by Donata Petrelli and released under the CC BY-NC-ND 3.0 IT licence (Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy)
On a shoe e-commerce site (yes, my favourite) how important is it to suggest a high heel rather than a low one?
The answer to this question is called “lift” and it is a measure that is part of the Association Rules algorithm.
Thanks to your very kind feedback on the previous article about Association Rules, I have seen that there is a lot of interest in this topic, and many of you have asked me to keep talking about it.
I have therefore decided to complete the introduction to Association Rules by addressing the last important topic, the lift. I hope it will be of help to all those who have to manage the sales marketing of countless products… not just shoes. 🙂
Summary of the Previous Episode
The Association Rules algorithm finds hidden relationships between elements of a set through two measures:
1) Support
2) Confidence
Based on these metrics, it finds the rules that exceed a minimum threshold of both support and confidence.
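As a reminder of how the two measures work, here is a minimal Python sketch. The baskets, the item names and the two thresholds are hypothetical values chosen only for illustration; they do not come from a real shop.

```python
# Hypothetical baskets, used only to illustrate the two measures.
baskets = [
    {"heels", "handbag"},
    {"heels", "handbag", "scarf"},
    {"heels", "scarf"},
    {"scarf"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of the whole rule divided by the support of its antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Assumed thresholds; in practice the analyst chooses them.
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.6

rule_support = support({"heels", "handbag"}, baskets)
rule_confidence = confidence({"heels"}, {"handbag"}, baskets)

# A rule is "strong" when it clears both thresholds.
is_strong = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, rule_confidence, is_strong)
```

With these made-up baskets the rule heels → handbag clears both thresholds, so it would be reported as a strong rule.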
However, this approach often produces rules that are uninteresting, or even useless, for commercial or marketing purposes.
Let us see why.
Limitations
The support/confidence method for a rule X → Y does not take into account an important aspect: the absolute probability of the event Y.
If we follow only the support/confidence logic, we can also find useless rules. In fact, a rule whose support and confidence are acceptable for the thresholds we have set, but whose confidence is still lower than the probability with which Y occurs on its own, makes little sense.
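To make this concrete with purely hypothetical numbers: suppose Y appears in 90% of all baskets but in only 75% of the baskets that contain X, and that the minimum confidence is set to 70%. Then

$$
\mathrm{conf}(X \to Y) = P(Y \mid X) = 0.75 \ge 0.70, \qquad \text{but} \qquad P(Y) = 0.90 > P(Y \mid X).
$$

The rule passes the confidence threshold, yet a basket containing X is actually less likely to contain Y than a basket picked at random.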
We therefore need a measure that indicates the “correlation between events”, i.e. how the occurrence of one event changes the probability that the other occurs.
In the case of shopping on an e-commerce site, this means finding a measure that tells us whether knowing that X is in the basket makes it more likely that we will also find Y there.
Solution
Given two events X and Y, we define the correlation coefficient between them as

$$
\mathrm{corr}_{X,Y} = \frac{P(X \cap Y)}{P(X)\,P(Y)}
$$

- If $\mathrm{corr}_{X,Y} = 1$, the two events are independent
- If $\mathrm{corr}_{X,Y} > 1$, the two events are positively correlated
- If $\mathrm{corr}_{X,Y} < 1$, the two events are negatively correlated

If X → Y is an association rule, the value $\mathrm{corr}_{X,Y}$ is called the lift of the rule.
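Specifically:

$$
\mathrm{lift}(X \to Y) = \frac{P(X \cap Y)}{P(X)\,P(Y)}
$$

and since:

$$
\mathrm{conf}(X \to Y) = P(Y \mid X) = \frac{P(X \cap Y)}{P(X)}
$$

we obtain that:

$$
\mathrm{lift}(X \to Y) = \frac{\mathrm{conf}(X \to Y)}{P(Y)}
$$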
Thanks to this new metric, it turns out that a strong rule is not always interesting.
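As a quick numerical check, the sketch below computes the lift in two ways: as the joint probability divided by the product of the individual probabilities, and as the confidence divided by the probability of Y. The counts are hypothetical, and the function names are my own; exact fractions are used so that the two results can be compared without rounding issues.

```python
from fractions import Fraction


def lift_from_probabilities(n_x, n_y, n_xy, n_total):
    """Lift as P(X and Y) / (P(X) * P(Y))."""
    p_x = Fraction(n_x, n_total)
    p_y = Fraction(n_y, n_total)
    p_xy = Fraction(n_xy, n_total)
    return p_xy / (p_x * p_y)


def lift_from_confidence(n_x, n_y, n_xy, n_total):
    """Lift as confidence(X -> Y) / P(Y)."""
    confidence = Fraction(n_xy, n_x)  # P(Y | X)
    p_y = Fraction(n_y, n_total)
    return confidence / p_y


# Hypothetical counts: 100 baskets, 40 with X, 25 with Y, 20 with both.
counts = dict(n_x=40, n_y=25, n_xy=20, n_total=100)

lift_a = lift_from_probabilities(**counts)
lift_b = lift_from_confidence(**counts)
print(lift_a, lift_b, lift_a == lift_b)
```

Both routes give a lift of 2 for these counts, as expected, since the two expressions are algebraically the same.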
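Furthermore, it is important to note that:

$$
\mathrm{lift}(X \to Y) > 1 \iff \mathrm{conf}(X \to Y) > P(Y)
$$

so a lift above 1 confirms the usefulness of the rule found: knowing X makes Y more likely than it is on its own.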
Let us see it in practice.
Example
Let us take the following transactions as an example:
Transaction ID | Items |
---|---|
1 | X, Y |
2 | X, Y, Z |
3 | X, Z |
4 | X, Z |
5 | Z |
6 | Z |
7 | Z |
8 | Z |
Where:
- X : Heels
- Y : Handbag
- Z : Flats
We calculate the support, confidence and lift values for three rules as follows:
Rule | Support | Confidence | Lift |
---|---|---|---|
X -> Y | 25.0% | 50.0% | 2.00 |
X -> Z | 37.5% | 75.0% | 0.86 |
Y -> Z | 12.5% | 50.0% | 0.57 |
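For an association rule to be useful, its lift must be greater than 1: a lift of exactly 1 means the two events are independent, and a lift below 1 means that X makes Y less likely.
Therefore, we can conclude that although the X->Z rule is stronger (higher support and confidence), its lift is below 1, whereas the X->Y rule, with a lift of 2, is the interesting one.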
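The figures for the three rules can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic: the `support` and `metrics` helpers are names I made up, and the transactions are the eight listed in the example.

```python
# The eight transactions of the example (X = heels, Y = handbag, Z = flats).
transactions = [
    {"X", "Y"},
    {"X", "Y", "Z"},
    {"X", "Z"},
    {"X", "Z"},
    {"Z"},
    {"Z"},
    {"Z"},
    {"Z"},
]


def support(items):
    """Fraction of transactions that contain all the given items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    supp = support({antecedent, consequent})
    conf = supp / support({antecedent})
    lift = conf / support({consequent})
    return supp, conf, lift


for a, c in [("X", "Y"), ("X", "Z"), ("Y", "Z")]:
    supp, conf, lift = metrics(a, c)
    print(f"{a} -> {c}: support={supp:.1%} confidence={conf:.1%} lift={lift:.2f}")
```

Note that Z appears in seven of the eight transactions, so its probability of 87.5% is higher than the 75% confidence of X->Z; this is exactly why that rule has a lift below 1.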
Conclusion
Thanks to the lift measure, once the rules have been found using the Association Rules algorithm, we can keep only the interesting ones.
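A minimal sketch of this filtering step, assuming we simply enumerate every rule between two single items in the example transactions and keep those with a lift above 1. A real implementation would first prune the candidates by support and confidence; the brute-force enumeration here is my own simplification.

```python
from itertools import permutations

# The eight transactions of the example (X = heels, Y = handbag, Z = flats).
transactions = [
    {"X", "Y"},
    {"X", "Y", "Z"},
    {"X", "Z"},
    {"X", "Z"},
    {"Z"},
    {"Z"},
    {"Z"},
    {"Z"},
]


def support(items):
    """Fraction of transactions that contain all the given items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def lift(antecedent, consequent):
    """Joint support divided by the product of the individual supports."""
    joint = support({antecedent, consequent})
    return joint / (support({antecedent}) * support({consequent}))


# Every ordered pair of distinct items is a candidate rule a -> c.
items = sorted(set().union(*transactions))
all_lifts = {(a, c): lift(a, c) for a, c in permutations(items, 2)}

# Keep only the rules whose lift is strictly above 1.
interesting = {rule: value for rule, value in all_lifts.items() if value > 1}

for (a, c), value in interesting.items():
    print(f"{a} -> {c}: lift={value:.2f}")
```

Only the heels/handbag pair survives, in both directions: lift is symmetric, so X->Y and Y->X always have the same value.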
In the example shown, the choice is to promote high heels on the website in order to push the sale of the bag at the same time and thus optimise the budget.
Personally, I would have promoted high heels anyway … but this is pure passion and not associative calculation 😀
Passion aside, it is important to take lift into account when evaluating association rules, in order to optimise our marketing campaigns and our business.