Cross-selling and upselling are effective tools that help businesses increase the effectiveness of marketing campaigns. Our method for analyzing customer data will allow you improve your cross-selling, optimize the assortment and supplies, and create business templates of purchases.
It's important for a business to know who your buyers are, how old they are, where they live, and what they like. These characteristics help segment the buyers successfully. But, it is equally as important to be aware of what exactly they buy and how. That's why we should analyze shopping carts paying attention to the selection of products from different categories.
The funniest example of this was the analysis of the carts by the company, Teradata, where "beer-diapers" turned out to be the most popular pair. It is very difficult to find such a connection without any special algorithms.
Many platforms have functions for the selection of product pairs using their own solutions. But it is time-consuming to set them up and the model needs at least two weeks to learn.
Our approach will allow you to detect such a pair in several hours and partially automate the manual recommendations.
Product Pairs Selection Algorithm for Cross-selling
We use the associative rules search algorithms for selection, i.e. we search for logical interrelations between the connected data. In our case, the rule sounds like this: "If there is a product A in the cart, then there is a possibility that product B will also be there".
The more products there are, the more product pairs there will be. Following the logic, a product pair does not necessarily consist of two products, there may be three, four, five of them, and so on. The amount of pairs is calculated with this formula: 2^n — 1, where n is a number of products. For example, if you have 10 products, then there will be 2^10 — 1 = 1023 product pairs. This is why you need to understand that not all the received rules will be statistically significant.
To get only the real rules you need to use two main criteria — support and confidence.
1. Support is the percentage of the transactions containing a certain product or a bundle of the products. It is used to bring out the most popular products.
2. Confidence is the probability of the fact that if there is a product A, then there will be a product B.
For the algorithm to work, you need to set up the minimal value of the support (min_sup) and confidence (min_conf).
As a rule, the minimum thresholds are defined experimentally. But you should remember about the peculiarities of the thresholds' minimal values:
- If the support value is too great, the algorithm will define meaningless rules at the output. That's why it isn't recommended to set the support to more than 20%.
- If the support value is small, several products may appear within the rules, which are actually random. In this case, you need to watch the lift value.
- If the confidence value is too small, as a rule, this pair cannot be taken into account. The confidence value <7% seems more random.
- The confidence value of more than 85% says that customers practically always buy this pair.
In the end, the associative rules search task is solved in two stages:
- Search for the most popular elements (support>min_sup);
- Derive the rules from the elements, formed on the first step (confidence>min_conf).
The most popular and productive data mining algorithm for associative rules search is the self-learning Apriori algorithm.
The arules package contains the functions managing the work of Apriori. You can do the product pairs analysis via RStudio, but we will examine the work of the algorithm in the Power BI.
Get the Data Ready for the Algorithm Launch
You need to process the data beforehand for the algorithm to work correctly. 80% of the result of the algorithm's work depends on the source data's quality. Moreover, if you don't have enough source data, you're unlikely to get adequate results.
Data submitted to the input should be represented as a transactional table, where every line is a separate transaction and the columns contain the names of the purchased products. In our case, it is a CSV file. We've changed the names of some products in order to maintain the client's privacy. Then upload this data into Power BI.
Launch the Algorithm
We manage the Apriori algorithm via R-script opened in Power BI. Let's take a closer look at the script's work.
- We link up the libraries needed for the script's execution: library(Matrix)library(arules)library(readr). Make sure that you have installed all the necessary packages before launching the script in Power BI beforehand. For example, open install.packages("arules") in R-studio to install the "arules" package.
- In its current form, the data uploaded into the Power Query is not well-suited for the Apriori algorithm work in R. That's why the data transforms into transaction objects with the help of this function: data<-read.transactions(«file address», header=TRUE, sep=«,», encoding=«UTF-8»).
- We search the rules that determine the product pairs with the help of the apriori function. Minimal support and confidence values and also the minimal length are used as the parameters: datarules<-apriori(data, parameter = list(support=0.003, confidence=0.4, minlen=2)).
- We write the rules acquired from the previous step into the data format understandable by Power Query: output<-DATAFRAME(datarules).
If you need to input only the first N rules: output<-DATAFRAME(datarules[1:N]).
All the rules are stored in the Value column, just unfold it and save the results:
LHS (left-hand-side) — the main product.
RHS (right-hand-side) — the complementary product.
lift shows the connection level between the variables. If lift<1 then the connection is negative, i.e. the products are the substitutes if lift=1 then there is no connection, and if lift>1 then the connection is positive.
If you've got meaningless rules, change the values of the minimum threshold.
Represent the Results
After saving the acquired rules we need to represent them in a convenient way. It's better to use a graphic chart or a circular diagram tree to visualize the associative rules.
The analysis of such a graphic chart allows highlighting the product categories beginning with the buyers' behavior. They also define the roles of the products in these categories.
The main products are the most popular products in each group. For example, in group №1 such products are a car transporter track and a Lego City set. Other products are complementary ones.
The mediator products are the ones that connect two groups of products between each other. The sales between groups happen exactly through such products.