How to Increase Cross-Selling and Optimize the Assortment

Cross-selling and upselling are effective tools that help businesses get more out of their marketing campaigns. Our method of analyzing customer data will allow you to improve cross-selling, optimize the assortment and supplies, and identify typical purchase patterns.
It's important for a business to know who its buyers are, how old they are, where they live, and what they like. These characteristics help segment buyers successfully. But it is equally important to know what exactly they buy and how. That's why we should analyze shopping carts, paying attention to which products from different categories end up in the same cart.
The funniest example of this is Teradata's shopping cart analysis, in which "beer and diapers" turned out to be the most popular pair. Such a connection is very difficult to find without specialized algorithms.
Many platforms have built-in tools for selecting product pairs. However, they take time to set up, and their models need at least two weeks to learn.
Our approach will allow you to detect such pairs within a few hours and partially automate manual recommendations.
For the selection we use association rule mining algorithms, i.e. we search for logical relationships between items that occur together in the data. In our case, a rule reads: "If product A is in the cart, then product B is likely to be there as well."
The more products there are, the more product combinations there are. A combination does not have to be a pair: it can contain three, four, five or more products. The number of possible non-empty product sets is 2^n - 1, where n is the number of products. For example, 10 products give 2^10 - 1 = 1023 sets. Because this number grows exponentially, you should expect that not all of the rules you obtain will be statistically significant.
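The 2^n - 1 count can be checked with a short Python sketch (Python is our choice for illustration; the article itself works in R). It enumerates every non-empty combination of a small, made-up assortment; the product names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy assortment.
products = ["bread", "milk", "beer", "diapers"]


def all_itemsets(items):
    """Return every non-empty combination of the given items."""
    items = list(items)
    result = []
    for size in range(1, len(items) + 1):
        result.extend(combinations(items, size))
    return result


itemsets = all_itemsets(products)
print(len(itemsets))           # 15
print(2 ** len(products) - 1)  # 15, the same as 2^n - 1

# Ten products already give 1023 combinations.
print(len(all_itemsets(range(10))))  # 1023
```

Enumerating everything like this is only feasible for tiny assortments, which is exactly why the thresholds described next are needed.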
To keep only the meaningful rules, use two main criteria: support and confidence.
1. Support is the share of transactions that contain a given product or set of products. It is used to identify the most popular products and product sets.
2. Confidence is the probability that a cart containing product A also contains product B. It is calculated as the support of {A, B} divided by the support of A.
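Both criteria can be computed directly from a list of baskets. The minimal Python sketch below uses four hypothetical toy baskets, each represented as a set of product names.

```python
# Each basket is the set of products in one transaction (hypothetical toy data).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "wipes"},
    {"beer", "chips"},
]


def support(itemset, baskets):
    """Share of baskets that contain every product in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(lhs, rhs, baskets):
    """Share of baskets containing `lhs` that also contain `rhs`."""
    lhs_support = support(lhs, baskets)
    if lhs_support == 0:
        return 0.0  # lhs never occurs, so the rule carries no information
    return support(set(lhs) | set(rhs), baskets) / lhs_support


print(support({"beer"}, baskets))                  # 0.75
print(support({"beer", "diapers"}, baskets))       # 0.5
print(confidence({"beer"}, {"diapers"}, baskets))  # approx. 0.667
```

Here beer appears in three of four baskets and beer with diapers in two, so the rule "beer -> diapers" has a confidence of 0.5 / 0.75, about two thirds.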
For the algorithm to work, you need to set minimum thresholds for support (min_sup) and confidence (min_conf).
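As a rule, the minimum thresholds are chosen experimentally. But keep in mind how these minimum values affect the result:
1. If the thresholds are too low, the algorithm returns a flood of rules, many of them trivial or based on only a handful of transactions.
2. If the thresholds are too high, rare but valuable product combinations are filtered out.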
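In the end, the association rule search is solved in two stages:
1. Find all frequent itemsets, i.e. product sets whose support is at least min_sup.
2. Generate rules from these itemsets and keep only those whose confidence is at least min_conf.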
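The second of the two stages, generating rules from frequent itemsets, is mechanical once the itemsets are known. The Python sketch below assumes stage 1 has already returned a dict mapping each frequent itemset (a frozenset) to its support; the values are hypothetical, taken from a four-basket toy example with min_sup = 0.5.

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Stage 2: turn frequent itemsets into rules LHS -> RHS.

    `frequent` maps frozenset(itemset) -> support and is assumed to be the
    complete stage-1 output, so every subset of a frequent itemset is in it.
    """
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue  # a single product cannot form a rule
        # Try every non-empty proper subset as the left-hand side.
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = supp / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


# Hypothetical stage-1 output.
frequent = {
    frozenset({"beer"}): 0.75,
    frozenset({"diapers"}): 0.75,
    frozenset({"chips"}): 0.5,
    frozenset({"beer", "diapers"}): 0.5,
    frozenset({"beer", "chips"}): 0.5,
}

for lhs, rhs, supp, conf in generate_rules(frequent, min_conf=0.6):
    print(sorted(lhs), "->", sorted(rhs), f"supp={supp:.2f} conf={conf:.2f}")
```

Because every subset of a frequent itemset is itself frequent, the support of any LHS can be looked up instead of recounted.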
One of the most popular and efficient data mining algorithms for association rule search is the Apriori algorithm.
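For intuition, here is a minimal pure-Python sketch of Apriori's first stage, the level-wise search for frequent itemsets. It is not the arules implementation. It relies on the property Apriori is built on: every subset of a frequent itemset is frequent, so a candidate with an infrequent subset can be discarded without counting it. The baskets are hypothetical toy data.

```python
from itertools import combinations


def apriori_frequent_itemsets(baskets, min_sup):
    """Level-wise search; returns a dict frozenset(itemset) -> support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def supp(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single products.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = supp(frozenset([item]))
        if s >= min_sup:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: unite frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_sup:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "wipes"},
    {"beer", "chips"},
]
result = apriori_frequent_itemsets(baskets, min_sup=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy run, the three-product candidate {beer, diapers, chips} is pruned without being counted, because {diapers, chips} is not frequent.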
The R package arules contains the functions that run Apriori. You can do the product pair analysis in RStudio, but here we will run the algorithm in Power BI.
The data must be preprocessed for the algorithm to work correctly: 80% of the result depends on the quality of the source data. Moreover, if there is not enough source data, you are unlikely to get adequate results.
The input data must be a transactional table in which each row is a separate transaction and the columns contain the names of the purchased products. In our case, this is a CSV file; we changed the names of some products to protect the client's privacy. Then upload this data into Power BI.
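Reading such a file into baskets is straightforward. The Python sketch below parses a small hypothetical CSV in the one-transaction-per-row layout; rows may differ in length, empty cells are dropped, and a product listed twice in one row is counted once. In R, arules' `read.transactions(file, format = "basket", sep = ",")` reads the same layout.

```python
import csv
import io

# Hypothetical sample: one transaction per row, products in the columns.
raw = """bread,milk,eggs
bread,milk,
eggs,butter,bread,butter
"""


def read_transactions(text):
    """Parse a transactional CSV into a list of product sets."""
    baskets = []
    for row in csv.reader(io.StringIO(text)):
        # Strip whitespace and skip empty cells such as trailing commas.
        items = {cell.strip() for cell in row if cell.strip()}
        if items:  # ignore blank lines
            baskets.append(items)
    return baskets


baskets = read_transactions(raw)
print(len(baskets))  # 3
```

Storing each basket as a set makes the later subset checks (`itemset <= basket`) cheap and removes duplicates automatically.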
We run the Apriori algorithm from an R script inside Power BI. Let's take a closer look at how the script works.
If you need to output only the first N rules, use: output<-DATAFRAME(datarules[1:N]).
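Note that `datarules[1:N]` keeps the first N rules in their current order. Sorting by a measure first (for example `sort(datarules, by = "lift")` in arules) turns "the first N" into "the best N". The Python sketch below shows the same idea on a few hypothetical rules; the `Rule` type and `top_n_rules` helper are our own illustrative names.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: str
    rhs: str
    support: float
    confidence: float
    lift: float


# Hypothetical rules from a four-basket toy example.
rules = [
    Rule("beer", "diapers", 0.50, 0.67, 0.89),
    Rule("chips", "beer", 0.50, 1.00, 1.33),
    Rule("beer", "chips", 0.50, 0.67, 1.33),
]


def top_n_rules(rules, n, by="lift"):
    """Return the n rules with the highest value of the measure `by`."""
    # sorted() is stable, so ties keep their original order.
    return sorted(rules, key=lambda rule: getattr(rule, by), reverse=True)[:n]


for rule in top_n_rules(rules, 2):
    print(rule.lhs, "->", rule.rhs, rule.lift)
```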
All the rules are stored in the Value column; expand it and save the results. The key columns are:
LHS (left-hand side): the main product.
RHS (right-hand side): the complementary product.
lift: the strength of the association between the products, i.e. how much more often they are bought together than would be expected if they were independent. If lift < 1, the association is negative: the products are bought together less often than expected, which can indicate substitutes. If lift = 1, there is no association. If lift > 1, the association is positive.
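Lift can be computed from supports as lift(A -> B) = supp(A ∪ B) / (supp(A) · supp(B)), which equals conf(A -> B) / supp(B). A minimal Python sketch on hypothetical toy baskets:

```python
import math

# Hypothetical toy baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "wipes"},
    {"beer", "chips"},
]


def support(itemset, baskets):
    """Share of baskets that contain every product in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def lift(lhs, rhs, baskets):
    """supp(lhs ∪ rhs) / (supp(lhs) * supp(rhs)); both sides must occur."""
    joint = support(set(lhs) | set(rhs), baskets)
    return joint / (support(lhs, baskets) * support(rhs, baskets))


def association(value):
    """Classify a lift value as negative, none, or positive."""
    if math.isclose(value, 1.0):
        return "none"
    return "positive" if value > 1 else "negative"


for lhs, rhs in [({"beer"}, {"diapers"}), ({"chips"}, {"beer"})]:
    value = lift(lhs, rhs, baskets)
    print(sorted(lhs), "->", sorted(rhs), round(value, 2), association(value))
```

In this toy data, "beer -> diapers" has a confidence of about 0.67 but a lift of about 0.89: diapers are in three of four baskets anyway, so beer does not actually make them more likely. This is why lift is worth checking alongside confidence.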
If you get meaningless rules, adjust the minimum thresholds.
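A quick way to choose thresholds is to see how the number of rules reacts to them. The Python sketch below brute-forces only single-product rules (A -> B) on hypothetical toy baskets; the `count_pair_rules` helper is our own illustrative name, not an arules function.

```python
from itertools import permutations

# Hypothetical toy baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "wipes"},
    {"beer", "chips"},
]


def support(itemset, baskets):
    """Share of baskets that contain every product in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def count_pair_rules(baskets, min_sup, min_conf):
    """Count rules A -> B (one product on each side) passing both thresholds."""
    items = sorted({item for basket in baskets for item in basket})
    count = 0
    for a, b in permutations(items, 2):
        pair_support = support({a, b}, baskets)
        # support({a}) > 0 because every item comes from some basket.
        if pair_support >= min_sup and pair_support / support({a}, baskets) >= min_conf:
            count += 1
    return count


for min_sup in (0.25, 0.5):
    for min_conf in (0.5, 0.9):
        print(min_sup, min_conf, count_pair_rules(baskets, min_sup, min_conf))
```

With min_sup = 0.25 the rule "wipes -> diapers" passes even the strict confidence threshold, although it rests on a single basket; raising min_sup removes it.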
After saving the rules, we need to present them in a convenient form. A graph or a circular (radial) tree diagram works best for visualizing association rules.
Analyzing such a graph reveals product categories as they emerge from buyers' behavior. It also shows the role each product plays within its category.
The main products are the most popular products in each group. For example, in group №1 these are a car transporter truck and a Lego City set. The other products in the group are complementary.
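Once the groups are known (in the article they are read off the chart), picking the main products is a matter of ranking each group's members by support. The Python sketch below uses hypothetical product names, supports and group assignments.

```python
# Hypothetical supports and a hypothetical grouping of products.
supports = {
    "product_a": 0.12, "product_b": 0.10, "product_c": 0.04,
    "product_d": 0.08, "product_e": 0.03,
}
groups = {
    1: ["product_a", "product_b", "product_c"],
    2: ["product_d", "product_e"],
}


def main_products(groups, supports, top_k=1):
    """For each group, return its top_k products by support."""
    return {
        group: sorted(members, key=lambda p: supports[p], reverse=True)[:top_k]
        for group, members in groups.items()
    }


print(main_products(groups, supports, top_k=2))
# {1: ['product_a', 'product_b'], 2: ['product_d', 'product_e']}
```

Everything in a group that is not among its top products is then treated as complementary.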
Mediator products are the ones that connect two groups of products to each other: sales between the groups happen precisely through such products.
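The article identifies mediators visually. One plausible way to formalize them, and this is our assumption, is as cut vertices of the rule graph: products whose removal splits the graph into more disconnected groups. The Python sketch below treats each rule as an undirected edge and tests every product by removing it and recounting components. This is O(V·(V+E)), fine for a few hundred products; Tarjan's articulation-point algorithm does it in linear time. The rules are hypothetical.

```python
from collections import defaultdict

# Hypothetical rules (lhs, rhs): two groups of products joined through product_x.
rules = [
    ("product_a", "product_b"), ("product_b", "product_c"), ("product_a", "product_c"),
    ("product_d", "product_e"), ("product_e", "product_f"), ("product_d", "product_f"),
    ("product_a", "product_x"), ("product_b", "product_x"),
    ("product_x", "product_d"), ("product_x", "product_e"),
]


def build_graph(rules):
    """Undirected adjacency sets; rule direction is ignored."""
    graph = defaultdict(set)
    for lhs, rhs in rules:
        graph[lhs].add(rhs)
        graph[rhs].add(lhs)
    return dict(graph)


def count_components(graph, removed=None):
    """Number of connected components, optionally ignoring one product."""
    seen = {removed} if removed is not None else set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # iterative depth-first search
            for neighbour in graph[stack.pop()]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def mediator_products(rules):
    """Products whose removal increases the number of components."""
    graph = build_graph(rules)
    base = count_components(graph)
    return sorted(p for p in graph if count_components(graph, removed=p) > base)


print(mediator_products(rules))  # ['product_x']
```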