OneBeat’s AI for retail sales prediction

Updated: Jul 24, 2023

OneBeat is a successful startup that leverages an advanced AI solution for planning and executing merchandise flow, from procurement to in-season operations. As part of its ongoing algorithm improvements, the company collaborated with TensorOps to address one of the most challenging issues in inventory management: promotions.

Pricing plays a vital role in sales, as it directly influences consumer behavior through price elasticity. To increase consumption, retailers employ various promotional strategies, including sales, future purchase credits, and incentives. The partnership between OneBeat and TensorOps has given OneBeat's AI system the ability to predict how different promotional strategies affect sales. Using advanced machine learning techniques, the enhanced solution enables retailers to make well-informed decisions, optimize sales performance, and set optimal inventory levels for upcoming promotions.

How do promotions impact sales?

In this blog post, we will discuss several topics that were addressed by TensorOps and OneBeat during the development of the ML model for assessing the effect of promotions on sales:

Understanding the concept of reference price and perceived discount.
Handling domain-specific cases with custom features.
Data imputation, testing, and quality monitoring.
Considerations for MLOps (Machine Learning Operations).

Let's look at how TensorOps and OneBeat tackled each of these topics during the development of the ML model. The chart below shows how the model (red) predicts sales when the price (blue line) is expected to drop.

Figure: OneBeat's model prediction for an example item during a promotion in which the baseline price remained the same.

Evaluating the “baseline” price

To simplify the discussion, we focus on "price reduction" promotions, which are the most common type. At first glance, the idea appears straightforward: there is a "catalog" price from which the merchant deducts a certain amount for the promotion. However, promotions involve several nuances that are not immediately apparent. Some promotions last a long time, so customers become accustomed to the discounted price. Moreover, changes in the catalog price, especially during periods of inflation, can affect the perceived value of a sale and alter the elasticity curve.

To account for these complexities, our solution recalculates discounts relative to a baseline price. The baseline is produced by a model that considers multiple factors, including the catalog price, recent sales price statistics, and the effective sale price. A few examples show this baseline price feature in action.

Figure: Evaluating the baseline price. The baseline stays the same during a short sale and adjusts during a longer one; during May, the baseline price adjusts to the reduced promotion price.

In the first scenario, because the promotion is relatively short, the baseline stays at the regular price: the algorithm treats the promotion as a genuine discount and recognizes a significant gap between the regular selling price and the product's current discounted price.

However, when a promotion lasts for six months, it is reasonable to expect that the sales increase will not be exceptionally high compared to the average. Over time, consumers tend to become accustomed to the discounted price and incorporate it as the new normal in their purchasing behavior.
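The behavior described above, where a short promotion leaves the baseline untouched while a months-long promotion gradually becomes the new normal, can be approximated with a rolling median of recent effective prices, capped at the catalog price. The post does not publish OneBeat's baseline model, so this is only a minimal sketch under our own assumptions; the function name `baseline_prices` and the 60-day `window` are hypothetical.

```python
from statistics import median


def baseline_prices(prices, catalog_prices, window=60):
    """Hypothetical baseline price: rolling median of recent effective prices,
    capped at the current catalog price."""
    baselines = []
    for t in range(len(prices)):
        # Trailing window of the effective prices observed up to day t.
        recent = prices[max(0, t - window + 1): t + 1]
        # A median ignores promotions covering less than half the window,
        # but adopts the promotional price once it dominates the window.
        baselines.append(min(median(recent), catalog_prices[t]))
    return baselines


# Example: a 7-day promotion, followed later by a 100-day promotion.
catalog = [10.0] * 200
price = [10.0] * 200
for t in list(range(30, 37)) + list(range(100, 200)):
    price[t] = 7.0

base = baseline_prices(price, catalog)
print(base[35], base[129], base[199])  # 10.0 8.5 7.0
```

The median reacts with a delay of roughly half the window, which mimics customers slowly accepting a discounted price as normal; a slow exponential moving average would be another reasonable choice.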

Figure: Baseline price providing a delayed reaction that simulates how customers perceive price elasticity.

In this case, the baseline price proves to be a valuable tool for assessing price elasticity and guiding price adjustments. A particularly useful technique is to create a feature that represents the difference or ratio between the sale price and the baseline price. The difference is negative when the item sells below its baseline and positive when it sells above it (equivalently, the ratio is below or above 1), which gives the algorithm an indicator of expected sales performance relative to the average. By using this adaptive model as part of data preprocessing, our AI solution achieved more accurate results.
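The difference and ratio features can be computed as below. This is a hypothetical helper, not OneBeat's code: it returns a relative difference and a log ratio, which carry the same sign, but the log ratio treats a halving and a doubling of the price symmetrically.

```python
import math


def perceived_discount_features(price, baseline):
    """Hypothetical features comparing the sale price with the baseline price."""
    # Relative difference: negative below the baseline (a perceived discount),
    # positive above it.
    rel_diff = (price - baseline) / baseline
    # Log ratio: same sign as rel_diff, symmetric for price increases and cuts.
    log_ratio = math.log(price / baseline)
    return rel_diff, log_ratio


# 30% below a baseline of 10.0 during a short promotion.
print(perceived_discount_features(7.0, 10.0))
# The same price once a long promotion has pulled the baseline down to 7.0.
print(perceived_discount_features(7.0, 7.0))  # (0.0, 0.0)
```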

Automatically adjusted models per use case

While retailers worldwide share common selling and promotional patterns, regional variations exist, so an effective solution must include specific climate or weekly patterns. In one case, it was essential to research consumption patterns, holidays, and yearly trends and incorporate them as features in the model. This research revealed that the retailer's region experiences hot and humid summers, which significantly affect the sales patterns of certain products. Including holiday and climate data in the datasets substantially improved prediction accuracy.
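As an illustration of how holiday and climate knowledge can be turned into model inputs, the sketch below derives a few calendar features for a single day. The feature names, the placeholder holiday dates, and the assumption that June to August is the hot season are all hypothetical; in practice they would come from research on the retailer's region.

```python
from datetime import date


def calendar_features(day, holidays, hot_months=(6, 7, 8)):
    """Hypothetical calendar and climate features for a single day.

    `holidays` is a set of local holiday dates and `hot_months` the months
    of the local hot season; both are placeholders for region-specific data.
    """
    upcoming = [(h - day).days for h in holidays if h >= day]
    return {
        "day_of_week": day.weekday(),  # 0 = Monday, 6 = Sunday
        "is_weekend": day.weekday() >= 5,
        "is_holiday": day in holidays,
        "days_to_next_holiday": min(upcoming) if upcoming else None,
        "is_hot_season": day.month in hot_months,
    }


holidays = {date(2023, 7, 4), date(2023, 12, 25)}  # placeholder dates
print(calendar_features(date(2023, 7, 1), holidays))
```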

During the feature selection step in the pipeline, irrelevant features are filtered out. This bottom-up approach lets us leverage insights from diverse datasets without training separate models. Over time, these insights become an integral part of the system, which grows gradually more sophisticated and comprehensive through incremental data analysis.
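The post does not say which feature selection method the pipeline uses. As a stand-in, the sketch below keeps features whose absolute Pearson correlation with sales reaches a threshold. The helper names and the threshold are hypothetical, and a correlation filter only captures linear relationships, so model-based importance scores are a common alternative.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no signal
    return cov / (sx * sy)


def select_features(features, target, min_abs_corr=0.1):
    """Keep the names of features whose |correlation| with the target
    reaches the (hypothetical) threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= min_abs_corr]


sales = [5, 7, 9, 11, 13, 15]
features = {
    "temperature": [20, 22, 25, 27, 30, 33],
    "is_holiday": [0, 0, 0, 0, 0, 0],
    "noise": [2, 5, 3, 3, 5, 2],
}
print(select_features(features, sales, min_abs_corr=0.5))  # ['temperature']
```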

Automatic data quality monitoring

Retailers are at various stages of adopting big data solutions, and the industry lacks standardized protocols and formats for data retrieval. Consequently, interacting with retailers' systems and obtaining data in a format suitable for advanced algorithms is not always straightforward, which makes ensuring high-quality data paramount for such projects. Even a brief period of inaccurate sales data or improperly extracted features can cause the models to overfit to spurious patterns, which in turn results in poor predictions and revenue loss.

To address the risks of corrupt data in this application, a data validation component has been integrated into the product's pipeline. This component generates visualizations of the main features for each trained model, enabling engineers to validate the models before they are put into production and used to generate forecasts for retailers. This additional layer of safety further protects the quality of predictions and helps retailers identify issues within their data infrastructure.

Figure: Corrupt data, with a huge spike in sales preceded by weeks of no reported sales.

The figure above shows a common pattern: a problem in the retailer's data reporting infrastructure causes multiple days of sales to be aggregated into a single day. Training a model on this kind of data produces unexpected results that depend on the period in which the data was misreported. Our experiments showed that the best model accuracy came from choosing a model that can handle missing data and removing these outliers from the training set entirely.
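The visual checks described above could be complemented by an automated heuristic for this specific pattern: a run of zero-sales days followed by an outsized day. The sketch below masks such episodes as missing (`None`) so they can be dropped or left to a model that handles missing values. The function name and the thresholds `min_gap` and `spike_factor` are hypothetical, not part of OneBeat's pipeline.

```python
from statistics import median


def mask_aggregated_reports(sales, min_gap=7, spike_factor=3.0):
    """Replace suspected 'zeros, then catch-up spike' episodes with None."""
    nonzero = [s for s in sales if s > 0]
    if not nonzero:
        return list(sales)
    typical = median(nonzero)  # typical daily volume on days with sales
    cleaned = list(sales)
    run_start = None  # index where the current run of zero days began
    for i, s in enumerate(sales):
        if s == 0:
            if run_start is None:
                run_start = i
            continue
        gap = i - run_start if run_start is not None else 0
        if gap >= min_gap and s > spike_factor * typical:
            # Likely several days of sales booked on one date:
            # mask the silent days and the spike as missing.
            for j in range(run_start, i + 1):
                cleaned[j] = None
        run_start = None
    return cleaned


print(mask_aggregated_reports([5, 6, 4, 5] + [0] * 10 + [60, 5, 6]))
```

A long gap followed by an ordinary day (for example, a temporary store closure) is left untouched, because only the combination of silence and a catch-up spike is treated as a reporting error.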

Dealing with varying product popularity

Skewed data distributions pose a common challenge in regression models. The specific challenge we faced was that most retailers have thousands of products in their catalog that experience sparse sales. If we were to train our models using the raw data, they would tend to predict very low sales values for all products, as the majority of them have low sales volumes.

To address this issue, various approaches are typically applied, such as optimizing for alternative metrics, artificially balancing datasets, or using sample weights during training. However, in our case, the problem was somewhat more complex.

Mean Squared Error and its limitations

Upon switching from absolute error to squared error, we observed notably better results for high-volume products. This was expected: with a quadratic error function, even under a highly skewed distribution, the penalty for predicting low values for our high-volume products became very severe, and the algorithms adjusted their learning accordingly.
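A worked example makes the difference concrete. The numbers below are illustrative only: under absolute error, halving a 100-unit product's forecast costs 25 times as much as missing a 2-unit product entirely, while under squared error it costs 625 times as much.

```python
def absolute_error(y_true, y_pred):
    return abs(y_true - y_pred)


def squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2


# A high-volume product selling 100 units a week, predicted at 50.
print(absolute_error(100, 50), squared_error(100, 50))  # 50 2500
# A low-volume product selling 2 units a week, predicted at 0.
print(absolute_error(2, 0), squared_error(2, 0))  # 2 4
```

The same arithmetic shows why small misses on slow-moving products barely register under squared error: being off by 1 or 2 units costs only 1 or 4.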

However, a challenge arose after we removed the products with zero sales, as most of the remaining products still had very low sales volumes. In fact, fewer than 10% of the products sold more than 10 units per week. Because the cost function grows quadratically with the error, the penalty for underestimating a product's sales by just 1 or 2 units was too small to matter. Consequently, most products, except those with the highest volumes, were consistently predicted to have zero sales.

Sample weights based on sales

To address the persistent underestimation, we added sample weights that almost ignore samples with 0 sales but grow very quickly to the maximum weight, which is reached at around 5 to 10 units sold.

Figure: Non-linear adjustment of the sample weight (the weight adjustment curve used).

This slightly improved our overall performance and helped mitigate the underestimation problem.
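A curve with this shape can be written as an exponential saturation. The functional form and the parameters `min_weight` and `scale` below are our own hypothetical choices, picked so that a zero-sales sample gets a weight of 0.05 and samples with about 5 to 10 units get a weight above 0.9; the curve in the figure may differ.

```python
import math


def sales_weight(units_sold, min_weight=0.05, scale=2.0):
    """Hypothetical sample weight that nearly ignores zero-sales samples
    and saturates near 1.0 within roughly 5 to 10 units."""
    # Exponential saturation: min_weight at 0 units, approaching 1.0 as sales grow.
    return min_weight + (1.0 - min_weight) * (1.0 - math.exp(-units_sold / scale))


for units in (0, 1, 2, 5, 10):
    print(units, round(sales_weight(units), 3))
```

The weights can then be passed to any training API that accepts per-sample weights; for example, many scikit-learn regressors accept a `sample_weight` argument in `fit`.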

Filling missing values with assumptions

Most of our problems with too many zero-sales days came from the fact that, when data was missing for a certain product on a certain day, we assumed it meant 0 sales. This assumption was probably correct in most cases, but we added an option to fill these values with ones, or with the last seen value for that product, to deliberately push the models toward overestimating sales. In many other applications this would reduce the quality of the predictions, but in most retail applications a slight overestimation of sales is exactly what is wanted for a reliable restocking schedule.
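The fill strategies described above can be sketched as follows, with missing days represented as `None`. The function name and the fallback to 1 when no earlier value exists are hypothetical choices; `strategy="zeros"` reproduces the original assumption.

```python
def fill_missing_sales(sales, strategy="last"):
    """Fill None entries (no record for that day) using the chosen strategy."""
    filled = []
    last_seen = None  # last observed (non-missing) value for this product
    for s in sales:
        if s is None:
            if strategy == "ones":
                s = 1
            elif strategy == "last":
                # Forward-fill; fall back to 1 before any value has been seen.
                s = last_seen if last_seen is not None else 1
            elif strategy == "zeros":
                s = 0
            else:
                raise ValueError(f"unknown strategy: {strategy}")
        else:
            last_seen = s
        filled.append(s)
    return filled


history = [3, None, None, 2, None]
print(fill_missing_sales(history, "last"))  # [3, 3, 3, 2, 2]
print(fill_missing_sales(history, "ones"))  # [3, 1, 1, 2, 1]
```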

Building the infrastructure

One problem that many data science projects suffer from is tracking notebooks and transitioning them to production. Researchers tend to focus too much on their specific research and leave integration with the client's infrastructure as the last task. This often leads to underdeveloped infrastructure that the client does not need or want. In this project, we continued to build on top of the customer's existing MLOps infrastructure. Research was done in Jupyter notebooks. Once the initial baseline model was established, we created GitHub branches for each experiment and logged the experiments in a self-hosted MLflow instance. Later on, we migrated the training, EDA, and data engineering steps into a DVC pipeline that can be reproduced in any environment, since it leverages Docker's "build once, run anywhere" philosophy.

In the end, we were left with a robust pipeline that can easily be extended to add or improve functionality. Any previously run experiment can also be reproduced with minimal effort.

Figure: OneBeat's open-source-based MLOps architecture, showing the whole pipeline and system.

Future work

ML work is never done, and OneBeat will have to address the following issues:

Adding business constraints: making sure that the predicted sales of individual items add up to the global sales predicted at different hierarchy levels (store, category, territory)
Using hierarchical product identifiers to improve predictions
Introducing external data into the feature extraction steps, e.g. weather and holiday data for the sale location
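One simple way to enforce the first constraint is top-down proportional reconciliation: scale the item-level forecasts so that they sum to the forecast made at a higher level, such as the store. This is only one of several reconciliation methods (bottom-up aggregation is another), and the function below is a hypothetical sketch, not OneBeat's planned implementation.

```python
def reconcile_proportionally(item_forecasts, total_forecast):
    """Scale item-level forecasts so they sum to a higher-level forecast
    (top-down proportional reconciliation)."""
    if not item_forecasts:
        return {}
    item_sum = sum(item_forecasts.values())
    if item_sum == 0:
        # No item-level signal: split the total evenly (hypothetical fallback).
        share = total_forecast / len(item_forecasts)
        return {item: share for item in item_forecasts}
    scale = total_forecast / item_sum
    return {item: value * scale for item, value in item_forecasts.items()}


# Item forecasts for one store sum to 100, but the store-level model predicts 120.
store_items = {"item_a": 30, "item_b": 50, "item_c": 20}
print(reconcile_proportionally(store_items, 120))
```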

TensorOps collaborates with businesses to leverage AI. If you wish to work with us on your case, don't hesitate to contact us here: https://www.tensorops.ai/contact