Analytical formulation of synthetic minority oversampling technique (SMOTE) for imbalanced learning

Firuz Kamalov
Salah Eddine Choutri
Amir F. Atiya

Abstract

Imbalanced data affects many applications in machine learning and data science. The synthetic minority oversampling technique (SMOTE) is a common method for artificially balancing such data. Despite the popularity of SMOTE, little is known about its analytical properties. In this paper, we develop a precise theoretical formulation of the sampling distribution of SMOTE in several important cases. The results provide a better understanding of SMOTE and other sampling algorithms. In addition, we uncover surprising connections to other topics such as information theory, Euler's constant, and compound distributions. Finally, we show that the SMOTE-generated distribution Z converges in mean to the true underlying distribution X.
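
For readers unfamiliar with the procedure whose sampling distribution is analyzed, the following is a minimal sketch of the standard SMOTE sampling step: pick a minority point, pick one of its k nearest minority neighbors, and draw a point uniformly on the segment joining them. It is an illustration only and not the paper's analytical formulation; the function name `smote_sample` and its parameters are assumptions made for this example.

```python
import random


def smote_sample(minority, k=5, n_new=100, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Illustrative sketch of the usual SMOTE step (hypothetical helper, not
    taken from the article): each synthetic point is x + u * (x_nn - x),
    where x is a random minority point, x_nn is one of its k nearest
    minority neighbors, and u is uniform on [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Choose a base minority point uniformly at random.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the remaining minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        # Choose one of the k nearest neighbors uniformly at random.
        neighbor = rng.choice(others[:k])
        # Interpolate at a uniformly random position along the segment.
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbor)))
    return synthetic


if __name__ == "__main__":
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_sample(corners, k=2, n_new=5):
        print(point)
```

With the four corners of the unit square and k=2, every synthetic point falls on an edge of the square, which illustrates why the SMOTE-generated distribution Z generally differs from the underlying distribution X for a finite sample.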

Article Details

Section

Articles

How to Cite

Kamalov, F., Choutri, S. E., & Atiya, A. F. (2025). Analytical formulation of synthetic minority oversampling technique (SMOTE) for imbalanced learning. Gulf Journal of Mathematics, 19(1), 400-415. https://doi.org/10.56947/gjom.v19i1.2639