Detection of Bank Customer Churn Using Neural Network and Voting Classifier Ensembles
Abstract
Customer churn is the loss of customers to a competitor. Because retaining existing customers is more economical than acquiring new ones, retention measures such as churn detection have become essential to modern banking strategy. However, many existing studies rely heavily on conventional machine learning approaches such as Support Vector Machines, Logistic Regression, and Random Forest, often overlooking the representational capacity of deep neural networks. In addition, the repeated use of the same small dataset across banking churn studies may limit how well the resulting models generalise. To address these gaps, this study applies deep learning to customer churn detection and, for comparison, builds soft and hard voting classifier ensembles from models that have performed well in earlier studies, supported by synthetic data augmentation to improve model training. The study used a secondary banking churn dataset from Kaggle containing 10,000 unique customer records. To address the dataset's limitations, a Conditional Tabular Generative Adversarial Network (CTGAN) model was used to generate an additional 10,000 records, expanding the study dataset to 20,000 rows. Data preprocessing, including oversampling with the Synthetic Minority Oversampling Technique (SMOTE), was carried out before training. Models were developed and analysed in Python on Google Colab using widely used libraries and frameworks. A Feedforward Neural Network and soft and hard voting classifiers were developed, with each voting ensemble combining three prominent classifiers: Random Forest, XGBoost, and Logistic Regression. Performance was evaluated using accuracy, F1 score, and the Area Under the ROC Curve (AUC).
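The SMOTE oversampling step described above can be sketched as follows. This is a minimal, dependency-free illustration of the interpolation idea, assuming the rows are already numeric and scaled; the function name `smote`, its parameters, and the toy data are hypothetical, and the abstract does not state which implementation or settings the study used (the imbalanced-learn package's `SMOTE` class is a common choice in Python).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new minority-class rows made by SMOTE-style interpolation.

    Hypothetical helper for illustration. Only the synthetic rows are returned;
    the caller appends them to the original training data. Rows must be numeric
    lists of equal length (categorical columns need encoding first).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position along the base -> neighbour segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up, already-scaled feature rows for churners (the minority class).
churners = [[0.20, 0.90], [0.30, 0.80], [0.25, 0.70], [0.40, 0.95]]
new_rows = smote(churners, n_synthetic=4, k=2, seed=1)
print(len(new_rows), new_rows[0])
```

Each synthetic churner lies on the segment between a real churner and one of its nearest churner neighbours, so the minority class is densified rather than duplicated. Oversampling is normally fitted on the training split only, so that synthetic rows never appear in the evaluation data.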
Results show that the Feedforward Neural Network achieved strong predictive performance, with an accuracy of 88.23%, an F1 score of 87.83%, and an AUC of 94.73%. The ensemble approaches performed slightly better: the soft voting classifier delivered the best results, with an accuracy of 89.46%, an F1 score of 88.92%, and an AUC of 95.40%, showing the advantage of combining multiple models to leverage their complementary strengths. Compared with past studies, the proposed models did not surpass the best reported results; however, they remain highly competitive, matching or exceeding many earlier works. The contribution of this work is to show how synthetic data augmentation, enhanced preprocessing, deep learning techniques, and machine learning ensembles can improve churn detection in banking. Banking institutions can use the results of this study to detect churn accurately, supporting proactive customer retention strategies, targeted marketing, and personalised financial services, thereby reducing revenue losses.
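The difference between the soft and hard voting rules compared above can be illustrated with a small, dependency-free sketch. The probabilities are made up, and the helpers (`soft_vote`, `hard_vote`, `f1_churn`, `roc_auc`) are hypothetical, not the study's code. F1 is computed for the churn class only, which is an assumption because the abstract does not say how F1 was averaged. In scikit-learn the corresponding ensemble is `sklearn.ensemble.VotingClassifier` with `voting="soft"` or `voting="hard"`, though the abstract does not state which implementation was used.

```python
def soft_vote(probas):
    """Soft voting: average the models' P(churn) for each customer."""
    return [sum(column) / len(probas) for column in zip(*probas)]


def hard_vote(probas, threshold=0.5):
    """Hard voting: each model casts a 0/1 vote; the majority label wins."""
    votes = [[int(p >= threshold) for p in model] for model in probas]
    # A customer is labelled churn only if more than half of the models say so.
    return [int(2 * sum(column) > len(votes)) for column in zip(*votes)]


def accuracy(y_true, y_pred):
    """Fraction of customers whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_churn(y_true, y_pred):
    """Binary F1 with churn (label 1) as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random churner outscores a random non-churner (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical P(churn) outputs from three fitted models for five customers.
y_true = [1, 0, 1, 0, 1]
rf_proba = [0.90, 0.40, 0.45, 0.20, 0.60]
xgb_proba = [0.80, 0.60, 0.40, 0.10, 0.70]
lr_proba = [0.70, 0.30, 0.80, 0.40, 0.30]
models = [rf_proba, xgb_proba, lr_proba]

soft_scores = soft_vote(models)
soft_labels = [int(p >= 0.5) for p in soft_scores]  # threshold the averaged probability
hard_labels = hard_vote(models)

print("soft:", soft_labels, accuracy(y_true, soft_labels),
      f1_churn(y_true, soft_labels), roc_auc(y_true, soft_scores))
# Hard voting yields only 0/1 labels, so its AUC here uses those labels as scores.
print("hard:", hard_labels, accuracy(y_true, hard_labels),
      f1_churn(y_true, hard_labels), roc_auc(y_true, hard_labels))
```

With these made-up numbers, hard voting labels the third customer as non-churn because only the logistic regression model votes churn, whereas soft voting averages 0.45, 0.40, and 0.80 to about 0.55 and labels that customer as churn. Because hard voting outputs only labels, a full ROC curve needs some probability-like score, which soft voting provides directly.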
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.