Performance Improvement of Quality Monitoring Systems in Imbalanced Data Conditions for Fat-Filled Powder Quality in The Dairy Industry

Muhammad Asrol; Oki Pratama

doi:10.37385/jaets.v7i1.6996

Authors

Muhammad Asrol, Industrial Engineering Department, BINUS Graduate Program - Master of Industrial Engineering, Bina Nusantara University, Jakarta, Indonesia, 11480
Oki Pratama, Bina Nusantara University

Keywords:

Quality monitoring, Dairy industry, Imbalanced data, Machine learning, Synthetic data manipulation

Abstract

Fat-filled powder (FFP) can substitute for milk in meeting the community's nutritional needs, but its quality remains unstable during continuous production. A key challenge in FFP production is quality monitoring, which is complicated by various uncertainty factors that affect product quality. Machine learning can be used to build a quality monitoring system, but imbalanced data require algorithms designed to perform well under those conditions. This study aims to design a quality monitoring system for FFP using a machine learning model under imbalanced dataset conditions and the influence of other uncertainty factors. A Random Forest (RF) model was developed for monitoring FFP quality. To address the imbalanced dataset, the model was optimized under several scenarios: conventional data splitting for training and testing, the Synthetic Minority Oversampling Technique (SMOTE), and Distribution Optimally Balanced Stratified Cross-Validation (DOB-SCV). The RF model with SMOTE achieved the best performance on the testing data, with accuracy, precision, and recall of 99.67%, 99.79%, and 99.24%, respectively. Its results also differed statistically significantly from those of the DOB-SCV model and the traditional data splitting approach. The FFP quality monitoring model developed in this study can be implemented in the dairy industry, offering more stable and accurate quality predictions that align with real conditions and helping to avoid quality uncertainties during production. Implementing the model in industry has the potential to support a broader, more transparent, and optimized product quality evaluation process that can also be conducted in real time under continuous production conditions.
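
To make the oversampling idea concrete, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions chosen for illustration. The sketch creates each synthetic minority sample by picking a minority point, choosing one of its `k` nearest minority neighbours, and placing a new point at a random position on the line segment between them.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    minority: list of feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbours considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (assumed, not from the study): two features per production batch.
majority = [[0.1 * i, 1.0] for i in range(20)]                # in-spec batches
minority = [[5.0, 5.0], [5.5, 5.2], [6.0, 5.1], [5.2, 6.0]]   # off-spec batches

new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is usually applied to the training split only, so that the testing data still reflect the original class distribution.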
