Performance Improvement of Quality Monitoring Systems in Imbalanced Data Conditions for Fat-Filled Powder Quality in the Dairy Industry
DOI:
https://doi.org/10.37385/jaets.v7i1.6996
Keywords:
Quality monitoring, Dairy industry, Imbalanced data, Machine learning, Synthetic data manipulation
Abstract
Fat-filled powder (FFP) has the potential to substitute for milk in meeting the community's nutritional needs, but its quality remains unstable during continuous production. A key challenge in FFP production is quality monitoring, which is complicated by various sources of uncertainty that affect product quality. Machine learning can be used in a quality monitoring system, but imbalanced data conditions call for algorithms tuned to perform well under them. This study aims to design a quality monitoring system for FFP using a machine learning model under imbalanced dataset conditions and the influence of other uncertainty factors. A Random Forest (RF) model was developed for monitoring FFP quality. To address the imbalanced dataset, the model was optimized under several scenarios: conventional data splitting for training and testing, the Synthetic Minority Oversampling Technique (SMOTE), and Distribution Optimally Balanced – Stratified Cross Validation (DOB-SCV). The RF model with SMOTE achieved the best performance on the testing data, with accuracy, precision, and recall of 99.67%, 99.79%, and 99.24%, respectively. Its results also differed with statistical significance from those of the DOB-SCV model and the traditional data splitting approach. The quality monitoring model developed in this study can be implemented in the dairy industry, offering more stable and accurate quality predictions that align with real conditions and helping to avoid quality uncertainty during production. Implementing the model in industry has the potential to make product quality evaluation broader, more transparent, and better optimized, and to allow it to be conducted in real time under continuous production conditions.
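The oversampling idea behind SMOTE, named in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, the toy feature values, and the choice of `k` are assumptions made purely for illustration. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. In practice, oversampling is usually applied to the training split only, so that the testing data remain untouched.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Assumed toy data: two features per batch, 8 majority vs 3 minority samples.
majority = [[1.0 + 0.1 * i, 2.0 - 0.1 * i] for i in range(8)]
minority = [[4.0, 5.0], [4.5, 5.5], [5.0, 5.0]]

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class. A classifier such as Random Forest would then be trained on the balanced set.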






