Standardscaler's Potential in Enhancing Breast Cancer Accuracy Using Machine Learning
DOI:
https://doi.org/10.37385/jaets.v5i1.3080
Keywords:
Breast Cancer, Logistic Regression, Machine Learning, StandardScaler
Abstract
Death is the most serious consequence of breast cancer. Many studies have reported that machine learning techniques can diagnose breast cancer efficiently, and these algorithms have also been used to estimate a person's likelihood of surviving the disease. In this study, we employed machine learning algorithms to predict breast cancer. A dataset of 569 breast cancer records was obtained from Kaggle. The algorithms we used are K-Nearest Neighbor (KNN), Random Forest (RF), Gradient Boosting (GB), Gaussian Naive Bayes (GNB), Support Vector Machine (SVM), and Logistic Regression (LR). Before the algorithms were trained and tested on the dataset, StandardScaler was used to transform both the training data and the test data in order to improve algorithm performance. With this preprocessing, the evaluated models achieved high accuracy. The best results were obtained with Logistic Regression, which reached an accuracy of 99%. Its precision was 99% for benign and 100% for malignant, its recall was 100% for benign and 98% for malignant, and its F1-score was 99% for both benign and malignant. It is hoped that this research can help medical practitioners determine the next steps in dealing with breast cancer.
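The preprocessing and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn's bundled copy of the Breast Cancer Wisconsin Diagnostic data (569 records) in place of the Kaggle download, an 80/20 stratified split, and default Logistic Regression settings apart from `max_iter`; none of these choices are stated in the abstract, so the scores it prints need not match the reported 99%.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Wisconsin Diagnostic data
# (569 records, 30 features) stands in for the Kaggle file.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: 80/20 stratified split; the abstract does not give the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training data only, then apply the same
# per-feature mean and standard deviation to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Assumption: default hyperparameters, with a higher iteration cap.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Overall accuracy plus per-class precision, recall, and F1-score.
# In this dataset, label 0 is malignant and label 1 is benign.
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(
    y_test, y_pred, target_names=["malignant", "benign"]
)
print(f"accuracy: {accuracy:.3f}")
print(report)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the preprocessing step. To compare the other algorithms named in the abstract, the `LogisticRegression` line could be swapped for another scikit-learn classifier while leaving the scaling steps unchanged.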