SARA Detection on Social Media Using Deep Learning Algorithm Development
DOI: https://doi.org/10.37385/jaets.v6i1.5390
Keywords: Deep Learning, SARA Comments, SARA Detection, SMOTE, Social Media Classification
Abstract
Social media has become a key platform for disseminating information and opinions, particularly in Indonesia, where SARA (Ethnicity, Religion, Race, and Intergroup) issues can fuel social tensions. An automated system that detects and classifies such harmful content is therefore essential. This study develops deep learning models based on a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network to detect SARA-related comments on Twitter. Data are collected through web scraping, then cleaned, manually labeled, and preprocessed. SMOTE (Synthetic Minority Over-sampling Technique) is applied to address class imbalance, and early stopping is used to prevent overfitting. Model performance is evaluated using accuracy, precision, recall, and F1-score. The results show that SMOTE substantially improves performance, particularly in detecting comments from the minority SARA class: CNN+SMOTE achieves an accuracy of 93%, and BiLSTM+SMOTE achieves a recall of 88%, indicating that both models capture the patterns that separate SARA from non-SARA comments. Combining SMOTE with early stopping allows the models to handle class imbalance while limiting overfitting. This research supports efforts to curb hate speech on social media, especially in Indonesia, where SARA-related issues often dominate public discourse.
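The abstract mentions cleaning and text preprocessing of scraped tweets but does not list the steps. The sketch below shows one typical cleaning pass under assumed steps: lowercasing, removing URLs, @mentions, the "RT" retweet marker, digits, and punctuation, then dropping stopwords. The function names and the six-word Indonesian stopword set are hypothetical illustrations, not the study's actual list; stemming, which Indonesian pipelines often add, is omitted.

```python
import re
from typing import List, Set

# Hypothetical, deliberately tiny Indonesian stopword subset (illustration only).
STOPWORDS: Set[str] = {"yang", "dan", "di", "ke", "ini", "itu"}

_URL = re.compile(r"https?://\S+|www\.\S+")
_MENTION = re.compile(r"@\w+")
_RETWEET = re.compile(r"\brt\b")
_NON_LETTER = re.compile(r"[^a-z\s]")  # anything that is not a-z or whitespace


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, RT markers, digits and symbols."""
    text = text.lower()
    text = _URL.sub(" ", text)
    text = _MENTION.sub(" ", text)
    text = _RETWEET.sub(" ", text)
    text = _NON_LETTER.sub(" ", text)  # also drops '#', keeping the hashtag word
    return " ".join(text.split())      # collapse repeated whitespace


def preprocess(text: str, stopwords: Set[str] = STOPWORDS) -> List[str]:
    """Clean a tweet, split it on whitespace, and remove stopwords."""
    return [tok for tok in clean_tweet(text).split() if tok not in stopwords]
```

For example, `preprocess("RT @user123: Ini contoh tweet yang ADA link https://t.co/abc123 #SARA 2024!!")` returns `["contoh", "tweet", "ada", "link", "sara"]`.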
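The abstract does not say how SMOTE was applied. A minimal sketch, assuming SMOTE operates on numeric feature vectors (for example, vectorized tweets) in a binary SARA/non-SARA task: each synthetic sample is placed on the segment between a randomly chosen minority sample x and one of its k nearest minority neighbours x_nn, as x + λ(x_nn − x) with λ drawn uniformly from [0, 1). The names `smote` and `balance` and the parameters `k` and `seed` are illustrative assumptions, not taken from the study.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_synthetic minority-class vectors by SMOTE interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a sample cannot have more neighbours than peers
    # Brute-force k nearest minority-class neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # random base sample
        j = rng.choice(neighbours[i])     # one of its k nearest neighbours
        gap = rng.random()                # lambda ~ U[0, 1)
        base, nn = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def balance(X: Sequence[Sequence[float]], y: Sequence[int], minority_label: int,
            k: int = 5, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Oversample minority_label until it matches the other class (binary case)."""
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    extra = smote(minority, max(0, n_majority - len(minority)), k, seed)
    return [list(x) for x in X] + extra, list(y) + [minority_label] * len(extra)
```

Because synthetic points are interpolated rather than copied, the minority class gains new, nearby examples instead of exact duplicates. Oversampling is normally applied only to the training split, so the test set keeps the real class distribution.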
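The abstract does not report which quantity was monitored for early stopping or the patience used. A minimal sketch of patience-based early stopping, assuming validation loss is monitored: training halts once the loss has failed to improve by more than `min_delta` for `patience` consecutive epochs, and the best epoch marks the weights that would be kept. Keras provides the same idea through `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`; whether the authors used it is not stated, and the function below is a hypothetical stand-alone illustration.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 3,
                         min_delta: float = 0.0) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-based, for patience-based stopping."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # consecutive epochs without sufficient improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop; restore weights from best_epoch
    return len(val_losses) - 1, best_epoch  # ran all epochs without stopping
```

For example, `early_stopping_epoch([0.9, 0.7, 0.6, 0.62, 0.65, 0.61, 0.7, 0.5], patience=3)` returns `(5, 2)`: training stops at epoch 5 and the epoch-2 weights are kept. The later improvement at epoch 7 is never reached, which illustrates the trade-off of a small patience value.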
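A short sketch of the reported metrics, computed with the SARA class treated as the positive label (encoded as `1` here, an assumption). Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive (SARA) class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

This also shows why recall matters under class imbalance: with one SARA comment among ten, a classifier that labels everything non-SARA still scores 0.9 accuracy, yet its recall on the SARA class is 0.0.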