Sara Detection on Social Media Using Deep Learning Algorithm Development

M. Khairul Anam; Lucky Lhaura Van FC; Hamdani Hamdani; Rahmaddeni Rahmaddeni; Junadhi Junadhi; Muhammad Bambang Firdaus; Irwanda Syahputra; Yuda Irawan

doi:10.37385/jaets.v6i1.5390

Authors

M. Khairul Anam, Universitas Samudra
Lucky Lhaura Van FC, Universitas Lancang Kuning
Hamdani Hamdani, Universitas Mulawarman
Rahmaddeni Rahmaddeni, Universitas Sains dan Teknologi Indonesia
Junadhi Junadhi, Universitas Sains dan Teknologi Indonesia
Muhammad Bambang Firdaus, Universitas Mulawarman
Irwanda Syahputra, Universitas Samudra
Yuda Irawan, Universitas Hang Tuah Pekanbaru

Keywords:

Deep Learning, SARA Comments, SARA Detection, SMOTE, Social Media Classification

Abstract

Social media has become a key platform for disseminating information and opinions, particularly in Indonesia, where SARA (Ethnicity, Religion, Race, and Intergroup) issues can fuel social tensions. To address this, developing an automated system to detect and classify harmful content is essential. This study develops a deep learning model using Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) to detect SARA-related comments on Twitter. The method involves data collection through web scraping, followed by cleaning, manual labeling, and text preprocessing. To address data imbalance, SMOTE (Synthetic Minority Over-sampling Technique) is applied, while early stopping prevents overfitting. Model performance is evaluated using precision, recall, and F1-score. The results demonstrate that SMOTE significantly improves model performance, particularly in detecting minority-class SARA comments. CNN+SMOTE achieves an accuracy of 93%, and BiLSTM+SMOTE records a recall of 88%, effectively capturing patterns in SARA and non-SARA data. With SMOTE and early stopping, the model successfully manages class imbalance and reduces overfitting. This research supports efforts to curtail hate speech on social media, especially in the Indonesian context, where SARA-related issues often dominate public discourse.
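
The two generic techniques named in the abstract, SMOTE-style oversampling and precision/recall/F1 evaluation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional feature vectors, and the parameters `k` and `seed` are assumptions made for illustration, and the abstract does not say which text representation SMOTE was applied to. The sketch uses only the Python standard library.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: three minority-class feature vectors.
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
extra = smote_like_oversample(minority_vectors, n_new=3)

# Hypothetical labels: 1 = SARA comment, 0 = non-SARA comment.
scores = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

Because each synthetic point lies on the segment between two real minority samples, oversampling adds no information from outside the minority region. It is normally applied to the training split only, so that evaluation scores reflect the original class distribution.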
