Product Codefication Accuracy With Cosine Similarity And Weighted Term Frequency And Inverse Document Frequency (TF-IDF)

  • Sintia Sintia UPI YPTK Padang
  • Sarjon Defit UPI YPTK Padang
  • Gunadi Widi Nurcahyo UPI YPTK Padang
Keywords: TF-IDF, Cosine Similarity, Term Frequency, Inverse Document Frequency, Search Accuracy

Abstract

In the SiPaGa application, the search for goods codification is still inaccurate, so OPD often choose the wrong goods codes. The Cosine Similarity and TF-IDF methods are therefore needed to improve search accuracy. Cosine Similarity is a method for calculating similarity using keywords from the goods codes. Term Frequency-Inverse Document Frequency (TF-IDF) is a way of assigning a weight to each word (term). The purpose of this research is to improve the accuracy of the search for goods codification. The study processed 14,417 goods codification records sourced from the database of the Goods and Price Planning Information System (SiPaGa) application. The search keywords were weighted using TF-IDF and then compared using Cosine Similarity to measure their similarity to the codification entries. This research produces the cosine similarity calculation and the TF-IDF weighting, which are expected to be applied to the SiPaGa application so that its search is more accurate than before and OPD can choose the product code they intend.
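The weighting and matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the item codes and names are hypothetical, whitespace tokenization is a simplification, and the weighting variant (raw term count multiplied by ln(N/df) + 1) is an assumption, since the abstract does not state which TF-IDF formula was used.

```python
import math
from collections import Counter

# Hypothetical catalogue entries (code, item name); not taken from SiPaGa.
ITEMS = [
    ("A-001", "a4 copy paper"),
    ("A-002", "f4 copy paper"),
    ("B-001", "black printer ink"),
    ("B-002", "laser printer"),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_idf(docs_tokens):
    """Inverse document frequency per term: ln(N / df) + 1 (assumed variant)."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms absent from the catalogue are ignored."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, items, top_k=3):
    """Rank items by cosine similarity between query and item TF-IDF vectors."""
    docs = [tokenize(name) for _, name in items]
    idf = build_idf(docs)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [
        (cosine(query_vec, tfidf_vector(doc, idf)), code, name)
        for doc, (code, name) in zip(docs, items)
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    # Drop items that share no term with the query.
    return [row for row in scored[:top_k] if row[0] > 0.0]

results = search("printer ink", ITEMS)
for score, code, name in results:
    print(f"{code}  {name}  {score:.3f}")
```

In this toy catalogue the query "printer ink" ranks the ink entry above the laser printer, because "ink" occurs in only one item and so carries a higher weight than "printer", which occurs in two.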

Published
2021-05-09
How to Cite
Sintia, S., Defit, S., & Nurcahyo, G. W. (2021). Product Codefication Accuracy With Cosine Similarity And Weighted Term Frequency And Inverse Document Frequency (TF-IDF). Journal of Applied Engineering and Technological Science (JAETS), 2(2), 62-69. https://doi.org/10.37385/jaets.v2i2.210