Optimization of Random Forest Model via GridSearchCV for Hoax News Detection

Lutvi Riyandari; Singgih Setia Andiko; Siti Delimasari; Singgih Briandoko

doi:10.20895/centive.v2025i1.545

Lutvi Riyandari, Department of Informatics Engineering, STMIK Widya Utama Purwokerto
Singgih Setia Andiko, Department of Informatics Engineering, STMIK Widya Utama Purwokerto
Siti Delimasari, Department of Informatics Engineering, STMIK Widya Utama Purwokerto
Singgih Briandoko, Department of Informatics Engineering, STMIK Widya Utama Purwokerto

Keywords: Hoax News, Text Classification, Random Forest, GridSearchCV

Abstract

In an era of rapid growth in digital information, information sources can be either helpful or harmful. The internet makes it easier for people to find information, but it also allows fake news and hoaxes to spread quickly and widely. This work aims to counter the spread of false news by applying text classification with the Random Forest algorithm, combined with hyperparameter optimization via GridSearchCV. The dataset comprises both hoax and authentic news from Indonesia. It is preprocessed with case folding, tokenization, and stopword removal, and features are weighted using TF-IDF. Evaluated with a confusion matrix, the Random Forest model distinguishes hoax news from authentic news effectively, and it performs better after hyperparameter tuning with GridSearchCV: the number of correct predictions (TN and TP) increases, while the number of incorrect predictions (FP and FN) decreases. The evaluation measures (accuracy, recall, precision, and F1-score) improve as well, rising from 96% to 97%.
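The workflow summarized in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the short Indonesian stopword list, the parameter grid, the fold count, and the scoring metric are all hypothetical, since the abstract does not report them. Case folding and tokenization are delegated to `TfidfVectorizer` defaults.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (1 = hoax, 0 = authentic); NOT the study's dataset.
texts = [
    "Minum air garam dijamin menyembuhkan semua penyakit dalam sehari",
    "Sebarkan pesan ini atau akun bank Anda akan diblokir besok",
    "Vaksin ini mengandung chip rahasia untuk melacak warga",
    "Pemerintah akan membagikan uang gratis lewat tautan ini",
    "Makan bawang mentah dijamin kebal dari semua virus",
    "Bumi akan gelap total selama tiga hari kata pesan berantai",
    "Bank Indonesia mempertahankan suku bunga acuan bulan ini",
    "Kementerian Kesehatan merilis jadwal imunisasi anak terbaru",
    "Pemerintah kota memperbaiki jalan rusak di pusat kota",
    "Badan meteorologi memperkirakan hujan lebat di Jawa Tengah",
    "Universitas negeri membuka pendaftaran mahasiswa baru tahun ini",
    "Tim nasional menang dua gol dalam laga persahabatan",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Assumed, very short stopword list; a real study would use a fuller one.
stopwords_id = ["yang", "dan", "di", "ke", "dari", "ini", "itu", "akan", "atau"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=4, stratify=labels, random_state=42
)

pipeline = Pipeline([
    # lowercase=True performs case folding; the default token pattern tokenizes.
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words=stopwords_id)),
    ("clf", RandomForestClassifier(random_state=42)),
])

# Hypothetical search space; the abstract does not list the tuned values.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 10],
}

# Exhaustive search over the grid with 2-fold stratified cross-validation.
grid = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
grid.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
y_pred = grid.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print("Best parameters:", grid.best_params_)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print(classification_report(y_test, y_pred, zero_division=0))
```

With a corpus this small the scores themselves carry no meaning; the sketch only shows how the preprocessing, TF-IDF weighting, Random Forest classifier, grid search, and confusion-matrix evaluation fit together. Comparing the same report for an untuned `RandomForestClassifier` against `grid.best_estimator_` would mirror the before-and-after comparison described in the abstract.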
