Optimization of Random Forest Model with Correlation-Based Feature Selection for Enhanced Forest Health Prediction

  • Singgih Setia Andiko, Department of Informatics Engineering, STMIK Widya Utama Purwokerto
  • Bayu Rizkya Rizkya Pratama, Department of Informatics Engineering, STMIK Widya Utama Purwokerto
  • Muhammad Akbar Setiawan, Department of Informatics Engineering, STMIK Widya Utama Purwokerto
  • Eldas Puspita Rini, Department of Informatics Engineering, STMIK Widya Utama Purwokerto
Keywords: Forest Health Prediction, Random Forest, Feature Selection, Correlation-Based Feature Selection, Ecological Informatics, Predictive Modeling

Abstract

Forest health is a key indicator of ecosystem sustainability and biodiversity. This study predicts forest health status using a Random Forest classifier combined with Correlation-Based Feature Selection (CFS). The dataset comprises 1,000 samples with 18 attributes (including Disturbance_Level, Fire_Risk_Index, Tree_Height, and Menhinick_Index) and a health status label with four classes: Unhealthy, Sub-Healthy, Healthy, and Very Healthy. The methodology consists of data preprocessing, feature selection with CFS, Random Forest model construction, and performance evaluation. CFS identified four attributes as the main contributors to forest health prediction. The model was trained on 70% of the data and tested on the remaining 30%, achieving an accuracy of 92%, with an average precision of 91%, recall of 90%, and F1-score of 90%. The confusion matrix shows accurate predictions for most classes, with some misclassification in the Sub-Healthy class. These results indicate that the CFS-based Random Forest approach is effective for forest health prediction and can serve as an analytical tool to support conservation efforts and damage risk mitigation.
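
The feature selection step named in the abstract can be illustrated with a minimal sketch. It assumes the common CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), with a greedy forward search. It also assumes Pearson correlation as the correlation measure and an ordinal 0 to 3 encoding of the four health classes. The abstract does not state which correlation measure or search strategy the authors used, so these are illustrative choices. The data and the column names (signal_a, signal_b, copy_a, noise) are synthetic and hypothetical, not the study's attributes.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, target):
    """CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff), using mean absolute correlations."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, target):
    """Greedy forward search: add the feature that raises merit most; stop when none does."""
    selected, best = [], 0.0
    remaining = set(columns)
    while remaining:
        scored = [(cfs_merit(selected + [f], columns, target), f) for f in sorted(remaining)]
        merit, feature = max(scored)
        if merit <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best


def to_class(score):
    """Hypothetical ordinal encoding: 0=Unhealthy, 1=Sub-Healthy, 2=Healthy, 3=Very Healthy."""
    if score < -1:
        return 0
    if score < 0:
        return 1
    if score < 1:
        return 2
    return 3


# Synthetic demo data (not the study's dataset): two informative features,
# one near-duplicate of the first, and one pure-noise feature.
rng = random.Random(0)
n = 400
signal_a = [rng.gauss(0, 1) for _ in range(n)]
signal_b = [rng.gauss(0, 1) for _ in range(n)]
columns = {
    "signal_a": signal_a,
    "signal_b": signal_b,
    "copy_a": [a + 0.1 * rng.gauss(0, 1) for a in signal_a],
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
target = [to_class(a + b) for a, b in zip(signal_a, signal_b)]

selected, merit = cfs_forward_select(columns, target)
print("selected features:", selected, "merit:", round(merit, 3))
```

The merit rewards features that correlate with the class and penalizes features that correlate with each other, which is why a redundant copy of an already selected feature tends to be left out. In a pipeline like the one the abstract describes, the selected columns would then be passed to a Random Forest classifier (for example scikit-learn's RandomForestClassifier after train_test_split with test_size=0.3); that part is omitted here to keep the sketch dependency-free.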

 



Published
2026-01-28
How to Cite
Andiko, S., Rizkya Pratama, B., Setiawan, M., & Rini, E. (2026). Optimization of Random Forest Model with Correlation-Based Feature Selection for Enhanced Forest Health Prediction. Proceedings of the National Conference on Electrical Engineering, Informatics, Industrial Technology, and Creative Media, 2025(1), 35-41. https://doi.org/10.20895/centive.v2025i1.516