Performance Comparison of Breast Cancer Classification Methods: Naive Bayes vs. Support Vector Machine

Tutus Pandam Pradipta; Sri Huning Anwariningsih

doi:10.20895/centive.v2025i1.536

Tutus Pandam Pradipta, Universitas Sahid Surakarta
Sri Huning Anwariningsih, Universitas Sahid Surakarta

Abstract

Breast cancer is a global health issue in which early detection and accurate diagnosis are key to improving patients' chances of recovery. Traditional diagnostic methods are widely used and effective, but their limitations have prompted the development of computational approaches such as machine learning. Many prior studies have investigated individual algorithms, including Naive Bayes and Support Vector Machine (SVM), for breast cancer classification; however, research that directly compares their performance on the same dataset remains limited. This study evaluates Naive Bayes and SVM for classifying breast cancer diagnoses as benign or malignant using the publicly available Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The research stages are data collection, preprocessing, splitting the dataset into training and test sets at a 70:30 ratio, standardizing features for the SVM model, applying both algorithms, and evaluating performance using accuracy, precision, recall, F1-score, and the Matthews correlation coefficient (MCC). On the test set, SVM achieved 98.25% accuracy, 100% precision, 95% recall, a 98% F1-score, and an MCC of 0.96, whereas Naive Bayes achieved 94.15% accuracy, 94% precision, 91% recall, a 93% F1-score, and an MCC of 0.88. SVM therefore outperforms Naive Bayes on this dataset, particularly in reducing false-positive and false-negative rates. These results are expected to help medical professionals and researchers select an appropriate machine learning algorithm for early breast cancer detection.
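The stages described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn and its bundled copy of the WDBC data (`load_breast_cancer`), a Gaussian Naive Bayes variant, an SVM with scikit-learn's default RBF kernel, a stratified split with an arbitrary seed, and malignant as the positive class. None of these choices is stated in the abstract, so the printed scores should not be expected to match the reported figures.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scikit-learn's copy of WDBC encodes malignant as 0 and benign as 1.
# Treating malignant as the positive class is an assumption of this sketch.
MALIGNANT = 0


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model on the training split and score it on the test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, pos_label=MALIGNANT),
        "recall": recall_score(y_test, y_pred, pos_label=MALIGNANT),
        "f1": f1_score(y_test, y_pred, pos_label=MALIGNANT),
        "mcc": matthews_corrcoef(y_test, y_pred),
    }


def run_comparison(seed=42):
    """Compare Naive Bayes and SVM on a 70/30 split (seed is hypothetical)."""
    X, y = load_breast_cancer(return_X_y=True)
    # 70% training, 30% test; stratification is an assumption.
    split = train_test_split(X, y, test_size=0.30, random_state=seed, stratify=y)
    models = {
        # Naive Bayes is fitted on the raw features.
        "Naive Bayes": GaussianNB(),
        # Features are standardized for the SVM only; the scaler is fitted
        # on the training split inside the pipeline, avoiding test leakage.
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }
    return {name: evaluate(model, *split) for name, model in models.items()}


results = run_comparison()
for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Wrapping the scaler and the SVM in one pipeline keeps the standardization statistics tied to the training data, which is one common way to realize the "standardizing features for the SVM model" step.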
