Comparative Analysis of Random Forest and Support Vector Machine for Classifying Pima Indians Diabetes Dataset

Johanes Eka Priyatma; Mikael Raditya Agung Sasmita

Paper Details

Subject:	Computer Science and Engineering
Paper ID:	UIJRTV6I90012
Volume:	06
Issue:	09
Pages:	117-126
Date:	July 2025
ISSN:	2582-6832

Cite this

Johanes Eka Priyatma and Mikael Raditya Agung Sasmita. (July 2025). Comparative Analysis of Random Forest and Support Vector Machine for Classifying Pima Indians Diabetes Dataset. United International Journal for Research & Technology (UIJRT), 6(9), 117-126.

Abstract

This study compares two machine learning algorithms, Random Forest (RF) and Support Vector Machine (SVM), on the Pima Indians Diabetes Dataset, which is used to predict the likelihood of an individual developing diabetes. For a fair comparison, both models were evaluated with 10-fold cross-validation and scored on four classification metrics: accuracy, precision, recall, and F1-score. Random Forest was the more stable and reliable model, with an average accuracy of 76.3% and consistent results across all folds. The SVM with a polynomial kernel achieved slightly higher precision (74.57%) but lower accuracy, recall, and F1-score than Random Forest. Random Forest was therefore better at identifying true positive cases and more robust to variation in the data, making it the stronger candidate for classifying health-related datasets such as this one. With further parameter tuning, SVM may still be a competitive alternative.

Keywords: Random Forest, Support Vector Machine, Diabetes Classification, Pima Dataset, Machine Learning.
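
The evaluation protocol named in the abstract (10-fold cross-validation scored by accuracy, precision, recall, and F1-score) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `kfold_indices`, `classification_metrics`, and `cross_validated_scores` are hypothetical, the folds are simple contiguous splits (the paper does not say whether shuffling or stratification was used), and the labels in the demo are made-up toy values, not results from the study.

```python
from statistics import mean


def kfold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous test folds of near-equal size.

    Assumption: no shuffling or stratification; the paper does not specify.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one set of predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a fold has no positive predictions/labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def cross_validated_scores(y_true, y_pred, k=10):
    """Average each metric over k folds, given out-of-fold predictions."""
    per_fold = [
        classification_metrics([y_true[i] for i in fold],
                               [y_pred[i] for i in fold])
        for fold in kfold_indices(len(y_true), k)
    ]
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


# Toy labels (illustrative only): a cautious classifier that flags one of
# three positive cases and raises no false alarms.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = classification_metrics(toy_true, toy_pred)
print(toy_scores)
```

In the toy case precision is perfect while recall is only one third, which shows how a model can lead on precision yet trail on recall and F1-score, the pattern the abstract reports for the polynomial-kernel SVM relative to Random Forest.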