Conference Proceedings
The International Conference on Emerging Technologies in Electronics, Computing and Communication 2022
(ICETECC'22)
Enhancing Classification Accuracy for Diabetes Detection Using Optimal Feature Combination
Naqash Ahmad1*; Urooj Abid1; Muhammad Talha2; Osama Zulfiqar1; Mubashir Shah1; Noman Naseer1
1Department of Mechatronics and Biomedical Engineering, Air University, Islamabad, Pakistan
2Department of Computer Science, Islamia University Bahawalpur, Bahawalpur, Pakistan
ABSTRACT
This research presents a comprehensive comparative analysis of several machine learning models applied to predicting diabetes from the well-known diabetes dataset. The dataset is an open-source dataset from Kaggle with two classes, eight features, and 768 samples. From the eight predefined features (pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age), a pool of feature combinations was tested, and the optimal combination for diabetes detection was identified using evaluation metrics such as accuracy, precision, recall, and F1 score. The machine learning models investigated include k-Nearest Neighbors, Naive Bayes, Linear Discriminant Analysis, Decision Tree, Random Forest, Artificial Neural Network, and Support Vector Machine. The results indicate that combinations containing skin thickness and insulin show the lowest accuracies, while combinations containing age, BMI, and glucose show the highest accuracies. Overall, the combination of age, BMI, glucose, blood pressure, diabetes pedigree function, and pregnancies achieved the highest accuracy, 93.5%, using Random Forest, with significance p<0.005. This research thus identifies an optimal feature combination and an optimal classifier for diabetes detection.
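As a rough illustration of the search space and the evaluation metrics named in the abstract, the following minimal Python sketch enumerates feature combinations and computes accuracy, precision, recall, and F1 score from binary predictions. The abstract does not state which combinations were in the tested pool, so the exhaustive enumeration here is an assumption, as are the names `FEATURES`, `feature_subsets`, and `classification_metrics` and the convention that label 1 means diabetic. No classifier is trained and no result from the paper is reproduced.

```python
from itertools import combinations

# The eight features listed in the abstract (identifier spellings are assumed).
FEATURES = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "diabetes_pedigree_function", "age",
]


def feature_subsets(features):
    """Yield every non-empty combination of the given features.

    Assumption: an exhaustive pool; the paper may have tested fewer.
    """
    for k in range(1, len(features) + 1):
        for combo in combinations(features, k):
            yield combo


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels.

    Assumption: label 1 is the positive (diabetic) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


subsets = list(feature_subsets(FEATURES))
print(len(subsets))  # 2**8 - 1 = 255 non-empty combinations

# Toy labels for illustration only; these are not data from the study.
toy_metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])
print(toy_metrics)
```

In a full pipeline, each subset would select the corresponding columns of the dataset, a classifier such as Random Forest would be trained and evaluated on them, and the subset with the best metrics would be kept.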