LEVERAGING DEEP LEARNING FOR EARLY DETECTION OF DIABETES MELLITUS
Abstract
Diabetes Mellitus is a chronic, life-threatening disease that poses a significant global health challenge. Machine learning (ML) models are widely used to predict diabetes incidence, but a research gap remains: most models function as "black boxes" that prioritize overall prediction accuracy over identifying and interpreting the underlying risk factors. This study addresses that gap by developing and evaluating an ensemble ML framework designed both to detect diabetes accurately and to identify and rank the attributes that contribute most to its onset. Using a dataset of clinical and demographic features, we trained and tested multiple models, including XGBoost, Random Forest, and Support Vector Machines. The proposed ensemble model achieved an accuracy of 87% with high precision, outperforming existing benchmark models. Feature importance analysis using SHAP (SHapley Additive exPlanations) values further identified a small set of top factors, such as glucose level, BMI, and age, as the most salient predictors. These findings give healthcare providers a basis for targeted prevention strategies and data-driven interventions in at-risk populations, moving beyond prediction alone towards preventative healthcare.
Keywords
Support Vector Machine, Random Forest, Diabetes, Artificial Intelligence
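The abstract does not state how the base models are combined into the ensemble. One common option is soft voting, in which each model's predicted probability of the positive class is averaged and then thresholded. The sketch below illustrates that idea together with the accuracy and precision metrics mentioned above. It is a minimal illustration under stated assumptions, not the study's implementation: the function names, the 0.5 threshold, and the probability values are all hypothetical, and the numbers are made-up toy data rather than results from the paper.

```python
from statistics import mean


def soft_vote(prob_lists, threshold=0.5):
    """Average per-model positive-class probabilities, then threshold.

    prob_lists holds one list of probabilities per base model, all
    aligned on the same samples. Soft voting is an assumption here;
    the abstract does not name the combination rule.
    """
    averaged = [mean(sample_probs) for sample_probs in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """True positives divided by all predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy, made-up labels and probabilities (1 = diabetic, 0 = non-diabetic).
# These stand in for the outputs of three already-trained base models.
y_true = [1, 0, 1, 0, 1, 0]
xgb_probs = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]  # hypothetical XGBoost outputs
rf_probs = [0.8, 0.3, 0.6, 0.4, 0.7, 0.2]   # hypothetical Random Forest outputs
svm_probs = [0.7, 0.1, 0.7, 0.6, 0.3, 0.3]  # hypothetical SVM outputs

y_pred = soft_vote([xgb_probs, rf_probs, svm_probs])
print("ensemble predictions:", y_pred)
print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("precision:", round(precision(y_true, y_pred), 3))
```

In this toy data the third sample is misclassified by one base model and the fifth by another, yet the averaged vote labels both correctly, which is the usual motivation for combining models. Reporting precision alongside accuracy matters in a screening setting because it shows how many flagged patients are truly positive.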