Impact of Outliers on Regression Models Performance: A Comparative Analysis of Diabetes Data

Document Type: Original Article

Author

Higher Institute of Management in EL Mahalla El-Kubra

10.21608/cfdj.2025.363449.2188

Abstract

This study used a dataset of 150 diabetic patients from Kafr El-Sheikh, Egypt, collected between 2000 and 2024, to examine the impact of outliers on the performance of four regression models: OLS, RR, QR, and SVR. Outliers were addressed using the Trimmed Mean method, and performance was evaluated using R², MSE, and MAE. The results showed that the OLS model was the most sensitive to outliers, while the QR model was relatively robust. For example, the MSE for the SVR model decreased by 50.59% after outliers were removed, whereas the changes for RR and QR were smaller. Without outliers, the RR model achieved the highest R² value (0.8618), and the QR model had the lowest MSE (0.9875) and MAE (0.9072). These findings highlight the need to select regression techniques and outlier handling methods carefully, even with seemingly robust models such as QR, to ensure valid and reliable statistical inferences. Future research should explore alternative outlier handling methods, investigate the causes of outliers, and develop data- and model-specific outlier treatment strategies.
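
The comparison described above (fit a model with and without outliers, then compare R², MSE, and MAE) can be sketched as follows. This is a minimal illustration and not the authors' code: the toy data are invented, only a one-predictor OLS fit is shown, and the helper names (`fit_ols`, `trim_by_response`, `percent_decrease`) are hypothetical. The trimming rule, which drops a fixed fraction of observations from each tail of the response, is an assumption about how a trimmed-mean style treatment might be applied; the abstract does not specify it.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_ols(x, y):
    """Closed-form OLS for a single predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def trim_by_response(x, y, fraction):
    """Assumed trimming rule: drop `fraction` of the points from each tail of y."""
    k = int(len(y) * fraction)
    order = sorted(range(len(y)), key=lambda i: y[i])
    keep = sorted(order[k:len(y) - k])
    return [x[i] for i in keep], [y[i] for i in keep]

def percent_decrease(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

# Invented toy data: y = 2x + 1, with the last response replaced by an outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 5, 7, 9, 11, 13, 15, 17, 19, 60]

# Fit and score on the full data.
slope_full, intercept_full = fit_ols(x, y)
pred_full = [slope_full * xi + intercept_full for xi in x]
mse_full = mse(y, pred_full)

# Fit and score after trimming 10% from each tail.
x_trim, y_trim = trim_by_response(x, y, 0.10)
slope_trim, intercept_trim = fit_ols(x_trim, y_trim)
pred_trim = [slope_trim * xi + intercept_trim for xi in x_trim]
mse_trim = mse(y_trim, pred_trim)

print("MSE with outliers:   ", mse_full)
print("MSE without outliers:", mse_trim)
print("MSE decrease (%):    ", percent_decrease(mse_full, mse_trim))
print("R2 / MAE without outliers:", r2(y_trim, pred_trim), mae(y_trim, pred_trim))
```

The same before-and-after comparison would apply to the other models named in the abstract by swapping `fit_ols` for the corresponding fitting routine.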

Keywords