Optimizing Healthcare Claim Fraud Detection Using Ensemble Learning Models and Modified SMOTE in Imbalanced Dataset with Application to the Egyptian Health Insurance

Mohamed Saad Hussein, Asmaa

doi:10.21608/cfdj.2025.395116.2296


Document Type: Original Article

Author

Asmaa Mohamed Saad Hussein

Insurance and Actuarial Science, Faculty of Commerce, Cairo University


Abstract

Healthcare fraud involves deliberately submitting inaccurate claims or distorting information to receive payment benefits. It misallocates healthcare funds, drives up overall healthcare costs, and therefore represents a significant financial burden. Machine Learning (ML) algorithms and Artificial Intelligence (AI) consequently play an essential role in detecting healthcare insurance fraud.
Detecting claim fraud in healthcare insurance datasets is a significant challenge because of severe class imbalance: fraudulent cases are vastly outnumbered by legitimate ones. Traditional ML algorithms often underperform in such scenarios because they tend to favor the majority (non-fraud) class, leading to poor fraud detection rates. To address this issue, resampling techniques, such as oversampling the minority (fraud) class or undersampling the majority (non-fraud) class, are commonly employed to balance the dataset.
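As an illustration of minority-class oversampling, the following minimal sketch implements the classic SMOTE idea: each synthetic fraud record is an interpolation between an existing minority sample and one of its nearest minority neighbours. It is not the modified SMOTE variant used in this study, whose details the abstract does not give. The function name `smote`, its parameters, and the toy claim features are assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Classic SMOTE sketch: interpolate between a minority sample and one of
    its k nearest minority-class neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 4 fraud claims vs 20 legitimate ones, two numeric features.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
legit = [(5.0 + 0.1 * i, 4.0 + 0.05 * i) for i in range(20)]

# Generate enough synthetic fraud records to match the majority class size.
new_fraud = smote(fraud, n_synthetic=len(legit) - len(fraud), k=3)
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))  # 20 20
```

In practice, oversampling of this kind is applied to the training split only, so that synthetic records do not leak into the data used for evaluation.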
This study proposes a model for detecting health claim fraud in the Egyptian health insurance market based on ensemble techniques, including XGBoost, Random Forest, and bagging algorithms. It is the first paper to use the XGBoost algorithm with hyperparameter tuning to optimize classifier performance, especially for imbalanced data, in the Egyptian health market. Additionally, this research uses a modified SMOTE (Synthetic Minority Over-sampling Technique) algorithm to address the class imbalance, which enhances the performance of the fraud claim detection model. It is also the first study to compare different performance metrics, including AUC-ROC, F1-score, precision, and recall, across several ensemble classifiers and logistic regression on Egyptian health insurance data. The findings show the effectiveness of SMOTE in building a more robust model for detecting and preventing health claim fraud, reducing potential losses and enhancing overall performance. Moreover, the ensemble learning techniques greatly outperform the single learning algorithm (logistic regression) across the different performance metrics.
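The metrics named above can be computed directly from a classifier's predictions. The sketch below is a minimal, library-free illustration: precision, recall, and F1-score are derived from the confusion counts at a fixed threshold, and AUC-ROC is computed as the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one. The function names, the 0.5 threshold, and the toy labels and scores are assumptions for the example and are not results from this study.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc_roc(y_true, scores):
    """AUC-ROC via pairwise comparison: the fraction of (fraud, legitimate)
    pairs in which the fraud case has the higher score (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 3 fraud claims among 10, with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Because these metrics focus on the minority class or on ranking quality, they are more informative than plain accuracy when fraud cases are rare.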


Scientific Journal for Financial and Commercial Studies and Research

Volume 6, Issue 2, Serial Number 1
Volume 6, Issue 2, Part 1, July 2025
Pages 1265-1294
