Optimizing Healthcare Claim Fraud Detection Using Ensemble Learning Models and Modified SMOTE in Imbalanced Dataset with Application to the Egyptian Health Insurance

Document Type: Original Article

Author

Insurance and Actuarial Science, Faculty of Commerce, Cairo University

Abstract

Healthcare fraud involves deliberately submitting inaccurate claims or distorting information to receive payment benefits. This results in the misallocation of healthcare funds and drives up overall healthcare costs, making fraud a significant financial burden. Machine learning (ML) algorithms and artificial intelligence (AI) therefore play an essential role in detecting healthcare insurance fraud.
    Detecting claim fraud in healthcare insurance datasets is a significant challenge because of severe class imbalance: fraudulent cases are vastly outnumbered by legitimate ones. Traditional ML algorithms often underperform in such settings because they tend to favor the majority (non-fraud) class, leading to poor fraud detection rates. To address this issue, resampling techniques, such as oversampling the minority (fraud) class or undersampling the majority (non-fraud) class, are commonly employed to balance the dataset.
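As an illustration of minority-class oversampling, the following is a minimal sketch of the standard SMOTE idea: each synthetic fraud record is an interpolation between a real fraud record and one of its nearest fraud neighbours. It is not the modified SMOTE used in this study, whose details are not given here. The function name `smote_like_oversample`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built by interpolating between a
    minority sample and one of its k nearest minority neighbours.
    Hypothetical helper; requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 4 fraud claims versus 40 legitimate claims.
fraud = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
legit = [(float(i), float(i % 7)) for i in range(40)]

# Generate enough synthetic fraud records to match the majority class.
new_fraud = smote_like_oversample(fraud, n_new=len(legit) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))
```

Because each synthetic point lies on a segment between two real fraud records, the new samples stay inside the region the minority class already occupies rather than duplicating existing rows.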
    This study proposes a model for detecting health claim fraud in the Egyptian health insurance market based on ensemble techniques, including XGBoost, Random Forest, and bagging algorithms. This paper is the first to use the XGBoost algorithm with hyperparameter tuning to optimize classifier performance on imbalanced data in the Egyptian health market. Additionally, this research uses a modified Synthetic Minority Oversampling Technique (SMOTE) algorithm to address the data imbalance issue, which enhances the performance of the fraud detection model. This study is also the first to compare different performance metrics, including AUC-ROC, F1-score, precision, and recall, across different ensemble classifiers and logistic regression on Egyptian health insurance data. The findings show the effectiveness of SMOTE in building a more robust model for health claim fraud detection and prevention, reducing potential losses and enhancing overall performance. Moreover, the ensemble learning techniques greatly outperform the single learning algorithm (logistic regression) across the different performance metrics.
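The metrics named above can be computed directly from labels and scores. The sketch below uses hypothetical toy data (10 claims, 2 fraudulent) and an assumed decision threshold of 0.5; none of the numbers come from the study. It also shows why accuracy alone is misleading on imbalanced data: labelling every claim as legitimate already scores 80% accuracy while detecting no fraud.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random fraud case is scored
    above a random legitimate case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = fraud, 0 = legitimate.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3, 0.05, 0.15, 0.25, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = auc_roc(y_true, scores)

# Baseline that predicts "legitimate" for every claim.
accuracy_all_legit = sum(1 for t in y_true if t == 0) / len(y_true)
print(precision, recall, f1, auc, accuracy_all_legit)
```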

Keywords