Using Some Data Mining Techniques with Application on Insurance Data

Document Type: Original Article

Author

Faculty of Commerce Benha University

Abstract

Fraud is considered the most common problem facing insurance companies, and
detecting it is difficult. This study applies statistical and data mining techniques
to predict fraud in insurance data. The data were cleaned and pre-processed by
removing duplicates, filling in missing values, label-encoding the categorical
variables, and detecting outliers. The data were then split into training and test
sets, and standardization was applied as feature scaling. Finally, several data
mining models were evaluated on the data; the two best were Adaptive Boost
(Ada Boost) and Gradient Boost. The Ada Boost model performed best overall, with
accuracy 95.556%, recall 92.308%, precision 87.805%, F1 score 90%, and Matthews
Correlation Coefficient (MCC) 87.190%. The Gradient Boost model ranked second
overall, with accuracy 92.778%, recall 76.923%, precision 88.235%, F1 score
82.192%, and MCC 77.976%. Accordingly, this research proposes a new model,
called GA, which combines Gradient Boost and Adaptive Boost in a hybrid
classifier.
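
The pre-processing order described in the abstract (remove duplicates, fill missing values, label-encode categorical variables, split, then standardize) can be sketched in plain Python as follows. This is a minimal illustration, not the study's implementation: the function name `preprocess`, the mean/mode fill rules, the 25% test fraction, and the choice to compute scaling statistics on the training set only are all assumptions. Outlier detection is left out because the abstract does not name the method used.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def preprocess(rows, label_key, test_fraction=0.25, seed=0):
    """Clean, encode, split and standardize a list of record dicts.

    Hypothetical helper: numeric values are int/float or None,
    categorical values are str or None.
    """
    # 1. Remove exact duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    features = [k for k in unique[0] if k != label_key]
    for col in features:
        present = [r[col] for r in unique if r[col] is not None]
        if all(isinstance(v, (int, float)) for v in present):
            # 2a. Numeric column: fill gaps with the column mean (assumed rule).
            fill = mean(present)
            for r in unique:
                if r[col] is None:
                    r[col] = fill
        else:
            # 2b. Categorical column: fill gaps with the mode (assumed rule),
            #     then label-encode categories as integers.
            fill = Counter(present).most_common(1)[0][0]
            codes = {cat: i for i, cat in enumerate(sorted(set(present)))}
            for r in unique:
                r[col] = codes[fill if r[col] is None else r[col]]

    # 3. Shuffle and split into training and test sets.
    random.Random(seed).shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test, train = unique[:n_test], unique[n_test:]

    # 4. Standardize each feature with training-set mean and standard deviation.
    for col in features:
        mu = mean(r[col] for r in train)
        sigma = pstdev(r[col] for r in train) or 1.0  # guard constant columns
        for r in train + test:
            r[col] = (r[col] - mu) / sigma
    return train, test, features
```

Computing the scaling statistics on the training set alone keeps information from the test set out of the fitted transformation; whether the study did this is not stated in the abstract.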
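
The five reported metrics all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas. The abstract does not give the confusion matrices, so the counts used here are hypothetical: they were chosen only because, on an assumed 180-record test set, they reproduce the reported percentages to the stated precision. They are not taken from the study.

```python
from math import sqrt


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, recall, precision, F1 and MCC as percentages."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": 100 * accuracy,
        "recall": 100 * recall,
        "precision": 100 * precision,
        "f1": 100 * f1,
        "mcc": 100 * mcc,
    }


# Hypothetical counts (assumption, not from the paper) that are consistent
# with the percentages reported for each model.
ada_boost = classification_metrics(tp=36, fp=5, fn=3, tn=136)
gradient_boost = classification_metrics(tp=30, fp=4, fn=9, tn=137)

for name, scores in [("Ada Boost", ada_boost), ("Gradient Boost", gradient_boost)]:
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Under these assumed counts, the two models differ mainly in false negatives (3 versus 9 missed fraud cases), which is why recall and MCC separate them much more than precision does.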

Keywords