Improving the Accuracy of Stock Price Prediction: A Comparative Study of Statistical Models and Machine Learning Algorithms with Application to Panel Data

Document Type: Original Article

Authors

1 Faculty of Commerce, Mansoura University

2 Higher Institute of Administrative Sciences, El-Menzala, Egypt

3 Higher Institute of Administrative Science, El-Menzalah, Egypt

10.21608/cfdj.2025.374527.2226

Abstract

This study conducts a comparative analysis of random effects panel models and random forest algorithms in stock price prediction, exploring the trade-off between econometric interpretability and machine learning's predictive performance. The research contributes to ongoing discussions regarding financial modeling strategies that balance accuracy with explanatory power.
Using panel data from stock markets, we apply two methodological approaches: (1) random effects models to account for unobserved heterogeneity, and (2) random forests to capture complex nonlinear patterns. Model performance is evaluated using mean squared error (MSE) and the coefficient of determination (R²), alongside assessments of computational efficiency, data requirements, and interpretability.
The results indicate that random forests achieve marginally superior predictive accuracy in certain scenarios, whereas random effects models retain advantages in interpretability and robustness, particularly in modeling heterogeneity. These findings highlight the inherent tension between predictive power and transparency in financial analytics.
The study demonstrates that random effects models remain a valuable tool for stock price prediction, despite the slight accuracy gains offered by machine learning techniques. Each approach exhibits distinct strengths: statistical models provide clearer economic insights, while algorithmic methods excel in predictive performance.
For researchers and practitioners, we propose a selection framework based on analytical priorities. When interpretability is critical, random effects models are preferable. Conversely, when maximizing predictive accuracy is the primary objective—and sufficient computational resources are available—random forests may be more suitable. The optimal choice depends on research objectives and dataset characteristics, with our findings offering empirically grounded guidance.
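
The two accuracy metrics named in the abstract, MSE and R², can be illustrated with a minimal sketch in plain Python. The function names and the sample numbers below are hypothetical and chosen only for illustration; they are not the study's data, results, or implementation.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observed values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out prices and predictions from two competing models
# (illustrative numbers only, not taken from the study).
observed = [10.0, 12.0, 14.0, 16.0]
model_a = [11.0, 12.0, 13.0, 16.0]
model_b = [10.0, 12.0, 14.0, 18.0]

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, mean_squared_error(observed, preds), r_squared(observed, preds))
```

A lower MSE and a higher R² both indicate a closer fit on the evaluation sample. To compare a random effects model with a random forest on equal terms, both sets of predictions would be scored on the same held-out observations.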

Keywords