Background: Railway systems all over the world struggle to prevent train delays. In India in particular, the situation is worse than in other developing countries because of the high number of passengers and the inadequate upgrading of the legacy system. According to a report in the Times of India (TOI), a daily newspaper, around 25.3 million people travelled by train in 2006, a figure that rose sharply year on year to 80 million in 2018.
Objective: Deploy a machine learning model that predicts the arrival delay of a train, in minutes, before the journey starts on a given valid date.
Methods: In this paper we combine historical train delay data with weather data to predict delays. The proposed model uses four machine learning methods (linear regression, gradient boosting regression, decision tree, and random forest), each compared under different settings to find the most accurate one.
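A comparison of this kind could be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes scikit-learn, uses synthetic stand-in data, and the feature names (previous_delay, rainfall, day_of_week), the hyperparameters, and the 80/20 split are all hypothetical. The abstract does not state how accuracy is computed, so the sketch reports the R² score as one plausible reading.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical). The study itself used historical
# delay records joined with weather observations, which are not reproduced here.
rng = np.random.default_rng(0)
n = 500
previous_delay = rng.uniform(0, 120, n)  # minutes, hypothetical feature
rainfall = rng.uniform(0, 50, n)         # millimetres, hypothetical feature
day_of_week = rng.integers(0, 7, n)      # 0 = Monday, hypothetical feature
X = np.column_stack([previous_delay, rainfall, day_of_week])
# Target: arrival delay in minutes, generated from an assumed linear rule plus noise.
y = 0.6 * previous_delay + 0.8 * rainfall + rng.normal(0, 5, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The four model families named in the abstract; settings are assumptions.
models = {
    "linear regression": LinearRegression(),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
    "decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
print("best:", best)
```

Because the synthetic target is generated from a linear rule, the ranking printed here says nothing about the ranking on real delay data; only the structure of the comparison (same split, same metric, one score per model) carries over.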
Results: Linear regression achieves 90.01% accuracy, gradient boosting regression 91.68%, and the most accurate decision tree configuration 93.71%. The ensemble method, random forest regression, achieves the highest accuracy at 95.36%.
Conclusion: Trains in India are delayed frequently. This model could assist Indian Railways and other concerned companies by identifying the times of the week at which delays are frequent. Indian Railways could then introduce delay-prevention measures at those times in order to maintain a good on-time arrival rate.
Keywords: Train delay, linear regression, GBR, decision tree, random forest, algorithm.