Abstract
Background: As urban populations grow, pollution grows with them. Air pollution is one of
the most challenging environmental issues facing smart cities.
Objective: Real-time monitoring of air quality can help administrators make appropriate decisions
in time. Advances in Internet of Things (IoT) based sensors have changed how air quality is
monitored.
Methods: In this paper, we apply a two-stage regression approach. In the first stage, ten regression
algorithms (Decision Tree, Random Forest, Elastic Net, AdaBoost, Extra Tree, Linear Regression,
Lasso, XGBoost, LightGBM, and Multi-Layer Perceptron) are applied. In the second stage, the four
best algorithms are selected and stacking ensemble algorithms are applied, using Python, to predict
the concentration of PM2.5 in the air. Datasets from five Chinese cities (Beijing, Chengdu, Guangzhou,
Shanghai, and Shenyang) are considered, and the models are compared on MAE (Mean
Absolute Error), RMSE (Root Mean Square Error), and R².
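
For readers unfamiliar with the three comparison metrics, the following is a minimal sketch of how MAE, RMSE, and R² are conventionally computed. The function names and the four sample values are illustrative assumptions, not data or code from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large residuals more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up PM2.5 values, for illustration only (not taken from the datasets).
observed = [35.0, 50.0, 80.0, 120.0]
predicted = [40.0, 45.0, 85.0, 110.0]

print("MAE :", mae(observed, predicted))
print("RMSE:", round(rmse(observed, predicted), 3))
print("R2  :", round(r2(observed, predicted), 3))
```

Lower MAE and RMSE indicate a better fit, while R² closer to 1 indicates that the model explains more of the variance in the observed values.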
Results: Of the ten regression algorithms applied, the Extra Tree algorithm exhibited
the best performance on all five datasets, and stacking improved performance further.
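
The two-stage idea (rank individual regressors, then stack the best ones) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the study's pipeline: it uses synthetic data from `make_regression`, four candidate models instead of ten, keeps the best two instead of four, ranks by cross-validated MAE, and uses a linear meta-learner. None of these choices are confirmed by the abstract.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (ExtraTreesRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a city dataset (assumption: not the real data).
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Stage 1: a reduced set of candidate regressors.
candidates = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "linear": LinearRegression(),
}

# Rank candidates by 5-fold cross-validated MAE on the training split only.
cv_mae = {
    name: -cross_val_score(model, X_train, y_train, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    for name, model in candidates.items()
}
best_names = sorted(cv_mae, key=cv_mae.get)[:2]  # lowest MAE first

# Stage 2: stack the selected models under a linear meta-learner.
stack = StackingRegressor(
    estimators=[(name, candidates[name]) for name in best_names],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
stack_pred = stack.predict(X_test)

stack_mae = mean_absolute_error(y_test, stack_pred)
stack_r2 = r2_score(y_test, stack_pred)
print("selected:", best_names)
print("stacked MAE:", round(stack_mae, 3), "R2:", round(stack_r2, 3))
```

Ranking on cross-validated scores rather than on the held-out test split keeps the test set untouched until the final comparison. Whether stacking beats the best single model depends on the data, so the sketch reports the scores without assuming an improvement.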
Conclusion: Feature importance for Shenyang and Beijing was computed using three regression
algorithms; the four most important features were humidity, wind speed, wind
direction, and dew point.
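
One common way to obtain such a ranking is the impurity-based importance exposed by tree ensembles. The sketch below assumes scikit-learn's `ExtraTreesRegressor` and a purely synthetic dataset. The column names are borrowed from the abstract for readability, while the values and coefficients are arbitrary and do not reproduce the study's results. The abstract does not state which three algorithms or which importance measure were used.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Hypothetical column names; the values are random, not meteorological data.
feature_names = ["humidity", "wind_speed", "wind_direction", "dew_point", "noise"]
X = rng.normal(size=(500, len(feature_names)))

# Synthetic target: arbitrary weights on the first four columns, none on "noise".
y = (3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.5 * X[:, 2] + 1.0 * X[:, 3]
     + rng.normal(scale=0.1, size=500))

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds non-negative scores that sum to 1.
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based scores can favor high-cardinality features, so comparing rankings across several algorithms, as the abstract describes, is a reasonable cross-check.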
Keywords:
AQI, regression, deep learning, imputation techniques, PM2.5, machine learning, ensemble learning, IoT, smart city.