Recent Advances in Computer Science and Communications

Author(s): Alisha Banga*, Ravinder Ahuja and Subhash C. Sharma

DOI: 10.2174/2666255813999200628094351

Stacking Regression Algorithms to Predict PM2.5 in the Smart City Using Internet of Things

Article ID: e111022183236 Pages: 17


Abstract

Background: As urban populations grow, pollution increases as well. Air pollution is one of the most challenging environmental issues in smart cities.

Objective: Real-time monitoring of air quality can help the administration make appropriate decisions in time. Advances in Internet of Things (IoT) based sensors have changed the way air quality is monitored.

Methods: In this paper, we have applied a two-stage regression approach. In the first stage, ten regression algorithms (Decision Tree, Random Forest, Elastic Net, AdaBoost, Extra Trees, Linear Regression, Lasso, XGBoost, LightGBM, and Multi-Layer Perceptron) are applied; in the second stage, the best four algorithms are selected and combined with a stacking ensemble, implemented in Python, to predict PM2.5 concentrations in the air. Datasets from five Chinese cities (Beijing, Chengdu, Guangzhou, Shanghai, and Shenyang) are used, and the models are compared on MAE (Mean Absolute Error), RMSE (Root Mean Square Error), and R2 (coefficient of determination).
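The three comparison metrics can be computed with a few lines of plain Python. This is a minimal sketch using the standard definitions (MAE as the mean absolute residual, RMSE as the square root of the mean squared residual, R2 as 1 - SS_res/SS_tot); the function names and the PM2.5 values in the example are hypothetical and only illustrate the arithmetic, not results from the paper.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical PM2.5 readings (ug/m^3) and predictions, for illustration only.
    observed = [35.0, 80.0, 120.0, 60.0]
    predicted = [40.0, 75.0, 110.0, 65.0]
    print(f"MAE={mae(observed, predicted):.2f} "
          f"RMSE={rmse(observed, predicted):.2f} "
          f"R2={r2(observed, predicted):.3f}")
```

Lower MAE and RMSE and higher R2 indicate a better model; RMSE penalizes large errors more heavily than MAE because residuals are squared.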

Results: We observed that, of the ten regression algorithms applied, the Extra Trees algorithm exhibited the best performance on all five datasets, and stacking improved the performance further.
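The second-stage stacking idea can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the paper stacks its four best first-stage regressors (such as Extra Trees), whereas this sketch assumes two toy base learners (least-squares linear regression and k-nearest neighbours) and a linear meta-learner so that it runs with the standard library alone. The helper names (`solve`, `fit_linear`, `fit_knn`, `fit_stack`) and the modulo fold split are hypothetical. In practice a library class such as scikit-learn's `StackingRegressor` would normally be used instead.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]
Learner = Callable[[List[Row], List[float]], Model]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(X: List[Row], y: List[float], ridge: float = 1e-6) -> Model:
    """Least-squares linear regression with intercept; a tiny ridge term keeps X^T X invertible."""
    Z = [[1.0, *row] for row in X]
    d = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) + (ridge if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
    w = solve(ztz, zty)
    return lambda row: w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def fit_knn(X: List[Row], y: List[float], k: int = 3) -> Model:
    """k-nearest-neighbour regressor: average target of the k closest training rows."""
    data = list(zip(X, y))

    def predict(row: Row) -> float:
        nearest = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], row)))[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return predict


def fit_stack(X: List[Row], y: List[float], base_learners: List[Learner],
              meta_learner: Learner = fit_linear, folds: int = 5) -> Model:
    """Stage 1: out-of-fold base predictions; stage 2: meta-learner fitted on those predictions."""
    n = len(X)
    oof = [[0.0] * len(base_learners) for _ in range(n)]
    for f in range(folds):
        held_out = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        for j, learner in enumerate(base_learners):
            model = learner([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                oof[i][j] = model(X[i])
    meta = meta_learner(oof, y)
    # Refit each base learner on all rows; the meta-learner combines their outputs at predict time.
    bases = [learner(X, y) for learner in base_learners]
    return lambda row: meta([b(row) for b in bases])


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, hypothetical features (e.g. scaled humidity, wind speed) and a nonlinear target.
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]
    y = [50 * a + 30 * b * b + rng.gauss(0, 2) for a, b in X]
    stacked = fit_stack(X, y, [fit_linear, fit_knn])
    print(round(stacked([0.5, 0.5]), 1))
```

Fitting the meta-learner on out-of-fold predictions, rather than on predictions for rows the base learners were trained on, is what keeps the second stage from simply trusting whichever base model overfits most.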

Conclusion: Feature importance for Shenyang and Beijing was computed using three regression algorithms, and the four most important features were found to be humidity, wind speed, wind direction, and dew point.
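The abstract does not state how the three regression algorithms scored feature importance. As one model-agnostic illustration, the sketch below uses permutation importance, a swapped-in technique that is not necessarily the authors' method: each feature column is shuffled in turn, and the resulting increase in MSE is taken as that feature's importance. The stand-in model, feature names, and data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def mse(model: Model, X: List[Row], y: List[float]) -> float:
    """Mean squared error of a fitted model on (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: List[Row], y: List[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in MSE when a single feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled copy, leaving the other features intact.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    names = ["humidity", "wind_speed", "unused"]  # hypothetical feature names
    X = [[rng.uniform(0, 1) for _ in names] for _ in range(100)]
    model = lambda row: 4.0 * row[0] + 0.5 * row[1]  # stand-in for a fitted regressor
    y = [model(row) for row in X]
    ranked = sorted(zip(names, permutation_importance(model, X, y)), key=lambda p: -p[1])
    for name, score in ranked:
        print(f"{name:<12} {score:.4f}")
```

A feature the model ignores scores exactly zero, while features with larger effects on the prediction produce larger error increases when shuffled.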

Keywords: AQI, regression, deep learning, imputation techniques, PM2.5, machine learning, ensemble learning, IoT, smart city.

