Recent Advances in Computer Science and Communications

Author(s): Jaya Lakshmi A, Venkatramaphanikumar S.* and Venkata K. K. Kolli

DOI: 10.2174/2666255813999200904163404

Prediction of Cardiovascular Risk Using Extreme Learning Machine-Tree Classifier on Apache Spark Cluster

Article ID: e180322185585 Pages: 13

Abstract

Background: Machine Learning (ML) is now a popular and important area across diverse fields of science and technology, including image processing, automobiles, banking, finance, and health care. The easy availability of data and rapid improvements in machine learning techniques have made it more feasible to understand and work on various channels of real-time health analytics.

Methods: In this paper, a health status prediction system is proposed to detect cardiovascular diseases from patients’ tweets. The analytics is carried out on a distributed Apache Spark (AS) framework to reduce the time taken for both training and testing compared with regular standalone machines. Social media streaming data is one of the major data sources in the proposed system. In this model, attributes of the incoming user tweets are analyzed, cardiovascular risk is predicted accordingly, and the latest health status is tweeted back as a reply to the respective user, with a copy sent to the family and caretakers.

Results: The performance of the proposed framework with the Extreme Learning Machine (ELM)-Tree classifier is evaluated on two different corpora. It outperforms other classifiers, such as Decision Trees, Naïve Bayes, Linear SVC, and DNN, in both accuracy and time.

Conclusion: The proposed study hypothesizes a model for an alert-based heart status prediction system that adds features affecting accuracy and reduces response time by using the Apache Spark distributed big data framework.

Keywords: Machine learning, social media, streaming data, health status, prediction system, Apache Spark.
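
The abstract names an Extreme Learning Machine (ELM) as the core of the classifier but does not describe its internals or how the tree component is combined with it. As a rough illustration only, the sketch below shows a generic single-hidden-layer ELM: the hidden weights are drawn at random and left untrained, and only the output weights are fitted, here by ridge-regularised least squares. It is not the authors' ELM-Tree model and does not use Apache Spark. The class name `TinyELM`, the helper `solve_linear`, the two feature names, and all data values are assumptions made up for this example.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


class TinyELM:
    """Generic ELM for binary labels (illustrative, not the paper's model)."""

    def __init__(self, n_inputs, n_hidden=20, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Hidden-layer weights and biases are random and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = [0.0] * n_hidden

    def _hidden(self, x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        k = len(self.b)
        # Normal equations: (H^T H + ridge * I) beta = H^T y
        hth = [[sum(row[i] * row[j] for row in h)
                + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve_linear(hth, hty)
        return self

    def score(self, x):
        return sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))

    def predict(self, x):
        return 1 if self.score(x) >= 0.5 else 0


# Hypothetical, already-normalised features: [blood_pressure, cholesterol]
features = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.30, 0.20],
            [0.80, 0.90], [0.90, 0.70], [0.70, 0.80], [0.85, 0.85]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = elevated risk (made-up labels)

model = TinyELM(n_inputs=2).fit(features, labels)
predictions = [model.predict(x) for x in features]
print(predictions)
```

Because only the output weights are solved for, training reduces to a single linear solve, which is the usual argument for ELM training being fast. The sketch uses pure Python to stay dependency-free; a practical version would more likely rely on a numerical library.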
