Data Analysis and Mapping of Research Interest in Clinical Trials of Tuberculosis by Text Mining Platform of Artificial Intelligence using Open-Source Tool Orange Canvas

Article ID: e130122200200 Pages: 15

Abstract

Background: Reading every clinical trial registered for a disease, and determining the current state of progress from them, is tedious, especially when the number of trials is large. Artificial intelligence (AI) text mining platforms can help simplify the task.

Methods: A large pool of tuberculosis clinical trials was retrieved from the International Clinical Trials Registry Platform (ICTRP) and used as a textual dataset. The exported dataset of 1635 clinical studies, in comma-separated values (CSV) format, was preprocessed for data analysis and text mining. Data preparation, corpus generation, text preprocessing, and finally cluster analysis were carried out using the text mining widgets of the open-source machine learning tool Orange Canvas. Hierarchical cluster analysis was used to map research interests in tuberculosis clinical trials.
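
The study itself was carried out with GUI widgets, so no code accompanies it. As a rough illustration only, the same sequence of steps (load a CSV export, build a corpus, preprocess the text, cluster hierarchically) might be sketched in plain Python as follows. The column name `Public_title`, the four sample titles, the stop-word list, the TF-IDF weighting, cosine distance, and average linkage are all assumptions made for this sketch; the abstract does not state which fields, distance measure, or linkage criterion were used.

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical stand-in for a registry CSV export (assumed column name and rows).
SAMPLE_CSV = """Public_title
Bedaquiline regimen for multidrug-resistant tuberculosis
Shortened regimen for multidrug-resistant tuberculosis
BCG vaccine revaccination in adolescents
Novel tuberculosis vaccine safety in adolescents
"""

# Assumed stop words; "tuberculosis" is dropped because every trial mentions it.
STOPWORDS = {"for", "in", "of", "the", "and", "a", "tuberculosis"}


def load_titles(csv_text, column="Public_title"):
    """Data preparation: read one text field per trial from CSV text."""
    return [row[column] for row in csv.DictReader(io.StringIO(csv_text))]


def tokenize(text):
    """Text preprocessing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Corpus generation: one sparse TF-IDF vector (dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed inverse document frequency, so no weight is zero.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in counts.items()})
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors, n_clusters):
    """Hierarchical (agglomerative) clustering with average linkage.

    Starts with one cluster per document and repeatedly merges the two
    closest clusters until only n_clusters remain.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


titles = load_titles(SAMPLE_CSV)
vectors = tfidf([tokenize(t) for t in titles])
for cluster in agglomerate(vectors, n_clusters=2):
    print([titles[i] for i in cluster])
```

On the toy titles, the two drug-regimen trials and the two vaccine trials end up in separate groups. Each such group would then be read and labelled by hand as a research interest, which is the mapping step the abstract describes.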

Conclusion: Data mining of the exported dataset of tuberculosis clinical trials revealed notable quantitative facts about the registered trials. Text mining produced a total of 41 hierarchical clusters, which were further mapped to 25 distinct research interests among tuberculosis clinical trials. The work demonstrates a novel technique for the rapid and practical review of major clinical trials. Because the tool used is open-source and GUI-based, any researcher with a working knowledge of text mining can apply this technique to other clinical trials.

Keywords: Text mining, data analysis, hierarchical cluster analysis, tuberculosis, clinical trials, ICTRP, AI.
