Current Bioinformatics

Author(s): Ge Zhang, Pan Yu, Jianlin Wang* and Chaokun Yan*

DOI: 10.2174/1574893615666200204154358

Feature Selection Algorithm for High-dimensional Biomedical Data Using Information Gain and Improved Chemical Reaction Optimization

Page: [912 - 926] Pages: 15



Background: Rapid developments in bioinformatics technologies have led to the accumulation of large amounts of biomedical data. However, these datasets usually involve thousands of features and include much irrelevant or redundant information, which can cause confusion during diagnosis. Feature selection addresses this by searching for an optimal feature subset, a task known to be NP-hard because of the large search space.

Objective: To address this issue, this paper proposes a hybrid feature selection method, called IGICRO, based on an improved chemical reaction optimization algorithm (ICRO) and an information gain (IG) approach.

Methods: IG is adopted to select a set of important features. A neighborhood search mechanism is combined with ICRO to increase population diversity and improve local search capability.
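As a rough illustration of the IG step, the sketch below ranks discrete features by information gain, IG(Y; X) = H(Y) - H(Y | X), and keeps the top k. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_k_features`), the toy data, and the choice of k are assumptions made for this example. It also assumes features are already discrete, whereas continuous measurements such as gene expression levels would need discretizing first.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a single discrete feature X."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy: weighted average of the entropy within each group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Indices of the k features with the highest information gain."""
    scores = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data (hypothetical): four samples, three binary features.
labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # matches the labels exactly
    [0, 1, 0, 1],  # independent of the labels
    [0, 0, 0, 1],  # partially informative
]
selected = top_k_features(columns, labels, k=2)
```

In a hybrid filter/wrapper scheme of this kind, the reduced feature set would then be handed to the search algorithm (here, ICRO), which explores subsets of the retained features.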

Results: Experimental results on eight publicly available datasets demonstrate that the proposed approach outperforms the original CRO and other state-of-the-art approaches.

Keywords: Feature selection, chemical reaction optimization algorithm (CRO), information gain, neighborhood search mechanism, biomedical data, optimal subset.

