Abstract
Background: Rapid developments in bioinformatics technologies have led to the
accumulation of a large amount of biomedical data. However, these datasets usually involve
thousands of features and include much irrelevant or redundant information, which can confuse
diagnosis. Feature selection addresses this by finding an optimal feature subset, a task known to
be NP-hard because the number of candidate subsets grows exponentially with the number of
features.
Objective: To address this issue, this paper proposes a hybrid feature selection method, called
IGICRO, based on an improved chemical reaction optimization algorithm (ICRO) and an information
gain (IG) approach.
Methods: IG is adopted to select important features. A neighborhood search mechanism is
combined with ICRO to increase the diversity of the population and improve its local search
capability.
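As an illustration of the IG filtering step, the following is a minimal sketch that scores discrete features by information gain, IG(Y; X) = H(Y) - H(Y | X), and keeps the top-ranked ones. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_k_by_ig`) and the cutoff `k` are assumptions introduced here, and the sketch assumes feature values have already been discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # H(Y | X): entropy of each group, weighted by the group's relative size.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_ig(columns, labels, k):
    """Return the indices of the k feature columns with the highest IG (hypothetical helper)."""
    scores = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    informative = [0, 0, 1, 1]    # perfectly predicts the label
    uninformative = [0, 1, 0, 1]  # independent of the label
    print(top_k_by_ig([uninformative, informative], labels, k=1))
```

Continuous measurements such as gene expression levels would need to be binned before this scoring applies. The reduced feature set could then be passed to a wrapper search such as ICRO.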
Results: Experimental results on eight publicly available datasets demonstrate that the proposed
approach outperforms the original CRO and other state-of-the-art approaches.
Keywords: Feature selection, chemical reaction optimization algorithm (CRO), information gain,
neighborhood search mechanism, biomedical data, optimal subset.