Abstract
Background: The massive amount of biomedical data accumulated over the past decades can be utilized for disease diagnosis.
Objective: However, the high dimensionality, small sample sizes, and irrelevant features of such data often degrade the accuracy and speed of disease prediction. Many existing machine learning models cannot capture the patterns in these datasets accurately without feature selection.
Methods: Filter and wrapper are two prevailing feature selection approaches. The filter method is fast but yields low prediction accuracy, while the wrapper method attains high accuracy at a formidable computational cost. Given the drawbacks of using either method alone, a novel feature selection method, called MRMR-EFPATS, is proposed, which hybridizes the Minimum Redundancy Maximum Relevance (MRMR) filter with a wrapper based on an improved Flower Pollination Algorithm (FPA). First, MRMR is employed to quickly rank the features and screen out the important ones. These features are then used to initialize the population of the wrapper method, yielding faster convergence and lower computational cost. Finally, owing to its efficiency and flexibility, FPA is adopted to discover an optimal feature subset.
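As a concrete illustration of the MRMR filter stage, the following is a minimal sketch that greedily ranks features by relevance to the class label minus average redundancy with already-selected features, assuming mutual information as the relevance/redundancy measure (via scikit-learn); the function mrmr_rank and its incremental form are illustrative, not the authors' exact implementation.

```python
# Minimal MRMR sketch (illustrative, not the paper's exact code).
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_rank(X, y, k):
    """Greedily pick k features maximizing relevance to y minus the
    mean redundancy with the features selected so far."""
    n_features = X.shape[1]
    k = min(k, n_features)
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]    # start from the most relevant
    candidates = set(range(n_features)) - set(selected)
    redundancy = np.zeros(n_features)         # running sum of MI with selected
    while len(selected) < k:
        last = selected[-1]
        for j in candidates:
            # MI between candidate j and the most recently selected feature
            redundancy[j] += mutual_info_regression(
                X[:, [last]], X[:, j], random_state=0)[0]
        scores = {j: relevance[j] - redundancy[j] / len(selected)
                  for j in candidates}
        best = max(scores, key=scores.get)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In the hybrid pipeline, only the top-ranked features returned here would be passed on to the FPA wrapper, which shrinks its search space substantially.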
Results: The basic FPA still has some drawbacks, such as a slow convergence rate, inadequate exploration of new solutions, and a tendency to become trapped in local optima. In our work, an elite strategy is adopted to improve the convergence speed of the FPA, while Tabu search and adaptive Gaussian mutation are employed to improve the search capability of the FPA and to escape from local optima. The KNN classifier with 5-fold cross-validation is utilized to evaluate classification accuracy.
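To make the wrapper's evaluation step concrete, the sketch below scores a candidate feature subset (encoded as a binary mask over the MRMR-screened features) by the mean 5-fold cross-validation accuracy of a KNN classifier, using scikit-learn; the adaptive_gaussian_mutation helper and its linear decay schedule are a hypothetical stand-in for the adaptive Gaussian mutation step, whose exact parameters the abstract does not specify.

```python
# Hedged sketch of the wrapper fitness and an adaptive Gaussian mutation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, n_neighbors=5):
    """Mean 5-fold CV accuracy of KNN on the features where mask is True."""
    if not mask.any():                      # empty subset: worst fitness
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    return cross_val_score(knn, X[:, mask], y, cv=5).mean()

def adaptive_gaussian_mutation(position, t, t_max, sigma0=1.0, rng=None):
    """Perturb a continuous FPA position with Gaussian noise whose scale
    decays as iterations progress (exploration -> exploitation).
    The linear schedule below is one simple choice, not the paper's."""
    rng = rng or np.random.default_rng()
    sigma = sigma0 * (1.0 - t / t_max)      # assumes 0 <= t <= t_max
    return position + rng.normal(0.0, sigma, size=position.shape)
```

A binary mask would typically be obtained by thresholding the continuous position (e.g., mask = position > 0.5) before calling fitness.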
Conclusion: Extensive experimental results on six public high-dimensional biomedical datasets show that the proposed MRMR-EFPATS achieves superior performance compared to other state-of-the-art methods.
Keywords:
Feature selection, flower pollination algorithm, MRMR, elite strategy, adaptive Gaussian mutation, tabu search.