Combinatorial Chemistry & High Throughput Screening

Author(s): Saeed Ahmed*, Muhammad Kabir, Zakir Ali, Muhammad Arif, Farman Ali and Dong-Jun Yu

DOI: 10.2174/1386207322666181220124756

An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data

Page: [631 - 645] Pages: 15

Abstract

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosing this deadly disease at an early stage is an exceptionally new clinical application of microarray data. In DNA microarray technology, gene expression data are high-dimensional with a small sample size. It is therefore indispensable to develop efficient and robust feature selection methods that identify a small set of genes and achieve better classification performance.

Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) with a Multi-Objective Evolutionary Algorithm (MOEA) to select highly informative genes. The hybrid model, paired with a radial basis function neural network (RBFNN) classifier, was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation.
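As a rough illustration of the CFS half of such a pipeline, the sketch below scores a gene subset with the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), and grows the subset greedily. It is not the authors' implementation: the abstract does not state which correlation measure or search strategy was used, so the absolute Pearson correlation, the greedy forward search (standing in for the MOEA), and the names `cfs_merit`, `greedy_cfs`, and `max_features` are all assumptions made for this example.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    Assumption: absolute Pearson correlation is used for both the
    feature-class (r_cf) and feature-feature (r_ff) terms.
    """
    k = len(subset)
    # Mean absolute correlation between each selected feature and the class.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute correlation over distinct pairs of selected features.
    corr = np.abs(np.corrcoef(X[:, subset], rowvar=False))
    r_ff = (corr.sum() - k) / (k * (k - 1))
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y, max_features=10):
    """Greedy forward search: add the feature that most improves the merit.

    Assumes constant (zero-variance) features were removed beforehand,
    since their correlation is undefined.
    """
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:  # no candidate improves the subset; stop
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best

# Synthetic stand-in for expression data: 60 samples, 20 "genes",
# with gene 3 made informative about the binary class label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=60).astype(float)
X = rng.normal(size=(60, 20))
X[:, 3] += 2.0 * y
selected, best = greedy_cfs(X, y, max_features=5)
print(selected, round(float(best), 3))
```

The merit rewards genes that correlate with the class label and penalizes genes that are redundant with one another, which is why CFS tends to return small subsets. In a hybrid scheme like the one described, a multi-objective search could trade this kind of relevance score against subset size instead of stopping at the first non-improving step.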

Results: The experimental results are compared with seven conventional feature selection methods and with other methods in the literature. The comparison shows that our approach has clear advantages in classification accuracy and in the number of genes selected.

Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy on six of the eleven datasets with a minimal-sized predictive gene subset.

Keywords: Cancer classification, gene expression data, correlation-based feature selection, multi-objective evolutionary algorithm, radial basis function neural network.