Gene Selection Method for Microarray Data Classification Using Particle Swarm Optimization and Neighborhood Rough Set

Page: [422 - 431] Pages: 10

Abstract

Background: Mining knowledge from microarray data is one of the popular research topics in biomedical informatics. Gene selection is a significant research trend in biomedical data mining, since the accuracy of tumor identification heavily relies on the genes biologically relevant to the identified problems.

Objective: Various computational intelligence methods have been proposed to select a small subset of informative genes from a large number of genes for tumor identification. However, because of the high dimensionality of the data, the small sample sizes, and the inherent noise, many of these methods struggle to select a small gene subset.

Methods: In this study, we propose a novel gene selection algorithm, PSONRS_KNN, based on the particle swarm optimization (PSO) algorithm combined with the neighborhood rough set (NRS) reduction model and the K-nearest neighborhood (KNN) classifier.
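As a rough illustration of the NRS ingredient, the sketch below computes a neighborhood dependency degree: the fraction of samples whose neighbors within a radius, measured on the chosen genes only, all share the same class. The function name `neighborhood_dependency`, the Euclidean metric, the radius `delta`, and the toy data are assumptions made for illustration. The abstract does not specify these details, so this should not be read as the authors' implementation.

```python
import math

def neighborhood_dependency(X, y, genes, delta):
    """Fraction of samples whose delta-neighborhood, measured on the
    selected genes only, contains no sample of a different class."""
    if not genes:
        return 0.0
    consistent = 0
    for i, xi in enumerate(X):
        pure = True
        for j, xj in enumerate(X):
            # Euclidean distance restricted to the selected genes (assumed metric)
            dist = math.sqrt(sum((xi[g] - xj[g]) ** 2 for g in genes))
            if dist <= delta and y[j] != y[i]:
                pure = False
                break
        if pure:
            consistent += 1
    return consistent / len(X)

# Hypothetical toy data: gene 0 separates the two classes, gene 1 does not.
X = [[0.1, 0.5], [0.2, 0.9], [0.8, 0.4], [0.9, 0.95]]
y = [0, 0, 1, 1]
print(neighborhood_dependency(X, y, [0], 0.2))  # 1.0
print(neighborhood_dependency(X, y, [1], 0.2))  # 0.0
```

A gene subset with a higher dependency degree keeps the classes better separated at the chosen radius, which is what makes this quantity usable as a reduction criterion.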

Results: First, the top-ranked candidate genes are obtained with the GainRatioAttributeEval preselection algorithm in WEKA. Then, the smallest meaningful set of genes is selected by combining PSO with NRS and the KNN classifier.
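The second stage can be pictured as a wrapper search over gene subsets. The sketch below is a generic binary PSO with a sigmoid transfer function, scored by leave-one-out KNN accuracy minus a small penalty per selected gene. It is only a minimal illustration under stated assumptions: the function names, the inertia and acceleration constants, the size penalty, and the toy data are all hypothetical, and the gain-ratio preselection and the NRS reduction step are omitted because the abstract does not describe how they enter the search.

```python
import math
import random

def knn_loocv_accuracy(X, y, genes, k=1):
    """Leave-one-out accuracy of a k-nearest-neighbor vote using only `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dists = sorted(
            (math.dist([X[i][g] for g in genes], [X[j][g] for g in genes]), y[j])
            for j in range(len(X)) if j != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        if predicted == y[i]:
            correct += 1
    return correct / len(X)

def binary_pso_select(X, y, n_particles=10, n_iters=20, size_penalty=0.01, seed=0):
    """Search for a gene subset with a generic binary PSO (assumed settings)."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(mask):
        genes = [g for g, bit in enumerate(mask) if bit]
        # Reward accuracy, penalize subset size (the penalty weight is an assumption).
        return knn_loocv_accuracy(X, y, genes) - size_penalty * len(genes)

    positions = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    velocities = [[0.0] * n_genes for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    best_idx = max(range(n_particles), key=personal_score.__getitem__)
    global_best, global_score = personal_best[best_idx][:], personal_score[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_genes):
                v = (0.7 * velocities[i][d]
                     + 1.5 * rng.random() * (personal_best[i][d] - positions[i][d])
                     + 1.5 * rng.random() * (global_best[d] - positions[i][d]))
                velocities[i][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                prob = 1.0 / (1.0 + math.exp(-velocities[i][d]))  # sigmoid transfer
                positions[i][d] = 1 if rng.random() < prob else 0
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return [g for g, bit in enumerate(global_best) if bit], global_score

# Hypothetical toy data: only gene 0 tracks the class label.
X = [[0.1, 0.7, 0.3], [0.2, 0.1, 0.9], [0.15, 0.9, 0.5],
     [0.9, 0.8, 0.4], [0.8, 0.2, 0.85], [0.85, 0.6, 0.1]]
y = [0, 0, 0, 1, 1, 1]
selected, score = binary_pso_select(X, y)
print(selected, score)
```

The size penalty is one simple way to push the swarm toward small subsets; a dependency-based criterion from the NRS model could be substituted or added to the fitness, but how the proposed method weighs these terms is not stated here.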

Conclusion: Experimental results on five microarray gene expression datasets demonstrate that the performance of the proposed method is better than existing state-of-the-art methods in terms of classification accuracy and the number of selected genes.

Keywords: Gene expression data, gene selection, tumor classification, particle swarm optimization, neighborhood rough set, k-nearest neighborhood.
