Effect of Grid Search and Hyper Parameter Tuned Pipeline with Various Classifiers and PCA for Breast Cancer Detection

Article ID: e150722206811 Pages: 12

Abstract

Background: For most researchers, the study of breast cancer detection begins with the well-known WBCD dataset. In this paper, we use it as a benchmark to study ML algorithms such as SVM, DT, RF, KNN, NB, Logistic Regression, Extra Trees, Bagging classifiers with hard and soft voting, ensemble techniques, and Extreme Gradient Boosting (XGBoost), together with two deep learning models, one with regularization and one without.

Objective: The primary objective is to revisit how existing classifiers fare on the WBCD dataset and to propose a method that uses Grid search and Randomized search to select the best hyperparameters, applied with and without PCA. We then check whether the WBCD dataset can be classified in less time without compromising accuracy.
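The difference between the two search strategies named in the Objective can be illustrated with a minimal sketch that uses only the Python standard library. The search space below is hypothetical (the abstract does not list the actual hyperparameter grids), and the helper names `grid_candidates` and `randomized_candidates` are introduced here for illustration only.

```python
import itertools
import random

# Hypothetical SVM search space; the abstract does not state the real grids.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1],
    "kernel": ["rbf", "linear"],
}


def grid_candidates(grid):
    """Return every combination in the grid, as an exhaustive grid search would."""
    keys = sorted(grid)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(grid[k] for k in keys))
    ]


def randomized_candidates(grid, n_iter, seed=0):
    """Return up to n_iter distinct combinations sampled at random from the grid."""
    rng = random.Random(seed)
    all_candidates = grid_candidates(grid)
    return rng.sample(all_candidates, min(n_iter, len(all_candidates)))


grid_list = grid_candidates(param_grid)
random_list = randomized_candidates(param_grid, n_iter=8)
print(len(grid_list), "grid candidates;", len(random_list), "randomized candidates")
```

With this grid, the exhaustive search evaluates 4 x 3 x 2 = 24 settings, while the randomized search evaluates only the 8 it samples. This is one reason a randomized search can finish sooner, at the risk of missing the best setting.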

Methods: We explore PCA as a feature extraction technique on this dataset, alongside feature scaling, stratified K-fold cross-validation, and K-best feature selection. We implement Grid Search CV together with PCA in a pipeline to tune the hyperparameters of various classifiers and to reduce training and prediction time without compromising accuracy. Finally, this paper also compares the accuracy, precision and recall of various ML techniques on features selected manually by observing the feature importance scores and the correlation matrix.
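A pipeline of the kind described in the Methods could be sketched in scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the breast cancer data bundled with scikit-learn (`load_breast_cancer`) as a stand-in for WBCD, picks SVM as the classifier, and uses a hypothetical hyperparameter grid and train/test split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Wisconsin data stands in for WBCD.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature scaling, then PCA, then the classifier, all tuned as one unit.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", SVC()),
])

# Hypothetical grid; the step-name prefix routes each value to its step.
param_grid = {
    "pca__n_components": [5, 10, 15],
    "clf__C": [0.1, 1, 10],
    "clf__kernel": ["rbf", "linear"],
}

# Stratified K-fold keeps the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Placing the scaler and PCA inside the pipeline means both are refitted on the training folds only during cross-validation, so no information from the validation fold leaks into the feature extraction step.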

Results: In our experiment with all features, we obtain an accuracy of 97.9 percent for Extra Trees and for an ensemble of RF, KNN and Extra Trees with a soft voting strategy. Using feature selection with PCA and grid search, we obtain an accuracy of 99.1 percent with SVM (kernel trick). We also demonstrate that training and prediction time decreases when the hyperparameters of the classifiers are tuned appropriately, which Grid search and Randomized search over hyperparameter grids handle.
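A soft voting ensemble of RF, KNN and Extra Trees, as mentioned in the Results, might look like the sketch below. It again assumes scikit-learn's bundled breast cancer data as a stand-in for WBCD. The estimator settings and the split are hypothetical, so the printed scores are not expected to match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Wisconsin data stands in for WBCD.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Soft voting averages the predicted class probabilities of the members.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        # KNN is distance based, so it gets its own scaling step.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

# In this bundled dataset label 1 means benign, so precision and recall
# are reported for the benign class by default.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
print({name: round(value, 3) for name, value in metrics.items()})
```

Switching `voting="soft"` to `voting="hard"` would make the ensemble take a majority vote over predicted labels instead of averaging probabilities.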

Conclusion: This paper shows that feature subset selection or feature ranking may be neither the best nor the only approach to apply to the WBCD dataset along with PCA. In datasets where features are closely correlated, hyperparameter tuning with either Grid or Randomized Search can be combined with PCA to extract the best feature combinations, which are then fed into the classifiers to obtain good accuracy scores in much less time.

Keywords: WBCD, Breast Cancer Detection, PCA, Grid Search CV, Randomized Search, Cross-Validation, Regularization, Machine Learning, Deep Learning
