Breast Tissue Classification Method Based on Machine Learning

Article ID: e200123212974 Pages: 10

Abstract

Early detection and treatment of breast cancer are essential, and effective classification of breast tissue aids diagnosis. To this end, a classification method named FT_GA_GBDT is proposed. First, the correlations between the features and the classification labels of breast tissue samples are determined; features with higher correlation are analyzed statistically and combined by weight, which realizes the feature transformation (FT). The datasets are then augmented by calculating the mean and root mean square of the feature attributes of each pair of adjacent odd- and even-row samples that belong to the same class. Finally, a genetic algorithm (GA) searches for the optimal hyper-parameters of the gradient boosting decision tree (GBDT) model, and the GBDT with these hyper-parameters classifies the breast tissue. For comparison, the K-nearest-neighbor (KNN), support-vector-machine (SVM), and GBDT methods were also tested on breast tissue classification. In 6-fold cross-validation on three breast tissue datasets, the average Precision, Recall, and F1 score obtained by FT_GA_GBDT were better than those obtained by the KNN, SVM, and GBDT methods. The results further show that both the FT algorithm and the GA-based hyper-parameter search help improve the performance of the breast tissue classification model, and that the improvement is more pronounced when the correlations between features and classification labels are generally low.
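
The two preprocessing steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so several choices here are assumptions: the correlation is taken to be Pearson's, labels are assumed to be numerically encoded, the weights are taken proportional to the absolute correlation, and the function names and the `threshold` value are hypothetical. The GA search and the GBDT model are left out. Plain Python is used to keep the example dependency-free.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def add_weighted_feature(rows, labels, threshold=0.5):
    """Append one feature: the |r|-weighted sum of columns with |r| >= threshold.

    Assumption: weights proportional to |r|; the paper's weighting may differ.
    """
    columns = list(zip(*rows))
    r = [pearson(col, labels) for col in columns]
    selected = [j for j, rj in enumerate(r) if abs(rj) >= threshold]
    if not selected:
        return [list(row) for row in rows], selected
    total = sum(abs(r[j]) for j in selected)
    weights = {j: abs(r[j]) / total for j in selected}
    out = [list(row) + [sum(weights[j] * row[j] for j in selected)] for row in rows]
    return out, selected

def augment_adjacent_pairs(rows, labels):
    """Add the mean and the root mean square of each same-class pair (rows 1-2, 3-4, ...)."""
    new_rows, new_labels = [list(r) for r in rows], list(labels)
    for i in range(0, len(rows) - 1, 2):
        if labels[i] != labels[i + 1]:
            continue  # only pairs belonging to the same class are combined
        a, b = rows[i], rows[i + 1]
        new_rows.append([(p + q) / 2 for p, q in zip(a, b)])
        new_rows.append([math.sqrt((p * p + q * q) / 2) for p, q in zip(a, b)])
        new_labels += [labels[i], labels[i]]
    return new_rows, new_labels
```

Under this reading, each same-class pair contributes two synthetic samples, so a dataset in which every adjacent pair shares a class would double in size. Pairs whose labels differ are skipped and add nothing.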
