Recent Advances in Computer Science and Communications

Author(s): Thi Ngoc A. Nguyen*, Quynh P. Nhu and Vijender K. Solanki

DOI: 10.2174/2666255813999201102093315

Novel Noise Filter Techniques and Dynamic Ensemble Selection for Classification

Page: [48 - 59] Pages: 12


Abstract

Background: Ensemble selection is one of the most researched topics in ensemble learning. Researchers have been drawn to selecting a subset of base classifiers that may perform better than the whole ensemble. Dynamic Ensemble Selection (DES) is one of the most effective techniques for classification problems: for each query instance, a DES system selects the most appropriate classifiers from the pool of candidate classifiers. Ensemble models that balance diversity and accuracy during training achieve better performance than using all classifiers.
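As a rough illustration of how a DES system picks classifiers per query, the sketch below estimates each base classifier's competence as its accuracy over the k nearest neighbours of the query in a held-out selection set, and lets all locally competent classifiers vote. This is a minimal, generic scheme using only scikit-learn (the pool size, neighbourhood size, and 0.5 competence threshold are assumptions), not the selection rule proposed in this paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

X, y = make_classification(n_samples=600, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Pool of base classifiers trained on bootstrap samples (bagging-style).
pool = [DecisionTreeClassifier(max_depth=5, random_state=s).fit(
            *resample(X_train, y_train, random_state=s)) for s in range(10)]

knn = NearestNeighbors(n_neighbors=7).fit(X_dsel)

def des_predict(x):
    # Region of competence: the query's k nearest neighbours in the
    # dynamic selection set (DSEL).
    _, idx = knn.kneighbors(x.reshape(1, -1))
    rX, ry = X_dsel[idx[0]], y_dsel[idx[0]]
    competence = np.array([clf.score(rX, ry) for clf in pool])
    # Keep every classifier that gets most of the local region right;
    # fall back to the single most competent one if none qualifies.
    selected = [clf for clf, c in zip(pool, competence) if c > 0.5] \
               or [pool[int(np.argmax(competence))]]
    votes = [int(clf.predict(x.reshape(1, -1))[0]) for clf in selected]
    return np.bincount(votes).argmax()  # majority vote of the selected subset

y_pred = np.array([des_predict(x) for x in X_test])
print("DES accuracy:", (y_pred == y_test).mean())
```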

Objective: In this paper, novel techniques are proposed that combine a Noise Filter (NF) with Dynamic Ensemble Selection (DES) to achieve better predictive accuracy. In other words, the noise filter makes the data cleaner, and DES then improves the classification performance.
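The noise-filtering half of this idea can be sketched with a standard majority-vote filter: train an ensemble with cross-validation and treat every training instance whose out-of-fold prediction disagrees with its given label as label noise. This is an assumed, commonly used filtering scheme, not the paper's specific NF; note that it relabels suspected noise rather than discarding it, matching the correction described in the conclusion below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic data with ~10% of the labels flipped to simulate class noise.
X, y = make_classification(n_samples=500, flip_y=0.10, random_state=0)

# Out-of-fold ensemble predictions: each instance is predicted by a
# model that never saw it during training.
oof = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=5)

# Instances whose label disagrees with the ensemble are flagged as
# noisy and relabelled toward the predicted (presumably correct) class.
noisy = oof != y
y_clean = np.where(noisy, oof, y)
print(f"relabelled {noisy.sum()} of {len(y)} instances")
```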

Methods: The proposed NF-DES model was evaluated on twelve datasets, including three credit scoring datasets, with accuracy as the performance measure.

Results: The results show that the proposed NF-DES model achieves better accuracy than the compared models.

Conclusion: A novel noise filter and a dynamic ensemble learning method aimed at improving classification ability are presented. To improve classification performance, the noise filter moves noisy data toward the correct class; the dynamic ensemble learning method then chooses an appropriate subset of classifiers from the pool of base classifiers.
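Putting the two steps together, a minimal NF-DES-style pipeline might first clean the training labels with a filter like the one sketched earlier and then fit a dynamic ensemble selector on the cleaned data. The sketch below uses the open-source DESlib library's KNORA-U as a stand-in for the paper's DES component (an assumption; the authors' exact method differs), with accuracy as the performance measure.

```python
# Requires: pip install deslib  (DESlib's KNORA-U stands in for the
# paper's DES component; this is an assumption, not the authors' code.)
import numpy as np
from deslib.des import KNORAU
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=1000, flip_y=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1 (NF): relabel training points whose out-of-fold prediction
# disagrees with the given label, nudging noise toward the correct class.
oof = cross_val_predict(BaggingClassifier(random_state=0), X_train, y_train, cv=5)
y_train = np.where(oof != y_train, oof, y_train)

# Step 2 (DES): fit a classifier pool, then a dynamic selector on a
# held-out selection set (DSEL), and score on the untouched test set.
X_fit, X_dsel, y_fit, y_dsel = train_test_split(X_train, y_train, test_size=0.33, random_state=0)
pool = BaggingClassifier(n_estimators=10, random_state=0).fit(X_fit, y_fit)
des = KNORAU(pool_classifiers=pool).fit(X_dsel, y_dsel)
print("NF-DES accuracy:", des.score(X_test, y_test))
```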

Keywords: Dynamic ensemble, noise filter, ensemble method, ensemble selection, classification, support vector machine.

