K Nearest Gaussian: A model fusion based framework for imbalanced classification with noisy dataset

Miao He, Jeffery D. Weir, Teresa Wu, Alvin Silva, Dianna-Yue Zhao, Wei Qian

Abstract


Data quality issues such as data imbalance and data noise have a great impact on the performance of many classifiers. Although imbalance and noise co-exist in many real-world datasets, the two issues have mostly been treated separately because of their different causes and problematic consequences. However, doing so may ignore their mutual effects and thus may not achieve optimal classification performance. In this research, we propose a model fusion based framework, termed K Nearest Gaussian (KNG), to tackle the imbalance and noise issues jointly. KNG employs a generative modeling method (GMM) to extract data characteristics from the training data that are less sensitive to data imbalance and noise. These data characteristics are then used to establish Gaussian confidence regions, which are used to achieve the final classification in a K nearest neighbor (KNN) manner. Experiments on seven UCI benchmark datasets and one medical imaging dataset show that the KNG method greatly outperforms traditional classification methods on imbalanced classification problems with noisy datasets.
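The abstract gives no algorithmic detail, so the following Python sketch is only one possible reading of the idea and not the authors' implementation. It assumes that each class is summarised by a few diagonal-covariance Gaussians, that these are found with a small k-means pass (swapped in here for a fitted generative model), and that a query point is labelled by majority vote among its K nearest Gaussians under squared Mahalanobis distance. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter
from statistics import fmean, pvariance


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_gaussians(points, n_components=2, n_iter=10, min_var=1e-3):
    """Summarise one class as diagonal Gaussians (assumed k-means grouping)."""
    n_components = min(n_components, len(points))
    step = len(points) // n_components
    # Deterministic initialisation: evenly spaced points in input order.
    centers = [points[i * step] for i in range(n_components)]
    groups = []
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        centers = [tuple(fmean(d) for d in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    gaussians = []
    for g in groups:
        if g:  # skip empty groups
            mean = tuple(fmean(d) for d in zip(*g))
            # Variance floor (an assumption) keeps tiny groups well-defined.
            var = tuple(max(pvariance(d), min_var) for d in zip(*g))
            gaussians.append((mean, var))
    return gaussians


def fit_kng(data_by_class, n_components=2):
    """Return a flat list of (label, mean, var), one entry per Gaussian."""
    return [(label, mean, var)
            for label, points in data_by_class.items()
            for mean, var in fit_gaussians(points, n_components)]


def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance for a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def predict(model, x, k=3):
    """Majority vote among the k Gaussians nearest to x."""
    ranked = sorted(model, key=lambda g: mahalanobis_sq(x, g[1], g[2]))
    votes = Counter(label for label, _, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy imbalanced data: 121 majority points versus 6 minority points.
majority = [(x * 0.1, y * 0.1) for x in range(-5, 6) for y in range(-5, 6)]
minority = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
            (5.1, 5.2), (4.9, 4.8), (5.3, 5.1)]
model = fit_kng({0: majority, 1: minority})
near_minority = predict(model, (5.0, 5.0))
near_majority = predict(model, (0.0, 0.0))
```

Because every class contributes the same number of Gaussians regardless of how many samples it has, the vote in this sketch is not dominated by the majority class, which is the intuition the abstract points to when it describes the extracted characteristics as less sensitive to imbalance.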


DOI: https://doi.org/10.5430/air.v4n2p126


Artificial Intelligence Research

ISSN 1927-6974 (Print)   ISSN 1927-6982 (Online)

Copyright © Sciedu Press 