A new boundary-degree-based oversampling method for imbalanced data

dc.contributor.authorChen, Yueqi
dc.contributor.authorPedrycz, Witold
dc.contributor.authorYang, Jie
dc.date.accessioned2024-05-19T14:42:39Z
dc.date.available2024-05-19T14:42:39Z
dc.date.issued2023
dc.departmentİstinye Üniversitesien_US
dc.description.abstractImbalanced data constitute a significant challenge in practical applications, as standard classifiers are usually designed to work on data with balanced class label distributions. One effective way to address the imbalance problem is boundary oversampling, which focuses only on the classification of boundary samples. However, most boundary oversampling methods select boundary samples for oversampling only coarsely, without considering the potentially useful boundary characteristics inherent in the majority (negative) class. To overcome this limitation, we propose a novel boundary-degree-based oversampling method (BDO). The originality of BDO stems from quantifying, in terms of probability and using information entropy, the degree to which each negative sample can be regarded as a boundary sample. By applying the sigma rule to the quantified boundary degree, negative boundary samples are identified and then used to indirectly select minority (positive) boundary samples for oversampling. In this way, a substantial amount of information hidden in the negative class can be mined. To transfer the mined information to the oversampling step, BDO iteratively synthesizes aided boundary points along a fraudulent gradient. Finally, oversampling is performed on both the positive boundary samples and the aided boundary points. Experimental results on 15 benchmark imbalanced datasets, two multi-label datasets, and one large-scale dataset, evaluated in terms of G-mean, F-measure, AUC, accuracy, TPR, and TNR, show that BDO performs well and is competitive with several commonly used methods.en_US
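The abstract states only that each negative sample's boundary degree is quantified with information entropy and then thresholded with a sigma rule; this record does not give the exact formulas (the keywords suggest a Gaussian probability distribution function is involved). The Python sketch below illustrates the general idea under an explicit assumption: the boundary degree of a negative sample is taken to be the binary entropy of the class labels among its k nearest neighbours, and a negative sample counts as a boundary sample when its degree exceeds mean + n_sigma × standard deviation. The names `boundary_degrees`, `select_boundary`, `k`, and `n_sigma` are hypothetical. This is not the authors' BDO implementation, and it omits the positive-sample selection, the aided boundary points, and the gradient step.

```python
import math
from statistics import mean, pstdev


def binary_entropy(p):
    """Entropy in bits of a two-class distribution whose positive share is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def boundary_degrees(negatives, positives, k=5):
    """Hypothetical boundary degree (assumption, not the paper's formula):
    entropy of the class labels among each negative sample's k nearest neighbours."""
    labelled = [(x, 0) for x in negatives] + [(x, 1) for x in positives]
    degrees = []
    for i, x in enumerate(negatives):
        # Negatives come first in `labelled`, so index i is the sample itself; skip it.
        others = [(math.dist(x, y), lab) for j, (y, lab) in enumerate(labelled) if j != i]
        others.sort(key=lambda t: t[0])
        neighbours = others[:k]
        p_pos = sum(lab for _, lab in neighbours) / len(neighbours)
        degrees.append(binary_entropy(p_pos))
    return degrees


def select_boundary(negatives, degrees, n_sigma=1.0):
    """Sigma-rule threshold (assumed form): keep negatives whose degree
    is strictly greater than mean + n_sigma * population standard deviation."""
    threshold = mean(degrees) + n_sigma * pstdev(degrees)
    return [x for x, d in zip(negatives, degrees) if d > threshold]


negatives = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
positives = [(5, 6), (6, 5), (6, 6)]
degrees = boundary_degrees(negatives, positives, k=4)
print(select_boundary(negatives, degrees))  # [(5, 5)]
```

In this toy run only the negative point (5, 5), three of whose four nearest neighbours are positive, receives a nonzero degree and exceeds the threshold; the negatives inside the pure cluster get a degree of zero.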
dc.description.sponsorshipNational Key R&D Program of China [2018AAA0100300]; Fundamental Research Funds for the Central Universities [DUT22YG236]; National Natural Science Foundation of China [62172073, 62076182, 62176040]en_US
dc.description.sponsorshipThis work was supported by the National Key R&D Program of China under Grant 2018AAA0100300, the Fundamental Research Funds for the Central Universities under Grant DUT22YG236, and the National Natural Science Foundation of China under Grants 62172073, 62076182, and 62176040.en_US
dc.identifier.doi10.1007/s10489-023-04846-4
dc.identifier.endpage26541en_US
dc.identifier.issn0924-669X
dc.identifier.issn1573-7497
dc.identifier.issue22en_US
dc.identifier.scopus2-s2.0-85168965932en_US
dc.identifier.scopusqualityQ2en_US
dc.identifier.startpage26518en_US
dc.identifier.urihttps://doi.org/10.1007/s10489-023-04846-4
dc.identifier.urihttps://hdl.handle.net/20.500.12713/5267
dc.identifier.volume53en_US
dc.identifier.wosWOS:001063584500005en_US
dc.identifier.wosqualityN/Aen_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.indekslendigikaynakScopusen_US
dc.language.isoenen_US
dc.publisherSpringeren_US
dc.relation.ispartofApplied Intelligenceen_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.snmz20240519_kaen_US
dc.subjectImbalanced Learningen_US
dc.subjectInformation Entropyen_US
dc.subjectGradienten_US
dc.subjectGaussian Probability Distribution Functionen_US
dc.subjectOversamplingen_US
dc.titleA new boundary-degree-based oversampling method for imbalanced dataen_US
dc.typeArticleen_US