Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines

Maldonado, Sebastián; Weber, Richard; Famili, Fazel

Author	dc.contributor.author	Maldonado, Sebastián
Author	dc.contributor.author	Weber, Richard	es_CL
Author	dc.contributor.author	Famili, Fazel	es_CL
Admission date	dc.date.accessioned	2014-12-22T19:46:31Z
Available date	dc.date.available	2014-12-22T19:46:31Z
Publication date	dc.date.issued	2014
Item citation	dc.identifier.citation	Information Sciences Volume 286, 1 December 2014, Pages 228–246	en_US
Identifier	dc.identifier.other	doi:10.1016/j.ins.2014.07.015
Identifier	dc.identifier.uri	https://repositorio.uchile.cl/handle/2250/126745
General note	dc.description	SCOPUS publication article	en_US
Abstract	dc.description.abstract	Feature selection and classification of imbalanced data sets are two of the most interesting machine learning challenges, attracting growing attention from both industry and academia. Feature selection addresses the dimensionality reduction problem by determining a subset of available features to build a good model for classification or prediction, while the class-imbalance problem arises when the class distribution is too skewed. Both issues have been studied independently in the literature, and a plethora of methods to address high dimensionality as well as class imbalance have been proposed. The aim of this work is to explore both issues simultaneously, proposing a family of methods that select those attributes that are relevant for the identification of the target class in binary classification. We propose a backward elimination approach based on successive holdout steps, whose contribution measure is based on a balanced loss function obtained on an independent subset. Our experiments are based on six highly imbalanced microarray data sets; they compare our methods with well-known feature selection techniques, and our methods obtain better predictions with consistently fewer relevant features.	en_US
Sponsor	dc.description.sponsorship	CONICYT, FONDECYT	en_US
Language	dc.language.iso	en	en_US
Publisher	dc.publisher	Elsevier	en_US
Type of license	dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile	*
Link to License	dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
Keywords	dc.subject	Data mining	en_US
Title	dc.title	Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines	en_US
Document type	dc.type	Journal article

Files in this item

Name: Feature-selection-for-high-dim ...
Size: 535.3Kb
Format: PDF

This item appears in the following Collection(s)

Journal articles

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile