
Author (dc.contributor.author): Medina Ortiz, David
Author (dc.contributor.author): Contreras, Sebastián
Author (dc.contributor.author): Quiroz, Cristofer
Author (dc.contributor.author): Olivera Nappa, Álvaro María
Accession date (dc.date.accessioned): 2020-05-06T00:08:30Z
Available date (dc.date.available): 2020-05-06T00:08:30Z
Publication date (dc.date.issued): 2020
Item citation (dc.identifier.citation): Frontiers in Molecular Biosciences February 2020 | Volume 7 | Article 13
Identifier (dc.identifier.other): 10.3389/fmolb.2020.00013
Identifier (dc.identifier.uri): https://repositorio.uchile.cl/handle/2250/174427
Abstract (dc.description.abstract): In highly non-linear datasets, attributes or features do not allow readily finding visual patterns for identifying common underlying behaviors. Therefore, it is not possible to achieve classification or regression using linear or mildly non-linear hyperspace partition functions. Hence, supervised learning models based on the application of most existing algorithms are limited, and their performance metrics are low. Linear transformations of variables, such as principal components analysis, cannot avoid the problem, and even models based on artificial neural networks and deep learning are unable to improve the metrics. Sometimes, even when features allow classification or regression in reported cases, performance metrics of supervised learning algorithms remain unsatisfyingly low. This problem is recurrent in many areas of study, for example the clinical, biotechnological, and protein engineering areas, where many of the attributes are correlated in an unknown and very non-linear fashion or are categorical and difficult to relate to a target response variable. In such areas, being able to create predictive models would dramatically impact the quality of their outcomes, generating an immediate added value for both the scientific community and the general public. In this manuscript, we present RV-Clustering, a library of unsupervised learning algorithms, and a new methodology designed to find optimum partitions within highly non-linear datasets that allow deconvoluting variables and notably improving performance metrics in supervised learning classification or regression models. The partitions obtained are statistically cross-validated, ensuring correct representativity and no over-fitting. We have successfully tested RV-Clustering in several highly non-linear datasets with different origins. The approach proposed herein has generated classification and regression models with high-performance metrics, which further supports its ability to generate predictive models for highly non-linear datasets. Advantageously, the method does not require significant human input, which guarantees higher usability for members of the biological, biomedical, and protein engineering communities who have no specific knowledge of machine learning.
Sponsor (dc.description.sponsorship): Centre for Biotechnology and Bioengineering-CeBiB (PIA project, Conicyt, Chile) FB0001
Language (dc.language.iso): en
Publisher (dc.publisher): Frontiers Media
Type of license (dc.rights): Attribution-NonCommercial-NoDerivs 3.0 Chile
Link to license (dc.rights.uri): http://creativecommons.org/licenses/by-nc-nd/3.0/cl/
Source (dc.source): Frontiers in Molecular Biosciences
Keywords (dc.subject): Highly non-linear datasets
Keywords (dc.subject): Supervised learning algorithms
Keywords (dc.subject): Clustering
Keywords (dc.subject): Statistical techniques
Keywords (dc.subject): Recursive binary methods
Title (dc.title): Development of Supervised Learning Predictive Models for Highly Non-linear Biological, Biomedical, and General Datasets
Document type (dc.type): Journal article
Access rights (dcterms.accessRights): Open Access
Cataloguer (uchile.catalogador): crb
Indexation (uchile.index): ISI publication article
Indexation (uchile.index): SCOPUS publication article


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile