Simple item record

Author | dc.contributor.author | Garain, Avishek
Author | dc.contributor.author | Ray, Biswarup
Author | dc.contributor.author | Giampaolo, Fabio
Author | dc.contributor.author | Velásquez Silva, Juan Domingo
Author | dc.contributor.author | Singh, Pawan Kumar
Author | dc.contributor.author | Sarkar, Ram
Accession date | dc.date.accessioned | 2022-06-08T20:58:45Z
Available date | dc.date.available | 2022-06-08T20:58:45Z
Publication date | dc.date.issued | 2022
Item citation | dc.identifier.citation | Neural Computing and Applications (2022)
Identifier | dc.identifier.other | 10.1007/s00521-022-07261-x
Identifier | dc.identifier.uri | https://repositorio.uchile.cl/handle/2250/185938
Abstract | dc.description.abstract | Compared to other features of the human body, voice is quite complex and dynamic, in the sense that speech can be produced in various languages, with different accents and in different emotional states. Recognizing the gender, i.e. male or female, from the voice of an individual is a trivial task for human beings. The same goes for speaker identification when we have long been familiar with the speaker. Our ears act as the front end, receiving the sound signals that our brain processes to reach a decision. Although trivial for us, this task is challenging for any computing device to mimic. Automatic gender, emotion and speaker identification systems have many applications in surveillance, multimedia technology, robotics and social media. In this paper, we propose a Golden Ratio-aided Neural Network (GRaNN) architecture for these purposes. As deciding the number of units in each layer of a deep NN is a challenging issue, we do this using the concept of the Golden Ratio. Prior to that, an optimal subset of features, common to all three tasks, is selected from the feature vector extracted from spectral images obtained from the input voice signals. We use a wrapper-filter framework in which features selected by minimum redundancy maximum relevance (mRMR) are fed to the Mayfly algorithm combined with the adaptive β-hill climbing (AβHC) algorithm. Our model achieves accuracies of 99.306% and 95.68% for gender identification on the RAVDESS and Voice Gender datasets, respectively, 95.27% for emotion identification on the RAVDESS dataset, and 67.172% for speaker identification on the RAVDESS dataset. Performance comparison with existing models on these publicly available datasets confirms its superiority. The results also show that the common feature set was chosen carefully, as it works equally well on three different pattern classification tasks. The proposed wrapper-filter framework reduces the feature dimension significantly, thereby lessening the storage requirement and training time. Finally, strategically selecting the number of units in each layer of the NN helps increase the overall performance on all three pattern classification tasks.
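The golden-ratio layer-sizing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact rule, so the starting width, the depth, and the rounding below are assumptions: each hidden layer is simply the previous width divided by the golden ratio φ ≈ 1.618.

    # Minimal sketch of golden-ratio-aided layer sizing.
    # Assumptions: starting width, depth, and rounding are illustrative,
    # not the authors' exact rule from the paper.
    PHI = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.618

    def golden_ratio_layer_sizes(input_dim, n_hidden, min_units=2):
        """Shrink each successive hidden layer by a factor of PHI."""
        sizes, width = [], float(input_dim)
        for _ in range(n_hidden):
            width /= PHI
            sizes.append(max(min_units, round(width)))
        return sizes

    # e.g. a 64-dimensional selected feature vector and 4 hidden layers
    print(golden_ratio_layer_sizes(64, 4))  # [40, 24, 15, 9]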
Sponsor | dc.description.sponsorship | ANID PIA/APOYO AFB180003
Language | dc.language.iso | en
Publisher | dc.publisher | Springer
Type of license | dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States
Link to License | dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/
Source | dc.source | Neural Computing and Applications
Keywords | dc.subject | Multilayer perceptron
Keywords | dc.subject | Golden ratio
Keywords | dc.subject | Mayfly algorithm
Keywords | dc.subject | RAVDESS
Keywords | dc.subject | Gender classification
Keywords | dc.subject | Emotion recognition
Keywords | dc.subject | Speaker identification
Title | dc.title | GRaNN: feature selection with golden ratio-aided neural network for emotion, gender and speaker identification from voice signals
Document type | dc.type | Journal article
Version | dc.description.version | Published version (publisher's final version)
Access rights | dcterms.accessRights | Open access
Cataloguer | uchile.catalogador | apc
Indexation | uchile.index | WoS-indexed article

