
Author | dc.contributor.author | Vilar, Daniel R.
Author | dc.contributor.author | Pérez Flores, Claudio
Admission date | dc.date.accessioned | 2021-09-21T14:30:20Z
Available date | dc.date.available | 2021-09-21T14:30:20Z
Publication date | dc.date.issued | 2021
Item citation | dc.identifier.citation | IEEE Access, Volume 9, Pages 65702-65720 (2021)
Identifier | dc.identifier.other | 10.1109/ACCESS.2021.3076074
Identifier | dc.identifier.uri | https://repositorio.uchile.cl/handle/2250/182014
Abstract | dc.description.abstract | Weakly supervised semantic segmentation (WSSS) methods have received significant attention in recent years, since they can dramatically reduce the annotation costs of fully supervised alternatives. While most previous studies focused on leveraging classification labels, we explore instead the use of image captions, which can be obtained easily from the web and contain richer visual information. Existing methods for this task assigned text snippets to relevant semantic labels by simply matching class names, and then employed a model trained to localize arbitrary text in images to generate pseudo-ground truth segmentation masks. Instead, we propose a dedicated caption processing module to extract structured supervision from captions, consisting of improved relevant object labels, their visual attributes, and additional background categories, all of which are useful for improving segmentation quality. This module uses syntactic structures learned from text data, and semantic relations retrieved from a knowledge database, without requiring additional annotations on the specific image domain, and consequently can be extended immediately to new object categories. We then present a novel localization network, which is trained to localize only these structured labels. This strategy simplifies model design, while focusing training signals on relevant visual information. Finally, we describe a method for leveraging all types of localization maps to obtain high-quality segmentation masks, which are used to train a supervised model. On the challenging MS-COCO dataset, our method moves the state-of-the-art forward significantly for WSSS with image-level supervision by a margin of 7.6% absolute (26.7% relative) mean Intersection-over-Union, achieving 54.5% precision and 50.9% recall.
Sponsor | dc.description.sponsorship | This work was supported by ANID (Agencia Nacional de Investigación y Desarrollo) under Grants FONDECYT 1191610 and FONDEF ID16I20290, and by the Department of Electrical Engineering and the Advanced Mining Technology Center (CONICYT Project AFB180004), Universidad de Chile.
Language | dc.language.iso | en
Publisher | dc.publisher | IEEE-Inst Electrical Electronics Engineers
Type of license | dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Chile
Link to license | dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/cl/
Source | dc.source | IEEE Access
Keywords | dc.subject | Image segmentation
Keywords | dc.subject | Training
Keywords | dc.subject | Semantics
Keywords | dc.subject | Visualization
Keywords | dc.subject | Location awareness
Keywords | dc.subject | CAMs
Keywords | dc.subject | Task analysis
Keywords | dc.subject | Image captions
Keywords | dc.subject | Semantic segmentation
Keywords | dc.subject | Weakly supervised
Title | dc.title | Extracting Structured Supervision From Captions for Weakly Supervised Semantic Segmentation
Document type | dc.type | Journal article
Cataloguer | uchile.catalogador | crb
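The abstract above outlines a pipeline: structured labels (object names, visual attributes, background categories) are first parsed out of image captions, a localization network is trained on those labels, and the resulting localization maps are fused into pseudo-ground-truth masks. As an illustration only, and not the authors' implementation, the following minimal sketch shows how syntactic extraction of object labels and attributes from a caption might look; the spaCy pipeline, the toy class list, and the example caption are all assumptions for the sake of the example, and the knowledge-database step the abstract mentions is omitted.

```python
# Illustrative sketch only (not the paper's implementation): pull candidate
# object labels and their visual attributes out of a caption with a
# syntactic parse. TARGET_CLASSES and the caption are made-up examples.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with a dependency parser

TARGET_CLASSES = {"dog", "frisbee", "grass"}  # hypothetical label set

def extract_structured_labels(caption: str):
    """Return (label, [attributes]) pairs for noun chunks whose head matches a class."""
    doc = nlp(caption)
    results = []
    for chunk in doc.noun_chunks:
        head = chunk.root.lemma_.lower()
        if head in TARGET_CLASSES:
            # Adjectival modifiers of the head noun act as visual attributes.
            attrs = [tok.text.lower() for tok in chunk.root.children
                     if tok.dep_ == "amod"]
            results.append((head, attrs))
    return results

print(extract_structured_labels("A brown dog catches a red frisbee on the grass."))
# e.g. [('dog', ['brown']), ('frisbee', ['red']), ('grass', [])]
```

Likewise, one generic way to turn per-class localization maps into a pseudo segmentation mask (again a sketch of the general technique, not the paper's exact fusion method) is a per-pixel argmax with a background fallback below a confidence threshold:

```python
# Illustrative sketch only: fuse per-class localization maps into a pseudo
# segmentation mask via per-pixel argmax with a background threshold.
import numpy as np

def maps_to_pseudo_mask(loc_maps: np.ndarray, bg_thresh: float = 0.3) -> np.ndarray:
    """loc_maps: (num_classes, H, W) scores in [0, 1].
    Returns an (H, W) int mask where 0 is background and c + 1 is class c."""
    best = loc_maps.max(axis=0)             # strongest class evidence per pixel
    labels = loc_maps.argmax(axis=0) + 1    # shift class ids past background
    labels[best < bg_thresh] = 0            # weak evidence falls back to background
    return labels

toy_maps = np.random.rand(3, 4, 4)          # 3 classes over a 4x4 image
print(maps_to_pseudo_mask(toy_maps))
```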

