Learning in Combinatorial Optimization: What and How to Explore
Author
dc.contributor.author
Modaresi, Sajad
Author
dc.contributor.author
Sauré Valenzuela, Denis
Author
dc.contributor.author
Vielma, Juan Pablo
Accession date
dc.date.accessioned
2021-03-22T21:07:34Z
Available date
dc.date.available
2021-03-22T21:07:34Z
Publication date
dc.date.issued
2020
Item citation
dc.identifier.citation
Operations Research, Volume 68, Number 5, Pages 1585-1604, Sep-Oct 2020
Identifier
dc.identifier.other
10.1287/opre.2019.1926
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/178743
Abstract
dc.description.abstract
We study dynamic decision making under uncertainty when, at each period, a decision maker implements a solution to a combinatorial optimization problem. The objective coefficient vectors of said problem, which are unobserved before implementation, vary from period to period. These vectors, however, are known to be random draws from an initially unknown distribution with known range. By implementing different solutions, the decision maker extracts information about the underlying distribution but at the same time experiences the cost associated with said solutions. We show that resolving the implied exploration versus exploitation tradeoff efficiently is related to solving a lower-bound problem (LBP), which simultaneously answers the questions of what to explore and how to do so. We establish a fundamental limit on the asymptotic performance of any admissible policy that is proportional to the optimal objective value of the LBP. We show that such a lower bound might be asymptotically attained by policies that adaptively reconstruct and solve the LBP at an exponentially decreasing frequency. Because the LBP is likely intractable in practice, we propose policies that instead reconstruct and solve a proxy for the LBP, which we call the optimality cover problem (OCP). We provide strong evidence of the practical tractability of the OCP, which implies that the proposed policies can be implemented in real time. We test the performance of the proposed policies through extensive numerical experiments, and we show that they significantly outperform relevant benchmarks in the long term and are competitive in the short term.
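The setting described in the abstract can be illustrated with a toy sketch. The code below is an assumption-laden simplification, not the paper's LBP/OCP-based policy: it plays a combinatorial semi-bandit over a small set of feasible solutions, exploits the empirically cheapest solution each period, and forces exploration only at exponentially spaced periods (t = 1, 2, 4, 8, ...), loosely mirroring the idea of acting at an exponentially decreasing exploration frequency. The function names and cost distributions are hypothetical.

```python
import random

def run_policy(element_costs_sampler, solutions, horizon, seed=0):
    """Toy combinatorial semi-bandit policy (illustrative only):
    exploit the empirically cheapest solution, but at exponentially
    spaced periods play the least-played solution to keep exploring."""
    rng = random.Random(seed)
    n = max(max(s) for s in solutions) + 1
    counts = [0] * n               # observations per ground element
    sums = [0.0] * n               # accumulated cost per ground element
    plays = [0] * len(solutions)   # times each solution was implemented
    total_cost = 0.0
    for t in range(1, horizon + 1):
        if t & (t - 1) == 0:
            # t is a power of two: exploration period, pick least-played
            i = min(range(len(solutions)), key=lambda j: plays[j])
        else:
            # exploitation: solution with lowest empirical mean cost
            def est(j):
                return sum(sums[e] / counts[e] if counts[e] else 0.0
                           for e in solutions[j])
            i = min(range(len(solutions)), key=est)
        plays[i] += 1
        costs = element_costs_sampler(rng)   # realized costs this period
        for e in solutions[i]:               # semi-bandit feedback
            counts[e] += 1
            sums[e] += costs[e]
            total_cost += costs[e]
    return plays, total_cost

# Hypothetical instance: two singleton solutions, element 0 cheaper on
# average (U(0,1)) than element 1 (U(0.5,1.5)).
plays, total = run_policy(
    lambda rng: [rng.random(), 0.5 + rng.random()], [[0], [1]], 1000)
```

In this sketch the cheaper solution dominates play in the long run while exploration cost grows only logarithmically in the horizon, which is the flavor of the regret guarantees the abstract refers to; the actual policies in the paper decide *what* to explore by solving the OCP rather than by round-robin.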
Sponsor
dc.description.sponsorship
National Science Foundation (NSF)
NSF - Directorate for Engineering (ENG)
1233441
Complex Engineering Systems Institute
CONICYT: PIA FB0816