Niño-Mora, José (2010) Computing an Index Policy for Bandits with Switching Penalties. In: 1st International ICST Workshop on Tools for solving Structured Markov Chains.
Abstract
We address the multiarmed bandit problem with switching penalties including both costs and delays. Asawa and Teneketzis (1996) introduced an index for bandits with switching penalties that partially characterizes optimal policies, attaching to each project state a "continuation index" (its Gittins i
| Item Type: | Conference or Workshop Item (UNSPECIFIED) |
|---|---|
| Date Deposited: | 04 Mar 2026 08:45 |
| Last Modified: | 18 Apr 2026 06:30 |
| URI: | http://eprints.eai.eu/id/eprint/948 |
