# Information-Directed Exploration for Deep Reinforcement Learning

@article{Nikolov2019InformationDirectedEF,
  title   = {Information-Directed Exploration for Deep Reinforcement Learning},
  author  = {Nikolay Nikolov and Johannes Kirschner and Felix Berkenkamp and Andreas Krause},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1812.07544}
}

Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling…
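The core idea behind Information-Directed Sampling can be sketched briefly. In the bandit setting, IDS selects the action minimizing the information ratio: squared expected regret divided by expected information gain. The sketch below is a hypothetical illustration, not the paper's implementation; the per-action `regret` and `info_gain` estimates are assumed given.

```python
import numpy as np

def ids_action(regret, info_gain, eps=1e-12):
    """Pick the action minimizing regret(a)^2 / info_gain(a).

    regret:    estimated per-action expected regret
    info_gain: estimated per-action information gain
    """
    regret = np.asarray(regret, dtype=float)
    info_gain = np.asarray(info_gain, dtype=float)
    # Guard against division by zero for uninformative actions.
    ratio = regret ** 2 / np.maximum(info_gain, eps)
    return int(np.argmin(ratio))

# An action with moderate regret but high information gain can beat
# the greedy (lowest-regret) choice under the information ratio.
ids_action([0.1, 0.5, 0.3], [0.01, 0.2, 0.5])  # → 2
```

Because information gain is typically larger in states and actions with high (heteroscedastic) return variability, this ratio trades off exploitation against directed exploration in a way that plain UCB or Thompson sampling does not.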

#### 17 Citations

Successor Uncertainties: exploration and uncertainty in temporal difference learning

- Computer Science, Mathematics
- NeurIPS
- 2019

Successor Uncertainties (SU), a cheap and easy-to-implement RVF algorithm that retains key properties of PSRL, is designed and outperforms its closest RVF competitor, Bootstrapped DQN, on hard tabular exploration benchmarks.

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

- Computer Science
- AAAI
- 2021

This paper proposes a novel reward shaping approach that infers intrinsic rewards for the agent from a sequential generative model, and formulates the inference procedure for dynamics prediction as a multi-step forward prediction task, where the time abstraction effectively helps to increase the expressiveness of the intrinsic reward signals.

Sequence-Level Intrinsic Exploration Model

- 2019

Training reinforcement learning policies in partially observable domains with sparse reward signal is an important and open problem for the research community. In this paper, we introduce a new…

Estimating Risk and Uncertainty in Deep Reinforcement Learning

- Computer Science, Mathematics
- ArXiv
- 2019

This work proposes a method for disentangling epistemic and aleatoric uncertainties in deep reinforcement learning that combines elements of distributional reinforcement learning and approximate Bayesian inference with neural networks, yielding estimates of both types of uncertainty about the expected return of a policy.

Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy

- Mathematics, Computer Science
- ArXiv
- 2019

This work formalizes a feasible metric for measuring the utility of exploration based on counterfactual ideology and proposes an end-to-end algorithm that learns an exploration policy by meta-learning.

Principled Exploration via Optimistic Bootstrapping and Backward Induction

- Computer Science
- ICML
- 2021

OB2I constructs a general-purpose UCB bonus through non-parametric bootstrap in DRL and propagates future uncertainty in a time-consistent manner through an episodic backward update, which exploits the theoretical advantage and empirically improves sample efficiency.

Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning

- Computer Science, Psychology
- NeurIPS
- 2020

A novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD), that maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories generalizes easily to real trajectories.

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

- Computer Science, Mathematics
- UAI
- 2020

A Bayesian approach for the credit assignment problem is developed, translating preferences to a posterior distribution over state-action reward models, and an asymptotic Bayesian no-regret rate is proved for DPS with a Bayesian linear regression credit assignment model.

Measuring Progress in Deep Reinforcement Learning Sample Efficiency

- Computer Science
- ArXiv
- 2021

This work investigates progress in sample efficiency on Atari games and continuous control tasks by comparing the number of samples that a variety of algorithms need to reach a given performance level according to training curves in the corresponding publications.

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

- Computer Science, Mathematics
- ArXiv
- 2020

A new notion of eluder dimension for the policy space is proposed, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP), and a near-optimal sample complexity upper bound is proved that depends only linearly on the eluder dimension.

#### References

Showing 1–10 of 52 references.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

- Computer Science, Mathematics
- ArXiv
- 2015

This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.

VIME: Variational Information Maximizing Exploration

- Computer Science, Mathematics
- NIPS
- 2016

VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.

Efficient Exploration Through Bayesian Deep Q-Networks

- Computer Science, Mathematics
- 2018 Information Theory and Applications Workshop (ITA)
- 2018

Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based reinforcement learning (RL) algorithm, is proposed, which can be trained with fast closed-form updates and whose samples can be drawn efficiently from a Gaussian distribution.

Parameter Space Noise for Exploration

- Computer Science, Mathematics
- ICLR
- 2018

This work demonstrates that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.

Efficient exploration with Double Uncertain Value Networks

- Computer Science, Mathematics
- ArXiv
- 2017

Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.

Generalization and Exploration via Randomized Value Functions

- Mathematics, Computer Science
- ICML
- 2016

The results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.

Deep Reinforcement Learning with Risk-Seeking Exploration

- Computer Science
- SAB
- 2018

This paper proposes a novel DRL algorithm that encourages risk-seeking behaviour to enhance information acquisition during training and demonstrates the merit of the exploration heuristic by arguing that the risk estimator implicitly contains both parametric uncertainty and inherent uncertainty of the environment.

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

- Computer Science, Mathematics
- ICLR
- 2018

This work benchmarks well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems and finds that many approaches that have been successful in the supervised learning setting underperformed in the sequential decision-making scenario.

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

- Computer Science, Mathematics
- NIPS
- 2017

A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and simple hash functions are found to achieve surprisingly good results on many challenging tasks.
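The count-based bonus described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the #Exploration paper uses learned hash functions such as SimHash, whereas a plain SHA-1 of the serialized state stands in here, and the `HashCountBonus` class name and `beta` coefficient are illustrative.

```python
import hashlib
import math
from collections import defaultdict

class HashCountBonus:
    """Count-based exploration bonus over hashed states: beta / sqrt(n(hash(s)))."""

    def __init__(self, beta=0.01):
        self.beta = beta
        self.counts = defaultdict(int)

    def _hash(self, state):
        # Stand-in for a locality-sensitive hash (e.g. SimHash) over features.
        return hashlib.sha1(repr(state).encode()).hexdigest()

    def bonus(self, state):
        """Record a visit to `state` and return its exploration bonus."""
        h = self._hash(state)
        self.counts[h] += 1
        return self.beta / math.sqrt(self.counts[h])
```

The bonus is added to the environment reward at each step, so rarely visited (rarely hashed) states yield larger bonuses and attract the policy toward novelty.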

Distributional Reinforcement Learning with Quantile Regression

- Computer Science, Mathematics
- AAAI
- 2018

A distributional approach to reinforcement learning is built in which the distribution over returns is modeled explicitly instead of only estimating the mean, and a novel distributional reinforcement learning algorithm is presented that is consistent with the theoretical formulation.
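The quantile-regression idea behind this line of work can be shown with the standard pinball loss: for a quantile level tau, under- and over-estimates are weighted asymmetrically, so minimizing the loss drives the estimate toward the tau-quantile of the target distribution. A minimal sketch (the full QR-DQN objective additionally uses a Huber smoothing term, omitted here):

```python
import numpy as np

def quantile_loss(theta, targets, tau):
    """Pinball loss of scalar estimate `theta` at quantile level `tau`.

    u >= 0 (underestimate) is weighted by tau,
    u <  0 (overestimate) is weighted by (1 - tau).
    """
    u = np.asarray(targets, dtype=float) - theta  # signed errors
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))
```

At tau = 0.5 this reduces to half the mean absolute error, recovering the median; a grid of tau values yields a full approximation of the return distribution.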