We propose a method for quantifying the similarity of learned reward functions without performing policy learning and evaluation.
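To make the idea concrete, here is a minimal sketch of one way such a comparison can work, assuming both reward functions are callables over sampled (state, action, next state) transitions. The Pearson-correlation pseudometric below is an illustrative stand-in, not necessarily the paper's exact formulation; the point is only that the distance is computed from reward samples, with no policy training anywhere.

```python
import numpy as np

def reward_distance(reward_a, reward_b, transitions):
    """Illustrative pseudometric between two reward functions:
    evaluate both on a shared batch of sampled transitions and
    turn their Pearson correlation into a distance in [0, 1].
    No policy is learned or evaluated at any point."""
    r_a = np.array([reward_a(s, a, s2) for s, a, s2 in transitions])
    r_b = np.array([reward_b(s, a, s2) for s, a, s2 in transitions])
    rho = np.clip(np.corrcoef(r_a, r_b)[0, 1], -1.0, 1.0)
    return np.sqrt(0.5 * (1.0 - rho))  # 0 = identical up to positive affine rescaling

# Hypothetical usage: compare a learned reward against ground truth
# on randomly sampled 1-D transitions.
rng = np.random.default_rng(0)
samples = [(s, a, s + a) for s, a in rng.normal(size=(1000, 2))]
true_r = lambda s, a, s2: s2 - s
learned = lambda s, a, s2: 2.0 * (s2 - s) + 1.0  # affine transform of true_r
print(reward_distance(learned, true_r, samples))  # ~0.0
```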
Why should MuZero perform better than other DRL algorithms?
One potential method for validating autonomous vehicles is to evaluate them in a simulator, which requires highly realistic models of human driving behavior. Prior work learned human driver models with generative adversarial imitation learning, but trained them in a single-agent environment; as a result, the learned policies fail when many of them are executed simultaneously. This work addresses the problem by performing training in a multi-agent setting.
We learn UAV collision avoidance policies directly from a simulator with a Deep Q-Network (DQN). This approach not only solves for policies more quickly than value iteration, but also arrives at safer and more efficient solutions by learning a direct mapping from the continuous state space to action values instead of discretizing it.
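A minimal sketch of the underlying machinery, assuming a small fully connected Q-network over a hypothetical 5-dimensional continuous encounter state and 3 discrete advisories; the paper's actual state encoding, architecture, and hyperparameters are assumptions here:

```python
import copy
import torch
import torch.nn as nn

# Hypothetical encoding: a 5-D continuous encounter state
# (e.g. relative position, velocity, bearing) and 3 discrete
# advisories (e.g. climb, descend, maintain).
STATE_DIM, N_ACTIONS = 5, 3

# The network consumes raw continuous states directly, so no
# grid discretization of the state space is needed.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),        # one Q-value per advisory
)
target_net = copy.deepcopy(q_net)    # periodically re-synced copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(s, a, r, s2, done, gamma=0.99):
    """One TD update on a replay batch of simulated transitions."""
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s2).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```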
Tips and heuristics for training Deep Q-Networks