Blake Wulfe

Research Engineer

Biography

I research machine learning methods for decision making under uncertainty at the Toyota Research Institute.

I previously completed a master’s degree in computer science at Stanford, where I focused on reinforcement learning and decision making as part of the Stanford Intelligent Systems Laboratory advised by Mykel Kochenderfer.

During my master’s, I interned at Adobe Research, where I was advised by Hung Bui. We researched imitation learning and its applications in artistic domains.

As an undergraduate student, I studied computer science at Vanderbilt University, where I performed autonomous UAV research with my advisor Julie Adams as part of the Human-Machine Teaming Lab. I also worked with Eugene Vorobeychik on a social interaction analysis project, which is how I originally became interested in machine learning.

Interests

Decision Making Under Uncertainty
Reinforcement Learning
Imitation Learning

Education

MS in Computer Science, 2017

Stanford University

BSc in Computer Science, 2014

Vanderbilt University

Publications

Blake Wulfe, Ashwin Balakrishna, Logan Ellis, Jean Mercat, Rowan McAllister, Adrien Gaidon

January 2022 Spotlight Presentation (top 5% of submitted papers) ICLR 2022 reward learning, irl, rl, imitation learning

Dynamics-Aware Comparison of Learned Reward Functions

We propose a method for quantifying the similarity of learned reward functions without performing policy learning and evaluation.

PDF Project

Raunak P. Bhattacharyya, Derek J. Phillips, Blake Wulfe, Jeremy Morton, Alex Kuefler, Mykel J. Kochenderfer

March 2018 IROS 2018 rl, imitation learning

Multi-Agent Imitation Learning for Driving Simulation

One potential method for validating autonomous vehicles is to evaluate them in a simulator. For this to work, you need highly realistic models of human driving behavior. Existing research learned human driver models using generative adversarial imitation learning, but did so in a single-agent environment. As a result, the model fails when you execute many of the learned policies simultaneously. This research performs training in a multi-agent setting to address this problem.

PDF Code

Blake Wulfe, Sunil Chintakindi, Sou-Cheng T. Choi, Rory Hartong-Redden, Anuradha Kodali, Mykel J. Kochenderfer

February 2018 AAMAS 2018 policy evaluation, mdp, risk prediction, autonomous driving

Intermediate-Horizon Automotive Risk Prediction

This research considers the problem of predicting whether a car will suffer a collision in the time period 10-20 seconds in the future. We formulate this task as policy evaluation in an MDP with a high-dimensional, continuous state space and a reward function dominated by rare events (collisions). We then demonstrate that simulated data and domain adaptation models can be used to improve prediction performance on real-world data.
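To make the "risk prediction as policy evaluation" framing concrete, here is a minimal sketch on a toy, discrete Markov chain. It is not the method from the paper, which works with a high-dimensional, continuous state space and learned predictors. The state names, transition probabilities, and the function names step and collision_probability are all hypothetical. The sketch treats the collision state as absorbing and computes the probability that the first collision falls inside a window of future time steps.

```python
def step(dist, transitions, collision):
    """Advance a state distribution by one time step.

    The collision state is treated as absorbing (an assumption of this toy).
    """
    nxt = {}
    for state, prob in dist.items():
        if state == collision:
            nxt[state] = nxt.get(state, 0.0) + prob
            continue
        for next_state, p in transitions[state].items():
            nxt[next_state] = nxt.get(next_state, 0.0) + prob * p
    return nxt


def collision_probability(transitions, start, collision, lo, hi):
    """Probability that the first collision occurs at a step in [lo, hi].

    Requires 1 <= lo <= hi. Equivalent to evaluating a fixed policy under a
    reward of 1 for entering the collision state inside the window.
    """
    dist = {start: 1.0}
    mass_before_window = 0.0
    for t in range(1, hi + 1):
        if t == lo:
            # dist currently holds the distribution after lo - 1 steps.
            mass_before_window = dist.get(collision, 0.0)
        dist = step(dist, transitions, collision)
    return dist.get(collision, 0.0) - mass_before_window


# Hypothetical two-state chain: a 10% chance of collision at every step.
toy_transitions = {"safe": {"safe": 0.9, "crash": 0.1}}
risk = collision_probability(toy_transitions, "safe", "crash", lo=2, hi=3)
```

Even in this toy setting the collision reward is sparse, which hints at why rare events make the estimation problem hard at realistic scale.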

PDF Code

Rachael E Tompa, Blake Wulfe, Michael P Owen, Mykel J Kochenderfer

September 2016 DASC 2016 planning

Collision Avoidance for Unmanned Aircraft Using Coordination Tables

How can UAVs with different collision avoidance strategies coordinate maneuvers so as to minimize collisions? This research presents an approach that enforces reasonable requirements on the behavior of UAVs, and as a result dramatically improves safety in dangerous encounters. In essence, the method ensures that each UAV's maneuver aligns in direction with the maneuver advised by an optimal joint solution.

PDF Code

Posts

NeurIPS 2020 Procgen Competition

What I tried and learned

Dec 22, 2020 19 min read reinforcement learning

Super Smash Bros. 64 AI: Reinforcement Learning Part 1: The Environment

Introducing a Super Smash Bros. 64 RL environment and initial experiments

Jun 5, 2020 13 min read reinforcement learning, ssb64

MuZero

Why should MuZero perform better than other DRL algorithms?

Dec 9, 2019 2 min read rl, muzero

Super Smash Bros. 64 AI: Behavioral Cloning

Write-up on Super Smash Bros. 64 AI developed using behavioral cloning.

Jul 13, 2019 8 min read imitation learning, ssb64

Projects

Deep Reinforcement Learning for Collision Avoidance

We learn UAV collision avoidance policies directly from a simulator with a Deep Q-Network (DQN). This approach not only solves for policies more quickly than value iteration, but also arrives at safer and more efficient solutions by learning a direct mapping from the state space instead of discretizing it.

PDF

Playing Atari with Hierarchical Deep Reinforcement Learning

How can artificial agents autonomously learn increasingly complex behavior? The traditional approach to solving this problem is to identify subgoals, and then to learn options (i.e., skills) useful for achieving those subgoals. In this research, we instead take a bottom-up approach to HRL, wherein the sequential, primitive actions of an agent are modeled as the result of latent variables, which may themselves be used as options (similar in concept to the approach taken here). This research makes an initial step in this direction, using hierarchical recurrent neural networks within a Recurrent Q-Network.

PDF Code

Comparison of Deep and Traditional Reinforcement Learning Methods for Playing Atari

This project compared the performance of traditional RL methods (Q-learning and SARSA with basis functions and eligibility traces) to DQN.
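For readers unfamiliar with the two traditional methods, the sketch below contrasts their update rules. It is a tabular simplification written for illustration: the project itself used basis functions and eligibility traces, which are omitted here, and the function names and default hyperparameters are assumptions rather than the project's code.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


# Two identical tables so the rules can be compared on the same transition.
q_table = defaultdict(float, {(1, 0): 1.0, (1, 1): 2.0})
sarsa_table = defaultdict(float, {(1, 0): 1.0, (1, 1): 2.0})

q_learning_update(q_table, s=0, a=0, r=1.0, s_next=1, actions=[0, 1],
                  alpha=0.5, gamma=0.5)
sarsa_update(sarsa_table, s=0, a=0, r=1.0, s_next=1, a_next=0,
             alpha=0.5, gamma=0.5)
```

The only difference is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the action the agent actually selects next.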

PDF Code

Language Modeling with Recurrent Generative Adversarial Networks

GANs have had a lot of success in generating images; can they be applied with similar effect to natural language? Since natural language is discrete, we formulate this task as a reinforcement learning problem, and use REINFORCE to train a recurrent generative network to maximize rewards produced by a discriminating network. We found this approach did not scale well to the vocab sizes used in realistic datasets (> 60,000 words), but believe improved training methods (e.g., TRPO) and curriculum learning (e.g., MIXER) might overcome these issues.
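The core difficulty is that sampling a discrete token blocks gradients from the discriminator, so the generator is instead updated with a score-function (REINFORCE) estimate. The sketch below shows a single-token version with a softmax policy over raw logits. It is a toy under stated assumptions: there is no recurrent network, the reward function stands in for a discriminator score, and the names softmax, reinforce_step, and toy_reward are hypothetical rather than taken from the project's code.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, lr=0.1, rng=random):
    """Sample one token and take one REINFORCE gradient-ascent step.

    The gradient of log pi(token) with respect to logit i is
    1[i == token] - pi(i), scaled here by the scalar reward.
    """
    probs = softmax(logits)
    token = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(token)
    return [
        x + lr * reward * ((1.0 if i == token else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


def toy_reward(token):
    """Stand-in for a discriminator score: only token 2 looks 'real'."""
    return 1.0 if token == 2 else 0.0


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = reinforce_step(logits, toy_reward, rng=rng)
final_probs = softmax(logits)
```

With a three-token vocabulary the rewarded token is sampled often enough to learn from. As the vocabulary grows, rewarded samples become rarer and the gradient estimate noisier, which is consistent with the scaling difficulty described above.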

PDF Code

Geo-localization of Street View Images

Given a random, street-level image of a city from around the world, could you identify where the picture was taken? In this project, we collected a dataset of 100,000 images from ten cities, and trained a convolutional neural network to predict the city from the image. We found that the network successfully identifies the city with ~75% accuracy. Research tackling this problem at a much larger scale was published concurrently (PlaNet).

PDF Code

Predicting Social Interaction Outcomes

In this project, I analyzed a set of ~2,000 pairwise social encounters. Using the audio, visual, and network features from those interactions, I was able to predict their outcomes with about 85% accuracy. While I am more focused on reinforcement learning now, I still think this topic is interesting, and in particular believe that enabling computers to intelligently interact with people (e.g., in healthcare or educational settings) would be widely beneficial.

PDF Code