Blake Wulfe


Research Engineer


I'm currently working on decision making and prediction for autonomous driving at the Toyota Research Institute.

I previously completed a master’s degree in computer science at Stanford, where I focused on reinforcement learning and decision making as part of the Stanford Intelligent Systems Laboratory, advised by Mykel Kochenderfer.

During my master’s, I interned at Adobe Research, where I was advised by Hung Bui and researched imitation learning and its applications in artistic domains.

As an undergraduate, I studied computer science at Vanderbilt University, where I performed autonomous UAV research with my advisor Julie Adams in the Human-Machine Teaming Lab. I also worked with Eugene Vorobeychik on a project analyzing social interactions, which is how I originally became interested in machine learning.


Interests

  • Decision Making Under Uncertainty
  • Reinforcement Learning
  • Imitation Learning


Education

  • MS in Computer Science, 2017

    Stanford University

  • BSc in Computer Science, 2014

    Vanderbilt University

Recent Posts

NeurIPS 2020 Procgen Competition

What I tried and learned

Super Smash Bros. 64 AI: Reinforcement Learning Part 1: The Environment

Introducing a Super Smash Bros. 64 RL environment and initial experiments


Why should MuZero perform better than other DRL algorithms?

Super Smash Bros. 64 AI: Behavioral Cloning

Write-up on Super Smash Bros. 64 AI developed using behavioral cloning.

Two Link Programmatic Control

Controlling a robotic arm with reinforcement learning 3: two-link programmatic control

Recent Publications

Multi-Agent Imitation Learning for Driving Simulation

One potential method for validating autonomous vehicles is to evaluate them in a simulator, which requires highly realistic models of human driving behavior. Prior work learned human driver models using generative adversarial imitation learning, but did so in a single-agent environment; as a result, the learned policies fail when many of them are executed simultaneously. This research performs training in a multi-agent setting to address that problem.

Intermediate-Horizon Automotive Risk Prediction

This research considers the problem of predicting whether a car will be involved in a collision 10-20 seconds in the future. We formulate this task as policy evaluation in an MDP with a high-dimensional, continuous state space and a reward function dominated by rare events (collisions). We then demonstrate that simulated data and domain adaptation models can improve prediction performance on real-world data.
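
As a toy illustration of why this evaluation problem is hard, the sketch below estimates a rare-event collision probability with naive Monte Carlo rollouts; the per-step collision rate and horizon are invented for the example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values only (not from the paper): a per-timestep
# collision probability under some fixed policy, over a fixed horizon.
P_STEP = 1e-3   # per-timestep collision probability
HORIZON = 100   # e.g., a 10-second window at 10 Hz

def rollout_collides() -> bool:
    """Simulate one trajectory; True if a collision occurs in the window."""
    return bool((rng.random(HORIZON) < P_STEP).any())

# Naive Monte Carlo policy evaluation of the rare-event "reward".
n = 20_000
estimate = sum(rollout_collides() for _ in range(n)) / n
true_p = 1.0 - (1.0 - P_STEP) ** HORIZON  # analytic value for this toy model
```

Because the signal is dominated by rare events, an estimator like this needs many rollouts before it stabilizes, which is one reason access to large amounts of (simulated) data is attractive.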

Collision Avoidance for Unmanned Aircraft Using Coordination Tables

How can UAVs with different collision avoidance strategies coordinate maneuvers so as to minimize collisions? This research presents an approach that enforces reasonable requirements on the behavior of UAVs, and as a result dramatically improves safety in dangerous encounters. The method essentially ensures that each UAV's maneuver aligns with the direction advised by an optimal joint solution.


Deep Reinforcement Learning for Collision Avoidance

We learn UAV collision avoidance policies directly from a simulator with a Deep Q-Network (DQN). This approach not only solves for policies more quickly than value iteration, but also arrives at safer and more efficient solutions by learning a direct mapping from the state space instead of discretizing it.
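
In spirit, the approach swaps a discretized value table for a function approximator over the continuous state. A minimal sketch of the two DQN ingredients involved; the state dimension, action count, and network are invented for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 4-D continuous encounter state and 3 advisories.
STATE_DIM, N_ACTIONS, HIDDEN = 4, 3, 16

# A tiny two-layer Q-network: continuous state -> one value per action.
W1 = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    """Map a continuous state directly to Q-values, with no discretization."""
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

def td_target(reward, next_state, done, gamma=0.99):
    """One-step bootstrapped target used in the DQN regression loss."""
    return reward if done else reward + gamma * q_values(next_state).max()

state = rng.normal(size=STATE_DIM)
greedy_action = int(np.argmax(q_values(state)))
```

Training fits `q_values` to `td_target` over experience; acting greedily with respect to the learned Q-function then gives the avoidance policy.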

Playing Atari with Hierarchical Deep Reinforcement Learning

How can artificial agents autonomously learn increasingly complex behavior? The traditional approach to solving this problem is to identify subgoals, and then to learn options (i.e., skills) useful for achieving those subgoals. In this research, we instead take a bottom-up approach to HRL, wherein the sequential, primitive actions of an agent are modeled as the result of latent variables, which may themselves be used as options (similar in concept to the approach taken here). This research makes an initial step in this direction, using hierarchical recurrent neural networks within a Recurrent Q-Network.

Comparison of Deep and Traditional Reinforcement Learning Methods for Playing Atari

This project compared the performance of traditional RL methods (Q-learning and SARSA with basis functions and eligibility traces) to DQN.
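
For reference, the core tabular updates being compared look like this (eligibility traces and basis functions omitted for brevity; the environment sizes are placeholders):

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA = 0.1, 0.99

def q_learning_update(Q, s, a, r, s_next):
    """Off-policy: bootstrap from the greedy next action."""
    target = r + GAMMA * Q[s_next].max()
    Q[s, a] += ALPHA * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action actually taken next."""
    target = r + GAMMA * Q[s_next, a_next]
    Q[s, a] += ALPHA * (target - Q[s, a])

Q = np.zeros((N_STATES, N_ACTIONS))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)  # moves Q[0, 1] toward 1.0
```

DQN replaces the table `Q` with a neural network trained on the same kind of bootstrapped target.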

Language Modeling with Recurrent Generative Adversarial Networks

GANs have had a lot of success in generating images; can they be applied with similar effect to natural language? Since natural language is discrete, we formulate this task as a reinforcement learning problem, and use REINFORCE to train a recurrent generative network to maximize rewards produced by a discriminating network. We found that this approach did not scale well to the vocabulary sizes of realistic datasets (> 60,000 words), but believe improved training methods (e.g., TRPO) and curriculum learning (e.g., MIXER) might overcome these issues.
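
The score-function estimator at the heart of this setup can be sketched at toy scale (a 5-token vocabulary, single-token "sentences", and a fixed stand-in for the discriminator, all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 5  # toy vocabulary; the real setting involved > 60,000 words

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def discriminator_reward(token):
    """Stand-in for the discriminator's score; token 3 is arbitrarily 'real'."""
    return 1.0 if token == 3 else 0.0

# REINFORCE: raise the log-probability of sampled tokens, weighted by reward.
logits = np.zeros(VOCAB)
for _ in range(100):
    probs = softmax(logits)
    grad = np.zeros(VOCAB)
    for _ in range(200):
        a = rng.choice(VOCAB, p=probs)
        # d/dlogits of log softmax(logits)[a] is one_hot(a) - probs
        grad += (np.eye(VOCAB)[a] - probs) * discriminator_reward(a)
    logits += 0.5 * grad / 200  # ascend the estimated policy gradient

probs = softmax(logits)  # mass concentrates on the rewarded token
```

With a large vocabulary, each sample touches one of tens of thousands of tokens, so this estimator's variance grows quickly, which matches the scaling problem described above.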

Geo-localization of Street View Images

Given a random street-level image of a city from around the world, could you identify where the picture was taken? In this project, we collected a dataset of 100,000 images from ten cities, and trained a convolutional neural network to predict the city from the image. We found that the network successfully identifies the city with ~75% accuracy. Research tackling this problem at a much larger scale was published concurrently (PlaNet).

Predicting Social Interaction Outcomes

In this project, I analyzed a set of ~2,000 pairwise social encounters. Using the audio, visual, and network features from those interactions, I was able to predict their outcomes with about 85% accuracy. While I am more focused on reinforcement learning now, I still think this topic is interesting, and in particular believe that enabling computers to intelligently interact with people (e.g., in healthcare or educational settings) would be widely beneficial.