This project is part of Udacity’s Deep Reinforcement Learning Nanodegree Program.
The goal is to train a reinforcement learning agent that solves the Unity Banana environment.
A fully trained agent
The environment was provided by Udacity and is built on top of the Unity Machine Learning Agents Toolkit.
In this environment, the agent’s task is to collect as many yellow bananas as possible while avoiding blue bananas. Collecting a yellow banana gives the agent a reward of +1, and collecting a blue banana gives a reward of -1.
The state space has 37 dimensions and contains the agent’s velocity, along with ray-based perception of objects around the agent’s forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:
0- move forward.
1- move backward.
2- turn left.
3- turn right.
The task is episodic, and in order to solve the environment, the agent must get an average score of +13 over 100 consecutive episodes.
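The solving criterion above can be checked with a rolling window over episode scores. The following is a minimal sketch, not the project's own code; the function name `first_solved_episode` and the constants are illustrative assumptions.

```python
from collections import deque

SOLVED_SCORE = 13.0  # target average from the task description
WINDOW = 100         # number of consecutive episodes to average over


def first_solved_episode(episode_scores, target=SOLVED_SCORE, window=WINDOW):
    """Return the 1-based episode at which the rolling average first reaches
    the target, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for episode, score in enumerate(episode_scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

Note that this reports the episode at which the 100-episode window completes. Some write-ups instead report the episode at which the window starts, so reported numbers can differ by 100 depending on the convention.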
Unity is not needed to run my project. The environment files are provided below:
These files are provided as part of the project description by Udacity.
Please note that if you want to run my model, you will need to download the environment file that corresponds to your operating system and unzip it at the same level as the cloned repository.
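To double-check that the environment was unzipped where the project expects it, a small helper like the one below may be useful. It is only a sketch: the file names in `ASSUMED_ENV_FILES` are assumptions about what the unzipped archives contain, and "same level as the cloned repository" is read here as "next to the repository folder". Adjust both to your setup.

```python
import platform
from pathlib import Path

# Assumed layouts after unzipping; change these to match the file you downloaded.
ASSUMED_ENV_FILES = {
    "Linux": "Banana_Linux/Banana.x86_64",
    "Darwin": "Banana.app",
    "Windows": "Banana_Windows_x86_64/Banana.exe",
}


def expected_env_path(repo_dir, system=None):
    """Build the path where the unzipped environment is expected,
    next to (not inside) the cloned repository."""
    system = system or platform.system()
    if system not in ASSUMED_ENV_FILES:
        raise ValueError(f"No environment file known for {system!r}")
    return Path(repo_dir).resolve().parent / ASSUMED_ENV_FILES[system]
```

For example, `expected_env_path(".").exists()` run from inside the repository tells you whether the file is in place before you start training.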
After downloading the environment file and unzipping it to the correct location, you can start by
creating and activating a new virtual environment (Python 3.6) to run my project.
$ virtualenv drl
$ source drl/bin/activate
Then we can install the dependencies needed:
$ cd python/
$ pip install .
Then, to train the agent, you need to make a small modification to the main.py file. Open it and, around line 22 where I defined the environment, set the file_name parameter of the function call to the correct path to the environment file. After that is done, you can watch the agent train in real time with
$ python main.py <Double>
<Double> can be either true (use Double DQN) or false (use regular DQN).
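How main.py reads this argument is not shown here. One plausible way to turn it into a boolean is sketched below; the function name `parse_double_flag` and the case-insensitive matching are assumptions, not the project's actual code.

```python
import sys


def parse_double_flag(argv):
    """Interpret the single command-line argument as the Double DQN switch."""
    if len(argv) != 2 or argv[1].lower() not in ("true", "false"):
        # Exit with a usage message when the argument is missing or invalid.
        raise SystemExit("usage: python main.py <true|false>")
    return argv[1].lower() == "true"
```

In a script, this would be called as `use_double = parse_double_flag(sys.argv)`.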
Alternatively, you can download my pretrained weights from here,
place them in the same directory as
evaluation.py, and run
$ python evaluation.py
to watch a fully trained agent play the game.
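Both DQN and Double DQN were implemented. Each uses two identical neural networks (described below): a local network whose weights are trained to improve the accumulated return, and a target network that supplies the learning targets. In regular DQN, the target network both selects and evaluates the best next action. In Double DQN, the local network selects the next action and the target network evaluates it, which reduces the overestimation of action values.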
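In the standard formulations, the two algorithms differ only in how the learning target for a transition is computed. The sketch below shows that difference on plain Python lists of action values. It is an illustration under assumptions, not the project's code: the names `q_local_next` and `q_target_next` (the two networks' outputs for the next state) and the discount factor `GAMMA` are hypothetical.

```python
GAMMA = 0.99  # discount factor; an assumed value, not taken from the project


def argmax(values):
    """Index of the largest entry in a list of action values."""
    return max(range(len(values)), key=lambda a: values[a])


def dqn_target(reward, done, q_target_next, gamma=GAMMA):
    """Regular DQN: the target network both picks and scores the next action."""
    if done:
        return reward  # no bootstrapping past the end of an episode
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_local_next, q_target_next, gamma=GAMMA):
    """Double DQN: the local network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = argmax(q_local_next)
    return reward + gamma * q_target_next[best_action]
```

When the two networks disagree about the best next action, the Double DQN target is never larger than the regular DQN target, which is where the reduced overestimation comes from.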
Experience replay was also implemented. The idea is to interact with the environment, store the experiences in a dictionary, and sample randomly from this dictionary to learn.
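The store-then-sample idea can be sketched as follows. Note that this example uses a bounded deque rather than the dictionary mentioned above, simply because it keeps the sketch short; the class name `ReplayBuffer`, the `Experience` tuple, and the capacity handling are all assumptions.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size store of past transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest experiences are dropped first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Store one transition observed while interacting with the environment."""
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw batch_size distinct experiences uniformly at random."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling at random breaks the correlation between consecutive steps of an episode, which is the main reason for learning from a buffer instead of from the latest transition.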
For the neural network architecture, I used a three-layer fully connected network (a multilayer perceptron) that maps the state to action values. Each hidden layer consists of 128 fully connected units.
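To make the shapes concrete, here is a framework-free sketch of such a network. It assumes that "three layers" means two hidden layers of 128 units followed by an output layer of 4 action values, and it assumes ReLU activations and a simple uniform weight initialization; none of these details are confirmed by the project, and the real model would normally be written in a deep learning library.

```python
import random

STATE_SIZE, HIDDEN_SIZE, ACTION_SIZE = 37, 128, 4


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases for one dense layer."""
    bound = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def build_network(seed=0):
    """Create the assumed 37 -> 128 -> 128 -> 4 stack of layers."""
    rng = random.Random(seed)
    sizes = [STATE_SIZE, HIDDEN_SIZE, HIDDEN_SIZE, ACTION_SIZE]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def q_values(network, state):
    """Map a 37-dimensional state to one value per action."""
    x = state
    for i, layer in enumerate(network):
        # ReLU on the hidden layers only; the output layer stays linear.
        x = dense(x, layer, relu=i < len(network) - 1)
    return x
```

Calling `q_values(build_network(), state)` with a list of 37 floats returns a list of four action values, and the greedy action is the index of the largest one.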
Accumulated return over 2000 episodes
My agent was able to solve the environment in 503 episodes with regular DQN and in 381 episodes with Double DQN.
Now that this environment is solved, the next step is to try solving the same game with a different state representation. Representing the state as raw pixels would be more realistic, since raw pixels are what we see, and the agent should be allowed to see the same.
The full source code for the project can be found here.