Project Timeline

  1. Built a Gymnasium Environment for BeamNG.tech

    An environment that lets us use Gymnasium and BeamNG.tech together, so we can gather observations and perform steps asynchronously.
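
    The reset/step interface of such an environment can be sketched as follows. This is a minimal, self-contained sketch with placeholder observations; the class and sensor details are hypothetical, and the real environment would subclass gymnasium.Env and talk to the simulator (e.g. via beamngpy).

```python
import numpy as np

class BeamNGEnvSketch:
    """Sketch of the Gymnasium-style interface (hypothetical names;
    the real environment communicates with BeamNG.tech)."""

    def __init__(self):
        self._step_count = 0

    def reset(self, seed=None):
        # Real env: respawn the vehicle and return the first observation.
        self._step_count = 0
        obs = np.zeros(4, dtype=np.float32)  # placeholder observation
        return obs, {}

    def step(self, action):
        # Real env: send `action` (steering, throttle) to the simulator,
        # then poll the sensors asynchronously for the next observation.
        self._step_count += 1
        obs = np.full(4, self._step_count, dtype=np.float32)
        reward = 0.0
        terminated = False
        truncated = self._step_count >= 100  # episode length cap
        return obs, reward, terminated, truncated, {}
```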

  2. Started experimenting with Behaviour Cloning / Imitation Learning Approaches

    We implemented, trained, and evaluated several approaches to behaviour cloning, including DAgger and GAIL, as well as general CNN approaches.
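
    The core loop of DAgger can be illustrated in a few lines. This is a toy sketch, not our implementation: the expert is a hypothetical fixed linear map, and the learner is a linear policy refit by least squares on the aggregated dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_policy(obs):
    # Hypothetical stand-in for the human / in-game expert.
    return obs @ np.array([0.5, -0.25])

# DAgger: roll out the learner, query the expert on the visited states,
# aggregate all labelled states so far, and refit the policy.
dataset_obs, dataset_act = [], []
weights = np.zeros(2)

for iteration in range(5):
    obs_batch = rng.normal(size=(64, 2))          # states the learner visits
    dataset_obs.append(obs_batch)
    dataset_act.append(expert_policy(obs_batch))  # expert labels those states
    X = np.concatenate(dataset_obs)
    y = np.concatenate(dataset_act)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
# weights converge toward the expert's [0.5, -0.25]
```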

  3. Used the In-Game 'AI' to generate training Datasets

    We developed a way to harness the available in-game AI to generate larger training datasets in less time than human-driven data collection allows.
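
    The recording loop amounts to letting the in-game AI drive and logging (observation, action) pairs. In this sketch the AI query and the environment transition are hypothetical placeholders; the real version polls BeamNG's built-in AI driver each step.

```python
import numpy as np

def in_game_ai_action(obs):
    # Hypothetical stand-in for querying the in-game AI's controls.
    return np.clip(-obs[:1], -1.0, 1.0)

def record_episode(n_steps=200):
    """Roll the in-game AI and log (observation, action) pairs to build a
    behaviour-cloning dataset without a human driver."""
    rng = np.random.default_rng(42)
    observations, actions = [], []
    obs = rng.normal(size=4)
    for _ in range(n_steps):
        act = in_game_ai_action(obs)
        observations.append(obs)
        actions.append(act)
        obs = rng.normal(size=4)  # real env: next obs from the simulator
    return np.stack(observations), np.stack(actions)

obs_data, act_data = record_episode()
```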

  4. Experiments with several different Approaches to Behaviour Cloning, Reinforcement Learning and End-to-End Data Acquisition

    Continued experimentation with different approaches including DQN, Segmentation, Frame Stacking and more.
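
    Of the techniques listed, frame stacking is the simplest to show in isolation: the agent sees the last k frames at once, so it can infer motion from still images. A minimal sketch, with frame shapes chosen arbitrarily:

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Keep the last k frames and return them stacked along a new
    leading axis, so a single observation carries motion information."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # Fill the buffer with copies of the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(frame)
        return np.stack(self.frames)

    def step(self, frame):
        # The deque drops the oldest frame automatically.
        self.frames.append(frame)
        return np.stack(self.frames)

stack = FrameStacker(k=4)
obs = stack.reset(np.zeros((64, 64)))   # shape (4, 64, 64)
obs = stack.step(np.ones((64, 64)))     # newest frame replaces the oldest
```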

  5. Developed Reinforcement Learning Approach

    Our approach uses the Soft-Actor-Critic architecture to train an agent on several in-game attributes.
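
    The defining piece of Soft-Actor-Critic is its entropy-regularized critic target, y = r + γ(1 − done)(min(Q1′, Q2′) − α log π(a′|s′)). The sketch below computes that target for a single transition; the example numbers are arbitrary.

```python
import numpy as np

def sac_target(reward, next_q1, next_q2, next_logp, done,
               gamma=0.99, alpha=0.2):
    """Soft Actor-Critic critic target:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_value = np.minimum(next_q1, next_q2) - alpha * next_logp
    return reward + gamma * (1.0 - done) * soft_value

# Arbitrary example transition:
y = sac_target(reward=1.0, next_q1=2.0, next_q2=1.5,
               next_logp=-0.5, done=0.0)
print(round(y, 4))  # 1.0 + 0.99 * (1.5 + 0.1) = 2.584
```

The min over two critics curbs Q-value overestimation, and the −α log π term rewards keeping the policy stochastic.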

  6. Testing an End-to-End Reinforcement Learning Approach

    We investigated how adding the current image to the observations can improve the performance of the Soft-Actor-Critic Agent.

  7. Convolutional Neural Network to generate Road Curvatures out of Images

    The network's purpose is to generate part of the inputs needed by the Soft-Actor-Critic Agent from an image, instead of using the in-game data.

  8. Convolutional Neural Network to generate Rangefinders out of Images

    Much like the Road Curvatures network, this network's purpose is to generate part of the inputs needed by the Soft-Actor-Critic Agent from an image, instead of using the in-game data.
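
    The key design constraint for both networks is that they are drop-in replacements: each one emits the same shape the in-game sensor would have produced, so the agent's input layout never changes. A sketch with hypothetical names (the placeholder function stands in for the trained CNN):

```python
import numpy as np

def rangefinder_net(image):
    # Placeholder for the trained CNN: maps an image to 8 range values,
    # the same shape the in-game rangefinder sensor produces.
    return np.full(8, float(image.mean()))

def rangefinders_from_game(vehicle_state):
    # Ground-truth ranges as read from the simulator.
    return np.asarray(vehicle_state["rangefinders"], dtype=float)

def build_observation(image, vehicle_state, use_network=True):
    """The SAC agent sees the same 8-value layout whether the ranges
    come from the simulator or from the network."""
    if use_network:
        ranges = rangefinder_net(image)
    else:
        ranges = rangefinders_from_game(vehicle_state)
    assert ranges.shape == (8,)
    return ranges
```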

  9. Convolutional Neural Network to generate Speed / Acceleration vectors out of Images of the UI

    Reading out parts of the UI to get current speed and acceleration for the Soft-Actor-Critic Agent.
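
    One simple baseline for reading digits off a UI crop is template matching against pre-captured digit glyphs. The sketch below uses constant placeholder images as templates and an arbitrary crop size; it only illustrates the matching step, not our trained network.

```python
import numpy as np

def read_digit(crop, templates):
    """Return the index of the template (digit) closest to the crop
    under summed absolute pixel difference."""
    errors = [np.abs(crop - t).sum() for t in templates]
    return int(np.argmin(errors))

# Hypothetical templates: in practice these would be captured glyph
# images of the speedometer digits 0-9; here, constant placeholders.
templates = [np.full((8, 5), d, dtype=float) for d in range(10)]
crop = templates[7].copy()  # stands in for a cropped "7" from the UI
print(read_digit(crop, templates))  # 7
```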

  10. [In Development] Retrain the Soft-Actor-Critic Agent with the new Inputs generated from the different Networks

    We're investigating whether retraining the Soft-Actor-Critic Agent with the new inputs generated from the different networks will match or exceed the Agent's original performance.

  11. [In Development] Building a final End-to-End Network with the Soft-Actor-Critic Agent

    Finalizing our End-to-End Network by combining all of the previously mentioned parts into a single network that takes an image as input and outputs the controls needed for the game.
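
    Structurally, the final pipeline composes the three perception networks with the SAC policy: image in, controls out. All functions below are hypothetical placeholders for the trained components; only the wiring reflects the plan described above.

```python
import numpy as np

# Placeholder stand-ins for the trained perception networks.
def curvature_net(image):
    return np.array([image.mean()])      # 1 curvature value

def rangefinder_net(image):
    return np.full(8, image.std())       # 8 range values

def speed_net(image):
    return np.array([image.max(), image.min()])  # speed, acceleration

def sac_policy(state):
    # Placeholder policy: steering and throttle squashed into [-1, 1].
    return np.tanh(state[:2])

def end_to_end(image):
    """Compose the perception networks and the SAC policy into one
    pipeline: image in, (steering, throttle) out."""
    state = np.concatenate([curvature_net(image),
                            rangefinder_net(image),
                            speed_net(image)])
    return sac_policy(state)

controls = end_to_end(np.zeros((64, 64)))  # shape (2,)
```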