Python, C++
2 months (Bachelor's thesis)
Group project (2/4)
Spring 2024
For my bachelor's thesis, we researched the transferability of humanlike behaviour in video games using reinforcement learning.
A DQN agent was built with TensorFlow in a custom environment following the Gymnasium interface. Our aim was to identify parameters that relate to humanlike navigation in one game, and to see whether those transferred to an agent playing another game via transfer learning.
The study, Generalisation of Humanlike Navigation in Game Playing Agents, can be found at the bottom of the page.
Summary of Experiment
The experiment centred on two rounds of observation by an expert panel. To prepare, we had to develop an AI agent that was able to play the games we had in mind.
Modify the games to be playable by an AI agent
Develop an AI agent that can play both games (without modification in between)
Train several models to play the first game using reinforcement learning
Have an expert panel observe the models and determine which are most humanlike
Train the most humanlike models to play the second game using transfer learning
Observe these models together with a baseline model to determine whether humanlike behaviour has transferred
Analyse the results
Development of the AI Agent
When deciding what to study for our bachelor's thesis, four of us came together, all wanting to explore machine learning. Only two people were allowed per thesis, but we got the green light to develop an AI agent together and examine it through two different research questions.
As development started, half of the group got sick. This was especially unfortunate because we had to prove that we were able to get results before fully committing to our theses. Neither of us had any practical experience using machine learning, meaning we went into this project quite blind (but optimistic). Knowing that we had this time limit, I put my all into developing the AI agent.
Python 01000001 01001001
With next to no experience in Python, our first hurdle was setting up the environment in which we could develop the agent. The language itself proved easy to get the hang of, but navigating the landscape of packages and versions was strenuous. I learned a lot about rapid prototyping and documentation reading in a very short amount of time.
After much trial and error, we managed to find a combination of TensorFlow and Gymnasium versions that worked. We created a custom Gymnasium environment so the agent could play our games. Since we wanted the same agent to play both games, we couldn't rely on game state or metadata; the agent's input had to be purely visual, making it quite general.
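To make that structure concrete, here is a minimal sketch of what a pixel-only environment in the Gymnasium style can look like. The class name, screen size and five-action set are illustrative assumptions, and the game logic is stubbed out; this is not our actual implementation:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class PixelGameEnv(gym.Env):
    """Minimal pixel-only environment in the Gymnasium style (illustrative)."""

    def __init__(self, width=84, height=84):
        super().__init__()
        # The agent sees raw screen pixels only -- no game state or
        # metadata -- so the same agent can be dropped into either game.
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(height, width, 3), dtype=np.uint8
        )
        # Assumed action set: up, down, left, right, no-op.
        self.action_space = spaces.Discrete(5)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self._capture_frame(), {}

    def step(self, action):
        # Advance the game one tick with the chosen action (stubbed out).
        reward = 0.0
        terminated = False  # e.g. level completed
        truncated = False   # e.g. step limit reached
        return self._capture_frame(), reward, terminated, truncated, {}

    def _capture_frame(self):
        # Placeholder: the real environment grabs the game's screen here.
        return np.zeros(self.observation_space.shape, dtype=np.uint8)
```

Because the observation space is just the screen, nothing about the environment's interface has to change between the two games.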
Machine Learning
Once we got the agent to actually play the game, the next step was to give it a brain. Then came a very difficult assessment: how hard was our game to learn? Would it take an hour to see progress? A month? We didn't know.
After exploring many different options, hyperparameters and settings, we landed on something we thought should work. But how could we know for sure? I decided to repurpose the agent to play Snake, a game so simple that we should see results within minutes. It showed progress, so we hooked it back up to our original game and let it run over the weekend. Waking up the following Monday, looking at the graphs and seeing progress was euphoric. We finally knew that it worked.
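For the curious, here is a rough sketch of what a pixel-based DQN setup can look like in TensorFlow. The network shape, hyperparameters and action count are placeholders for illustration, not the values we settled on in the thesis:

```python
import random
from collections import deque

import numpy as np
import tensorflow as tf

GAMMA = 0.99       # discount factor (assumed value)
BATCH_SIZE = 32
N_ACTIONS = 5

def build_q_network(input_shape=(84, 84, 3)):
    """Small convolutional Q-network over raw pixels."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(N_ACTIONS),  # one Q-value per action
    ])

q_net = build_q_network()
target_net = build_q_network()
target_net.set_weights(q_net.get_weights())
optimizer = tf.keras.optimizers.Adam(1e-4)
replay = deque(maxlen=100_000)  # (state, action, reward, next_state, done)

def select_action(state, epsilon):
    """Epsilon-greedy: explore at random, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return int(tf.argmax(q_net(state[None, ...])[0]))

def train_step():
    """One gradient step on a random minibatch from the replay buffer."""
    batch = random.sample(replay, BATCH_SIZE)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    # Bootstrap targets from the slowly updated target network.
    next_q = target_net(next_states).numpy().max(axis=1)
    targets = (rewards + GAMMA * next_q * (1.0 - dones)).astype(np.float32)
    with tf.GradientTape() as tape:
        q = tf.reduce_sum(q_net(states) * tf.one_hot(actions, N_ACTIONS), axis=1)
        loss = tf.reduce_mean(tf.keras.losses.huber(targets, q))
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
```

The key moving parts are the replay buffer, the epsilon-greedy exploration, and the separate target network that is only periodically synced with the online network.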
Showcase
Here are two videos of different models playing the first game, using reinforcement learning. The objective is to gather all blue minerals before proceeding to the next level by stepping on the helipad.
Next are two videos of the AI agent after it has learned to play the second game using transfer learning. In other words, it has taken the knowledge gained from the previous game and is trying to apply it to a game with different rules. The objective is now to shoot the crocodiles before proceeding to the next level through the door. The crocodiles patrol, and will hunt the player and deal damage if it gets too close. So where the previous game taught the agent to move towards its objectives, it now has to keep a reasonable distance so that it can both avoid and shoot its targets.
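As a side note, transferring in a setup like ours can be as simple as continuing training from the first game's weights, since the visual observations and action interface stay the same between the games. A hypothetical sketch (the file path and the choice of which layers to freeze are illustrative assumptions, not our actual setup):

```python
import tensorflow as tf

# Reuse the Q-network trained on the first game as the starting
# point for the second game (path is a placeholder).
q_net = tf.keras.models.load_model("models/humanlike_game1.keras")

# Optionally freeze the early convolutional layers so the visual
# features learned in the first game are kept, while the later
# layers adapt to the second game's rules.
for layer in q_net.layers[:3]:
    layer.trainable = False

# Training then continues exactly as before, but inside the second
# game's environment (crocodiles and a door instead of minerals and
# a helipad).
```

The baseline model the panel compares against is instead trained on the second game from scratch.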
Findings
The study proved more interesting than I had expected once we began analysing the results. With so little time to develop and train our AI agent, we had to prioritise performance (in terms of episodic rewards). We therefore did not choose parameters we believed were linked to human likeness, but rather the ones available from our trial-and-error models. Even so, when we did our thematic analysis, interesting findings emerged both in how to determine human likeness and in its transferability.
Our greatest contributions might be the methodology and research question themselves, open to further research, along with the so-called IPIM model. When coding the observers' descriptions of the AI agents, four themes emerged: Intent, Personality, Intellect and Movement. These four themes were enough to exhaustively describe the human likeness of our AI agent, and we suspect the model might be equally effective in other games.
You can read our study in full here:
The other study (a technical approach) by Isabel Diaz Johansson and Moa Wieweg can be found here: