Investigating the Effect of Human-in-the-Loop Teaching on Model Performance

Abstract:

Interactive Reinforcement Learning (IRL) is a machine learning technique that incorporates a human-in-the-loop training approach. The IRL agent learns from human-provided feedback, which can be delivered in several forms, such as scalar-valued rewards. IRL has been shown to be a valuable training method for agents, demonstrating faster learning and reduced exploration. However, much of the current research has been constrained to small, discrete environments, where agents have few possible actions to choose from in a given state. IRL also has known limitations; for example, agents are susceptible to learning human biases. This project investigates the effects of IRL in a larger environment, formalised as a Partially Observable Markov Decision Process (POMDP). A pre-trained RL agent was subjected to an extra layer of training, in which participants of a user study were asked to observe the agent and provide feedback after each action it took. Once training was complete, users interacted with both the RL and IRL agents while performing the same task. Results show that the IRL model outperforms the RL agent, confirming that a combination of human and environmental rewards may be of value for complex, real-world environments. There is also evidence that the IRL training affected the state-action space of the model, improving overall task strategy. Future work intends to investigate the effects of human biases.
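To make the idea of combining environmental and human rewards concrete, here is a minimal, hypothetical sketch: a tabular Q-learning update in which a human-provided scalar reward is blended with the environmental reward before the standard temporal-difference update. This is an illustration of the general technique, not the thesis's actual algorithm; the function name, the blending weight `beta`, and the toy example are all assumptions for demonstration.

```python
# Illustrative sketch only (not the thesis's actual method): a Q-learning
# update where the learning signal is a weighted mix of the environmental
# reward and a human-provided scalar reward.

def blended_q_update(q, state, action, env_reward, human_reward,
                     next_state, actions, alpha=0.1, gamma=0.9, beta=0.5):
    """Update q[(state, action)] in place and return the new value.

    beta = 0 ignores human feedback; beta = 1 uses only human feedback.
    """
    # Blend the two reward sources into a single learning signal.
    reward = (1 - beta) * env_reward + beta * human_reward
    # Standard Q-learning target: blended reward plus discounted best
    # estimate over actions available in the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Example: a single update in a toy two-state, two-action problem where
# the human rewards the action more strongly than the environment does.
q = {}
new_value = blended_q_update(q, state=0, action=1, env_reward=1.0,
                             human_reward=2.0, next_state=1, actions=[0, 1])
```

In this sketch the human signal simply shifts the reward the agent learns from; richer schemes (e.g. shaping the policy directly, or decaying `beta` as human feedback becomes sparse) are possible variants.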

Download Megan's Thesis

My MSc Project

Megan_F_FINAL_01 from Housecat Productions on Vimeo.