dc.contributor.author | Amaral, Ryan | |
dc.date.accessioned | 2021-08-27T12:30:07Z | |
dc.date.available | 2021-08-27T12:30:07Z | |
dc.date.issued | 2021-08-27T12:30:07Z | |
dc.identifier.uri | http://hdl.handle.net/10222/80746 | |
dc.description | Tangled Program Graphs (TPG) represents a framework for evolving programs under an explicitly emergent model for modularity. The framework has been very successful at discovering solutions to tasks with delayed rewards (reinforcement learning) when the actions are limited to a single discrete action per state. In this thesis, an approach is proposed for generalizing TPG to the case of multiple real-valued actions per state. Two empirical benchmarking studies are performed to demonstrate these outcomes: ViZDoom over multiple tasks, and bipedal walker control. The former is used to compare with the original TPG with single discrete actions per state; the latter is used to demonstrate multiple real-valued actions per state. It is shown that the complexity of the resulting solutions decreases considerably compared to the original TPG formulation. However, in order to reach these results, significant attention has to be paid to the adoption of appropriate diversity mechanisms. This thesis therefore also proposes a framework for intermittently injecting new material into the TPG population during training. The modular properties of TPG enable this material to be absorbed on a continuous basis. Results are comparable with those identified under certain recent deep learning approaches. | en_US |
dc.description.abstract | Tangled Program Graphs (TPG) represents a framework for evolving programs under an explicitly emergent model for modularity. The framework has been very successful at discovering solutions to tasks with delayed rewards (reinforcement learning) when the actions are limited to a single discrete action per state. In this thesis, an approach is proposed for generalizing TPG to the case of multiple real-valued actions per state. Two empirical benchmarking studies are performed to demonstrate these outcomes: ViZDoom over multiple tasks, and bipedal walker control. The former is used to compare with the original TPG with single discrete actions per state; the latter is used to demonstrate multiple real-valued actions per state. It is shown that the complexity of the resulting solutions decreases considerably compared to the original TPG formulation. However, in order to reach these results, significant attention has to be paid to the adoption of appropriate diversity mechanisms. This thesis therefore also proposes a framework for intermittently injecting new material into the TPG population during training. The modular properties of TPG enable this material to be absorbed on a continuous basis. Results are comparable with those identified under certain recent deep learning approaches. | en_US |
dc.language.iso | en | en_US |
dc.subject | Reinforcement Learning | en_US |
dc.subject | Genetic Programming | en_US |
dc.subject | Diversity | en_US |
dc.subject | Evolution | en_US |
dc.subject | Machine Learning | en_US |
dc.subject | Subpopulation | en_US |
dc.subject | Continuous Control | en_US |
dc.subject | OpenAI Gym | en_US |
dc.subject | ViZDoom | en_US |
dc.subject | SBB | en_US |
dc.subject | TPG | en_US |
dc.title | Reinforcement Learning with Real Valued Tangled Program Graphs | en_US |
dc.type | Thesis | en_US |
dc.date.defence | 2021-08-25 | |
dc.contributor.department | Faculty of Computer Science | en_US |
dc.contributor.degree | Master of Computer Science | en_US |
dc.contributor.external-examiner | n/a | en_US |
dc.contributor.graduate-coordinator | Dr. Michael McAllister | en_US |
dc.contributor.thesis-reader | Dr. Andrew McIntyre | en_US |
dc.contributor.thesis-reader | Dr. Nur Zincir-Heywood | en_US |
dc.contributor.thesis-supervisor | Dr. Malcolm Heywood | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.manuscripts | Not Applicable | en_US |
dc.contributor.copyright-release | Not Applicable | en_US |