Evolving Policies to Solve the Rubik's Cube: Experiments with Ideal and Approximate Performance Functions

dc.contributor.author: Smith, Robert
dc.contributor.copyright-release: Not Applicable
dc.contributor.degree: Master of Computer Science
dc.contributor.department: Faculty of Computer Science
dc.contributor.ethics-approval: Not Applicable
dc.contributor.external-examiner: n/a
dc.contributor.graduate-coordinator: Dr. Malcolm Heywood
dc.contributor.manuscripts: Not Applicable
dc.contributor.thesis-reader: Dr. Andrew McIntyre
dc.contributor.thesis-reader: Dr. Qigang Gao
dc.contributor.thesis-supervisor: Dr. Malcolm Heywood
dc.date.accessioned: 2016-08-26T15:50:57Z
dc.date.available: 2016-08-26T15:50:57Z
dc.date.defence: 2016-08-23
dc.date.issued: 2016-08-26T15:50:57Z
dc.description.abstract: This work reports on an approach to direct policy discovery (a form of reinforcement learning) using genetic programming (GP) for the 3 x 3 x 3 Rubik's Cube. Specifically, a synthesis of two approaches is proposed: 1) a previous group-theoretic formulation is used to suggest a sequence of objectives for developing solutions to different stages of the overall task; and 2) a hierarchical formulation of GP policy search is utilized in which policies adapted for an earlier objective are explicitly transferred to aid the construction of policies for the next objective. The resulting hierarchical organization of policies into a policy tree explicitly demonstrates task decomposition and policy reuse. Algorithmically, the process makes use of a recursive call to a common approach for maintaining a diverse population of GP individuals and then learns how to reuse subsets of programs (policies) developed against the earlier objective. Other than the two objectives, we do not explicitly identify how to decompose the task or mark specific policies for transfer. Moreover, at the end of evolution we return a population solving 100% of 17,675,698 different initial Cubes for the two objectives currently in use. A second set of experiments is then performed to quantify the relative contributions of two components used in discovering policy trees: policy diversity maintenance and competitive coevolution. Both components prove to be fundamental. Without each component, performance reaches only ~55% and ~23%, respectively.
dc.identifier.uri: http://hdl.handle.net/10222/72115
dc.language.iso: en_US
dc.subject: Machine learning
dc.title: Evolving Policies to Solve the Rubik's Cube: Experiments with Ideal and Approximate Performance Functions
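
The abstract describes a policy tree in which programs evolved against the first objective are reused, unchanged, as building blocks for a policy solving the next objective. As a rough illustration of that structure only (not the thesis's actual GP representation; the PolicyNode class, the act and gate callables, and the toy state encoding below are all hypothetical), a policy tree with explicit stage-1 reuse might be sketched as:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

State = List[int]   # stand-in for an encoded cube state
Action = str        # stand-in for a cube move such as "U" or "R'"

@dataclass
class PolicyNode:
    """One node of a policy tree: acts directly or delegates to a child."""
    name: str
    act: Optional[Callable[[State], Action]] = None       # direct move suggestion
    children: List["PolicyNode"] = field(default_factory=list)
    gate: Optional[Callable[[State], int]] = None          # picks which child handles the state

    def __call__(self, state: State) -> Action:
        # A root learned for the later objective only decides which earlier
        # (stage-1) policy to deploy; the stage-1 policies are reused unchanged.
        if self.children and self.gate is not None:
            return self.children[self.gate(state)](state)
        return self.act(state)

# Placeholder policies standing in for programs evolved against objective 1.
stage1_a = PolicyNode("stage1-policy-a", act=lambda s: "U")
stage1_b = PolicyNode("stage1-policy-b", act=lambda s: "R")

# Root policy for objective 2: it transfers the stage-1 policies as children
# and only has to learn the gating behaviour.
root = PolicyNode("stage2-root",
                  children=[stage1_a, stage1_b],
                  gate=lambda s: 0 if sum(s) % 2 == 0 else 1)

print(root([1, 2, 3]))   # sum is even -> delegates to stage1_a -> "U"
```

In the thesis the stage-1 policies are GP programs and the transfer decisions are themselves learned, but the sketch captures the key point made in the abstract: decomposition is expressed as a tree whose lower levels are reused policies rather than retrained ones.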

Files

Original bundle

Name: Smith-Robert-MCSC-CSCI-August-2016.pdf
Size: 586.59 KB
Format: Adobe Portable Document Format
Description: Final Thesis PDF/A Submission for Robert Smith (Second Submission)

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission