Evaluating Common-Sense Reasoning in Pretrained Transformer-Based Language Models Using Adversarial Schemas and Consistency Metrics
dc.contributor.author | Maler, Adrian | |
dc.contributor.copyright-release | Not Applicable | en_US |
dc.contributor.degree | Master of Computer Science | en_US |
dc.contributor.department | Faculty of Computer Science | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.external-examiner | n/a | en_US |
dc.contributor.graduate-coordinator | Michael McAllister | en_US |
dc.contributor.manuscripts | Not Applicable | en_US |
dc.contributor.thesis-reader | Darren Abramson | en_US |
dc.contributor.thesis-reader | Dirk Arnold | en_US |
dc.contributor.thesis-supervisor | Vlado Keselj | en_US |
dc.date.accessioned | 2023-04-17T12:10:31Z | |
dc.date.available | 2023-04-17T12:10:31Z | |
dc.date.defence | 2023-04-05 | |
dc.date.issued | 2023-04-14 | |
dc.description.abstract | In artificial intelligence, common sense refers to simple acts of verbal reasoning. The Winograd Schema Challenge (WSC), an important test of common sense, was recently defeated by transformer-based language models. We investigate the implications of that defeat: have language models achieved common sense, or is the challenge flawed? That is, we consider the problem of reevaluating verbal reasoning in language models. We evaluate the accuracy and consistency of three important pretrained models on Winograd schemas: GPT-2, RoBERTa, and T5. We generalize the Winograd schema to a larger class of problems, called adversarial schemas, and propose an evaluation protocol for them that incorporates consistency. We create a new test of common-sense verbal reasoning made up of our adversarial schemas. Each model performs significantly worse on our test than on WSC, and no model exhibits high consistency. We find no convincing evidence of verbal reasoning by language models. | en_US |
dc.identifier.uri | http://hdl.handle.net/10222/82418 | |
dc.language.iso | en | en_US |
dc.subject | computer science | en_US |
dc.subject | artificial intelligence | en_US |
dc.subject | natural language processing | en_US |
dc.subject | machine learning | en_US |
dc.subject | language model | en_US |
dc.subject | common-sense reasoning | en_US |
dc.subject | Winograd Schema Challenge | en_US |
dc.title | Evaluating Common-Sense Reasoning in Pretrained Transformer-Based Language Models Using Adversarial Schemas and Consistency Metrics | en_US |