Evaluating Common-Sense Reasoning in Pretrained Transformer-Based Language Models Using Adversarial Schemas and Consistency Metrics
dc.contributor.author | Maler, Adrian | |
dc.contributor.copyright-release | Not Applicable | en_US |
dc.contributor.degree | Master of Computer Science | en_US |
dc.contributor.department | Faculty of Computer Science | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.external-examiner | n/a | en_US |
dc.contributor.graduate-coordinator | Michael McAllister | en_US |
dc.contributor.manuscripts | Not Applicable | en_US |
dc.contributor.thesis-reader | Darren Abramson | en_US |
dc.contributor.thesis-reader | Dirk Arnold | en_US |
dc.contributor.thesis-supervisor | Vlado Keselj | en_US |
dc.date.accessioned | 2023-04-17T12:10:31Z | |
dc.date.available | 2023-04-17T12:10:31Z | |
dc.date.defence | 2023-04-05 | |
dc.date.issued | 2023-04-14 | |
dc.description.abstract | In artificial intelligence, common sense refers to simple acts of verbal reasoning. The Winograd Schema Challenge (WSC), an important test of common sense, was recently defeated by transformer-based language models. We investigate the implications of that defeat: have language models achieved common sense, or is the challenge flawed? That is, we consider the problem of reevaluating verbal reasoning in language models. We evaluate the accuracy and consistency of three important pretrained models on Winograd schemas: GPT-2, RoBERTa, and T5. We generalize the Winograd schema to a larger class of problems, called adversarial schemas, and propose an evaluation protocol for them that incorporates consistency. We create a new test of common-sense verbal reasoning made up of our adversarial schemas. Each model performs significantly worse on our test than on WSC, and no model exhibits high consistency. We find no convincing evidence of verbal reasoning by language models. | en_US |
dc.identifier.uri | http://hdl.handle.net/10222/82418 | |
dc.language.iso | en | en_US |
dc.subject | computer science | en_US |
dc.subject | artificial intelligence | en_US |
dc.subject | natural language processing | en_US |
dc.subject | machine learning | en_US |
dc.subject | language model | en_US |
dc.subject | common-sense reasoning | en_US |
dc.subject | Winograd Schema Challenge | en_US |
dc.title | Evaluating Common-Sense Reasoning in Pretrained Transformer-Based Language Models Using Adversarial Schemas and Consistency Metrics | en_US |