Evaluating Common-Sense Reasoning in Pretrained Transformer-Based Language Models Using Adversarial Schemas and Consistency Metrics

Date

2023-04-14

Authors

Maler, Adrian

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

In artificial intelligence, common sense refers to simple acts of verbal reasoning. The Winograd Schema Challenge (WSC), an important test of common sense, was recently defeated by transformer-based language models. We investigate the implications of that defeat: have language models achieved common sense, or is the challenge flawed? That is, we consider the problem of reevaluating verbal reasoning in language models. We evaluate three important pretrained models, GPT-2, RoBERTa, and T5, for accuracy and consistency on Winograd schemas. We generalize the Winograd schema to a larger class of problems, called adversarial schemas, and propose an evaluation protocol for them that incorporates consistency. We create a new test of common-sense verbal reasoning made up of our adversarial schemas. Each model performs significantly worse on our test than on WSC, and no model exhibits high consistency. We find no convincing evidence of verbal reasoning by language models.

Description

Keywords

computer science, artificial intelligence, natural language processing, machine learning, language model, common-sense reasoning, Winograd Schema Challenge

Citation