Evaluating Common-Sense Reasoning in Pretrained Transformer-Based Language Models Using Adversarial Schemas and Consistency Metrics
Date
2023-04-14
Authors
Maler, Adrian
Abstract
In artificial intelligence, common sense refers to simple acts of verbal reasoning. The Winograd Schema Challenge (WSC), an important test of common sense, was recently defeated by transformer-based language models. We investigate the implications of that defeat: have language models achieved common sense, or is the challenge flawed? That is, we take up the problem of reevaluating verbal reasoning in language models. We evaluate the accuracy and consistency of three important pretrained models, GPT-2, RoBERTa, and T5, on Winograd schemas. We generalize the Winograd schema to a larger class of problems, called adversarial schemas, and propose an evaluation protocol for them that incorporates consistency. We create a new test of common-sense verbal reasoning composed of our adversarial schemas. Each model performs significantly worse on our test than on WSC, and no model exhibits high consistency. We find no convincing evidence of verbal reasoning by language models.
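The abstract describes the evaluation protocol only at a high level. The following is a minimal sketch, assuming an LM-probability scoring rule that is not necessarily the thesis's actual method: it resolves the pronoun in each sentence of a Winograd twin pair with GPT-2 by comparing the likelihood of the sentence under each candidate substitution, and treats a pair as consistent only when both twins are resolved correctly. The twin pair shown is the well-known trophy/suitcase schema; the `[PRON]` placeholder and helper names are illustrative.

```python
# Sketch only: one plausible way to score Winograd twins and measure
# pair-level consistency; not the thesis's actual protocol.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Total log-probability GPT-2 assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token,
    # so multiply by the number of predicted tokens to get the total.
    return -out.loss.item() * (ids.size(1) - 1)

def resolve(template: str, candidates: tuple[str, str]) -> str:
    """Pick the candidate whose substitution yields the likelier sentence."""
    scores = {c: sentence_log_prob(template.replace("[PRON]", c))
              for c in candidates}
    return max(scores, key=scores.get)

# A twin pair in the style of a Winograd schema: the special word
# ("big" vs. "small") flips the correct referent of the pronoun.
twins = [
    ("The trophy didn't fit in the suitcase because [PRON] was too big.",
     "the trophy"),
    ("The trophy didn't fit in the suitcase because [PRON] was too small.",
     "the suitcase"),
]
candidates = ("the trophy", "the suitcase")

answers = [resolve(template, candidates) for template, _ in twins]
correct = [ans == gold for ans, (_, gold) in zip(answers, twins)]
print("per-twin accuracy:", sum(correct) / len(correct))
# Consistency is stricter than accuracy: both twins must be right.
print("pair consistent:", all(correct))
```

Under a consistency metric of this kind, a model that guesses the same referent for both twins scores 50% accuracy on the pair but zero consistency, which is why consistency can expose shallow heuristics that raw accuracy hides.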
Keywords
computer science, artificial intelligence, natural language processing, machine learning, language model, common-sense reasoning, Winograd Schema Challenge