On the Potential for Maximising Minimal Means in Transformer Language Models

There is widespread interest in state-of-the-art Transformer-based Language Models (LMs), which underpin systems like Google Translate and ChatGPT. I argue that computational linguists can draw on the insights of neo-emergentist linguistic models to address extant issues associated with the syntactic and typological capabilities of these models. In the first part of the talk, I offer a synthesis of the inductive biases of Transformer-based LMs that are reminiscent of the properties emphasised in Biberauer’s (2011 et seq.) ‘maximise minimal means’ (MMM) model. Subsequently, I provide a detailed case study indicating that Transformer-based LMs are unable to reproduce the crucial NO > ALL > SOME learning dynamics associated with this model. In light of these empirical findings, I offer a theoretical argument in the second part of the talk about how MMM and Dynamical Systems Theory (Bosch 2022, 2023) can be viewed as a linguistically-motivated goal for language models, in the sense of Emerson (2020). I outline how the predictions of this neo-emergentist approach can be operationalised to improve the syntactic capabilities of Transformer-based LMs. While these are only preliminary results, I hope they will stimulate an interdisciplinary discussion on how linguistic theory can help improve the syntactic and typological capabilities of Transformer-based LMs.

Slides from a talk presented at Syntax Lab on 14th February 2023, a weekly departmental seminar on syntactic theory organised by Dr Theresa Biberauer in the Section of Theoretical and Applied Linguistics (University of Cambridge), are available on ResearchGate:

PDF