NLIP Seminars Michaelmas 2024


The Cambridge Natural Language & Information Processing (NLIP) Seminars are a series of weekly talks from researchers in Natural Language Processing and Computational Linguistics, and adjacent fields including Machine Learning and Linguistics/Cognitive Science. I had the privilege of organising the seminars in the first term of my PhD, inheriting the series from Richard Diehl Martinez and Eric Chamoun. It was an interesting exercise alongside getting my teeth into my own research. The aim of the seminar series, as with any term card, is to give researchers in our group the opportunity to engage with thought-provoking and potentially challenging 'high-level' research ideas. This short post summarises some personal key takeaways from the talks given by the speakers I invited over Michaelmas Term, and finishes with a few of my own thoughts.

Cognition, Linguistics and NLP

What about Linguistics? (Prof Janet Pierrehumbert, University of Oxford)

Truth-Conditional Semantics at Scale (Dr Guy Emerson, University of Cambridge)

Phonemic Pre-Training (Zebulon Goriely, University of Cambridge)

Lyric Generation (Yiwen Chen, University of Cambridge)

Language Models

Human Feedback (Max Bartolo, Cohere)

Max Bartolo from Cohere, our only speaker from industry this term, delivered a talk entitled "10 slides on Human Feedback". Focusing on reward models, which are trained to predict a score reflecting human preference (as used in Reinforcement Learning from Human Feedback), Max surveyed different perspectives on aligning the outputs of a language model with user preferences. The overall takeaway was that any notion of 'feedback' is inherently subjective. While this is not surprising, it means we need finer-grained signals to provide better rewards, and more flexible models that can overcome annotator biases. Max highlighted the PRISM Alignment Project from NeurIPS 2024, a dataset linking annotators' sociodemographic profiles to their stated preferences (Kirk et al., 2024). However, like most large-scale annotation efforts in NLP, collecting such data is expensive. It is therefore not surprising that there are research efforts concerned with assessing the efficacy of synthetic data.
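To make the reward-model framing concrete, here is a minimal sketch of the pairwise (Bradley-Terry) objective commonly used when training reward models on human preference pairs. The function name and the random tensors are illustrative assumptions on my part, not the specific setup Max described.

```python
# A minimal sketch of the pairwise (Bradley-Terry) loss commonly used to train
# reward models from human preference data. The names and tensors here are
# illustrative assumptions, not any particular production setup.
import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Scalar rewards for the preferred and dispreferred responses of each pair.

    The loss pushes the reward of the human-preferred response above the
    rejected one, so any subjectivity in the annotations flows straight
    into the reward signal.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative usage with random scores standing in for a reward head's output.
chosen = torch.randn(8)    # rewards assigned to preferred responses
rejected = torch.randn(8)  # rewards assigned to rejected responses
loss = reward_model_loss(chosen, rejected)
```

The point of showing it is how directly the annotators' choices enter the objective: whatever subjectivity exists in the 'chosen' versus 'rejected' labels is baked into the reward model it produces.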

Tokenisers & Adaptive Memory (Dr Edoardo Ponti, Edinburgh/NVIDIA, and Benjamin Minixhofer, Language Technology Lab, University of Cambridge)

Edoardo came to discuss his work on adaptive memory and tokenisation, an exciting proposal for addressing two fundamental issues associated with Transformers.

Memory Problems: Transformers suffer linear memory growth in the key-value cache, which is memory intensive and increases latency. As is well discussed, the fixed granularity of Transformer representations is a source of inefficiency. Edoardo proposed that we can 'retrofit' Transformers with adaptive memory, a technique he defends precisely because it does not require training from scratch. The key theoretical proposal is dynamic memory compression (DMC): a segmentation decision over the key and value matrices that either accumulates a new key/value into the existing cache or appends it, rather than solely appending as in standard self-attention.

The causal mask in autoregressive attention is augmented with a DMC decision variable $\alpha$, which at convergence is either zero or one. Since $\log(1) = 0$ and $\log(0) \to -\infty$, each cached element becomes either entirely accessible or entirely inaccessible to attention.
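The append-versus-accumulate mechanics are easier to see in code. Below is a simplified sketch of a single attention head's key-value cache update; the function `dmc_cache_update`, the plain averaging rule, and the hard threshold on alpha are my own simplifications for illustration, not the parameterisation in Edoardo's published method.

```python
# A hedged sketch of the append-vs-accumulate decision behind dynamic memory
# compression (DMC) for one attention head's key-value cache. The exact
# parameterisation of alpha and the weighted accumulation differ in the
# published method; this only illustrates the control flow.
import torch

def dmc_cache_update(keys: torch.Tensor, values: torch.Tensor,
                     new_k: torch.Tensor, new_v: torch.Tensor,
                     alpha: float):
    """keys/values: (cache_len, d); new_k/new_v: (d,); alpha in {0, 1} at convergence."""
    if alpha >= 0.5:
        # Accumulate: merge the new key/value into the last cache slot instead
        # of growing the cache (simple averaging here, an assumption).
        keys = torch.cat([keys[:-1], ((keys[-1] + new_k) / 2).unsqueeze(0)])
        values = torch.cat([values[:-1], ((values[-1] + new_v) / 2).unsqueeze(0)])
    else:
        # Append: standard self-attention behaviour, the cache grows by one slot.
        keys = torch.cat([keys, new_k.unsqueeze(0)])
        values = torch.cat([values, new_v.unsqueeze(0)])
    return keys, values

# During training the (soft) decision can be folded into the attention logits
# as an additive log(alpha) term: log(1) = 0 leaves a position untouched,
# while log(0) -> -inf masks it out entirely, matching the augmented causal mask.
d = 64
keys, values = torch.randn(5, d), torch.randn(5, d)
keys, values = dmc_cache_update(keys, values, torch.randn(d), torch.randn(d), alpha=1.0)
```

The closing comment is the connection back to the talk: folding the decision into the attention logits as an additive log term is what makes the augmented causal mask behave as an all-or-nothing gate at convergence.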
