Suchir Salhan


I am a PhD Candidate in Computer Science at the University of Cambridge (Gonville & Caius College), supervised by Professor Paula Buttery. I specialise in Machine Learning and Natural Language Processing, and I am interested in developing cognitively-inspired computational systems, including alternatives to Transformer-based Large Language Models. I previously completed my Bachelor of Arts and Master of Engineering in Computer Science and Linguistics at Gonville & Caius College, University of Cambridge, where I obtained a Starred First (Class I with Distinction) and a Distinction (equivalent to a Starred First) respectively. My research focuses on Small-Scale Language Models as a means of improving the interpretability of Foundation Models.

My research is primarily concerned with engineering more cognitively plausible Foundation Models: an emerging research paradigm that attempts to improve the cognitive capabilities of state-of-the-art computational systems by training them in cognitively plausible environments. I also have interests in Machine Learning Systems, the Theory of Deep Learning and Theoretical Linguistics. My ambition is to develop data-efficient Machine Learning systems that draw on human cognition.

To this end, I am particularly interested in developing novel machine learning techniques for building scalable neural architectures that draw on formal methods (e.g., category and type theory) used in theoretical formalisms in Cognitive Science.

Email  /  CV  /  Twitter  /  LinkedIn  /  Github

Research

My research currently focuses on building small-scale Transformer-based language models. I have engineered curriculum learning (CL) strategies inspired by cutting-edge Language Acquisition frameworks.

I primarily work on Transformer-based Large Language Models (LLMs). I have worked with Multimodal Vision-Language Models in the Language Technology Lab with Prof Nigel Collier and Fangyu Liu (now Google DeepMind), and have previously probed vision-language models, exploring the semantic representations of CLIP. I have also worked on Nearest Neighbour Algorithms for Offline Imitation Learning (IL), on Explainable AI and Argumentation Mining, and on Shortcut Learning in Natural Language Inference.
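
As a minimal illustration of the kind of probing involved (not the exact methodology of that work; the captions, image path, and checkpoint below are placeholders), CLIP's semantic representations can be inspected by scoring candidate captions against an image:

```python
# Minimal sketch of probing CLIP's semantic representations by scoring
# candidate captions against an image. The captions, image path and
# checkpoint are assumptions for illustration, not the original setup.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                       # placeholder image
captions = ["a dog chasing a ball", "a cat on a sofa"]  # placeholder probes

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; a softmax gives a
# distribution over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```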

Published work

Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies.
Suchir Salhan, Richard Diehl-Martinez, Zebulon Goriely, Paula Buttery
In Preparation for CoNLL BabyLM Challenge (Paper Track), 2024

Cognitively-Inspired Small-Scale Language Models (SSLMs) have been developed for four typologically distant language families: Romance (including French, Spanish, and Portuguese), Germanic (German and Dutch), Japanese, and Chinese, all trained on developmentally plausible corpora. Initial experiments with these SSLMs assessed the advantages of training a Transformer-based Language Model on a developmentally appropriate quantity of Child-Directed Speech (CDS). These experiments demonstrated that training SSLMs on CDS provides benefits beyond English, enabling the acquisition of grammatical knowledge comparable to that of pre-trained RoBERTa models, despite using approximately 25 times fewer parameters and 6,000 times fewer words. Furthermore, a Monolingual Age-Ordered variant of Curriculum Learning for Infant-Inspired Model Building (MAO-CLIMB) was introduced as a family of more "cognitively plausible" alternatives to BabyBERTa-style SSLMs. MAO-CLIMB incorporates three novel objective curricula, inspired by cutting-edge Chomskyan theories of language acquisition. The findings revealed that Transformer-based SSLMs do not adhere strictly to ordered developmental sequences, resulting in a mixed benefit of curriculum learning strategies. However, models trained with these strategies can sometimes outperform larger language models on certain syntactic benchmarks, particularly in Japanese. The study discusses the implications of these findings for constructing and evaluating cognitively plausible SSLMs beyond English.
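
As an illustrative sketch of the age-ordered curriculum idea described above (the record fields, bucketing scheme, and function name below are hypothetical, not the released MAO-CLIMB code), a monolingual curriculum can be built by sorting Child-Directed Speech utterances by the child's age at recording and exposing the model to progressively later developmental stages:

```python
# Illustrative sketch of a monolingual age-ordered curriculum over
# Child-Directed Speech (CDS). The field names ("utterance", "age_months")
# and the bucketing scheme are assumptions, not the MAO-CLIMB release.
from typing import Iterator


def age_ordered_curriculum(
    corpus: list[dict],       # each record: {"utterance": str, "age_months": int}
    stage_months: int = 12,   # width of each developmental stage
) -> Iterator[list[str]]:
    """Yield successive training stages, each adding CDS addressed to
    progressively older children, so early stages contain only speech
    heard by the youngest children."""
    ordered = sorted(corpus, key=lambda r: r["age_months"])
    max_age = ordered[-1]["age_months"]
    cutoff = stage_months
    while cutoff < max_age + stage_months:
        yield [r["utterance"] for r in ordered if r["age_months"] <= cutoff]
        cutoff += stage_months


# Example with toy CDS records; each yielded stage is a superset of the last.
toy_corpus = [
    {"utterance": "look at the ball", "age_months": 8},
    {"utterance": "where did the ball go", "age_months": 14},
    {"utterance": "shall we read this story together", "age_months": 30},
]
for i, stage in enumerate(age_ordered_curriculum(toy_corpus), start=1):
    print(f"stage {i}: {len(stage)} utterances")
```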

LLMs “off-the-shelf” or Pretrain-from-Scratch? Recalibrating Biases and Improving Transparency using Small-Scale Language Models.
Suchir Salhan, Richard Diehl-Martinez, Zebulon Goriely, Andrew Caines, Paula Buttery
Learning & Human Intelligence Group, Department of Computer Science & Technology, 2024

Our work has found that Small-Scale Language Models (SSLMs) perform competitively on certain cross-linguistic benchmarks for LLM evaluation. SSLMs offer improved transparency and can outperform LLMs in specialist domains, demanding a more judicious assessment of the benefits of “pretraining-from-scratch” compared to using LLMs “off-the-shelf”.

On the Potential for Maximising Minimal Means in Transformer Language Models: A Dynamical Systems Perspective.
Suchir Salhan
In Cambridge Occasional Papers in Linguistics, Department of Theoretical & Applied Linguistics, 2023

Computational linguists can utilise the insights of neo-emergent linguistic models, an approach to grammar construction that relies heavily on domain-general inductive biases, to address extant challenges associated with the syntactic and typological capabilities of state-of-the-art Transformer-based Language Models (LMs), which underpin systems like Google Translate and ChatGPT. I offer a synthesis of the inductive biases of Transformer-based LMs that are reminiscent of Dynamical Systems Theory (DST) approaches in human cognition. In doing so, I put forward a research agenda that will strengthen the case for minimalism in deep learning.

Computational Projects

Argumentation Mining
Suchir Salhan
Department of Computer Science and Technology, Natural Language Processing UROP, 2020

I was offered a UROP research project by my Director of Studies Prof Paula Buttery and Dr Andrew Caines, and worked under the supervision of computational linguists from the Automated Language Teaching and Assessment (ALTA) group in the Department of Computer Science and Technology. I was the only first-year student admitted to the two-month UROP programme. I collaborated with Thiemo Wambsganss on developing the back-end machine learning architecture for an application that supports the argumentation skills of English language learners. I trained and evaluated state-of-the-art transformer language models on downstream argumentation mining tasks, began working on the deployment of the pre-trained model in the application, and submitted the ethics application for the experimental evaluation of the application's educational outcomes. I presented my work to members of the ALTA group and to the project sponsors from Cambridge Assessment, who funded my UROP project. I also had the opportunity to attend machine learning classes and seminars on Dialogue Systems, Ethics in NLP, active learning paradigms and educational technology.
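
As a rough sketch of what fine-tuning a transformer for a downstream argumentation mining task can look like (the label set, checkpoint, and toy data below are placeholders rather than the ALTA project's actual pipeline), a sentence-level argument component classifier might be set up as follows:

```python
# Rough sketch of fine-tuning a transformer to classify sentences as
# argument components (claim / premise / non-argumentative). The labels,
# checkpoint and toy data are placeholders, not the ALTA project pipeline.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["non-argumentative", "claim", "premise"]   # hypothetical label set
checkpoint = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(LABELS)
)

# Toy (sentence, label index) pairs; a real setup would use an annotated
# learner-essay corpus with proper batching and a held-out evaluation split.
train_pairs = [
    ("Homework should be optional because it causes stress.", 1),
    ("A recent survey found most students sleep less than seven hours.", 2),
    ("The essay has three paragraphs.", 0),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for sentence, label in train_pairs:
        batch = tokenizer(sentence, return_tensors="pt", truncation=True)
        outputs = model(**batch, labels=torch.tensor([label]))
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Inference: predict the argument component of a new sentence.
model.eval()
with torch.no_grad():
    enc = tokenizer("Schools should therefore ban homework.", return_tensors="pt")
    pred = model(**enc).logits.argmax(dim=-1).item()
print(LABELS[pred])
```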

Theoretical Linguistics and Cognitive Science

I have worked as a Research Assistant for a small project on corpus-based studies of code-switching with Dr Li Nguyen.

My theoretical linguistics interests are framed against the background assumptions of neo-emergentist approaches, which assume a minimally endowed (genetic) component of the grammar and place the burden of acquisition on the learner. Within this approach, I am concerned with questions of learnability and look to formalise approaches that draw on Dynamical Systems Theory. I am keen to explore the potential relevance of information-theoretic approaches to acquisition, as well as syntax-phonology interface phenomena.


© Suchir Salhan 2024