
Suchir Salhan

Suchir Salhan is a Computer Science PhD candidate at the University of Cambridge, working on Small Language Models under the supervision of Prof Paula Buttery. He is also the founder of Per Capita Media, Cambridge University's newest independent media organisation.





Cambridge Small Language Models: Cognitively-Inspired Alternatives to Transformer-based LLMs

My research is concerned with building more scalable Small Language Models. While industry-led efforts have produced competitive NLP systems that have fundamentally reshaped the job of the academic NLP researcher, contemporary Natural Language Processing (NLP) still has several fundamental open questions to address that are not obviously priorities for commercial AI research labs.

Modern NLP is dominated by large pre-trained, highly parameterised neural networks trained on extremely large web-mined corpora. Training and inference with such models are costly, which incentivises the use of smaller counterparts, and the benefits of the pre-train/fine-tune paradigm remain unclear for domain-specific downstream tasks. In addition, theoretical linguists and cognitive scientists have highlighted several weaknesses of state-of-the-art foundation models.

Cambridge Small Language Models

This technical 'blog' on Small LMs discusses techniques and perspectives from collaborators and other Machine Learning, NLP and Cognitive Science researchers. If this sounds of interest, please get in touch by emailing sas245@cam.ac.uk.

Architecture & Computational Complexity: The viability of 'Small LMs' as a coherent research programme depends on a careful treatment of efficiency, acceleration and architectural questions. There is growing recognition that the computational cost of self-attention in Transformers, which scales quadratically with sequence length, is suboptimal in several respects, motivating sub-quadratic alternatives; the short sketch below makes the point concrete.
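The sketch is my own illustration in plain NumPy, not drawn from any particular codebase: the (n × n) score matrix is what makes time and memory grow quadratically with the sequence length n.

import numpy as np

def self_attention(x, w_q, w_k, w_v):
    # x: (n, d) token representations; returns (n, d) contextualised outputs
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # three (n, d) projections
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (n, n) score matrix: the quadratic bottleneck
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # (n, d)

n, d = 1024, 64                                     # doubling n quadruples the score matrix
x = np.random.randn(n, d)
w_q, w_k, w_v = (np.random.randn(d, d) / np.sqrt(d) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)       # (1024, 64)

Efficient implementations fuse these operations, but the quadratic dependence on n remains.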

Cognitively-inspired AI: The emergent capabilities of Transformers are the subject of a great deal of interpretability work; however, there is a clear mismatch between human language acquisition, which is data-efficient in many regards, and the data hunger of Transformers. I am particularly invested in research questions that draw on insights from language acquisition to guide architectural alternatives to 'vanilla' Transformers.

Training Dynamics, Evaluation and Scalability: Benchmarking is a fundamental part of guiding contemporary AI systems, and it requires an inherently interdisciplinary approach to be meaningful. Leading metrics are often incomplete or inadequate. Equally, however, we should not only be interested in reporting overall scores and metrics; the training dynamics of models are just as important, as the small logging sketch below illustrates.
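The following is a deliberately minimal, hypothetical sketch: the metric names and numbers are placeholders rather than real results. It records metrics at regular checkpoints to a JSONL file so that learning curves, and not just final scores, can be compared across runs.

import json, time

def log_checkpoint(path, step, metrics):
    # Append one JSON record per evaluation checkpoint
    record = {"step": step, "time": time.time(), **metrics}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Placeholder usage inside a hypothetical training loop
for step in range(500, 3001, 500):
    log_checkpoint("dynamics.jsonl", step,
                   {"train_loss": 3.2 - 0.0004 * step,        # dummy numbers for illustration
                    "blimp_accuracy": 0.55 + 0.00003 * step})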

Domain-Specificity: Practitioners interested in domain-specific Machine Learning (e.g., in educational, legal or biomedical domains) do not have sufficient control over the capabilities of Language Models. While novel post-training techniques address this in various ways, techniques and strategies related to pretraining are equally important; the toy corpus-filtering sketch below shows one such lever. The ethical and societal importance of domain-specific Language Models is enormous. It is vitally important that research into control over these systems is not relegated solely to industry and Big Tech, which are guided by a different set of commercial priorities, particularly for research areas associated with large human costs in both labour and social terms.
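The sketch below is purely illustrative and assumes nothing about any real pipeline: the lexicon, threshold and corpus are made up. It filters a web-mined corpus towards a target (here legal) domain before pretraining, one simple way of regaining control over what a small domain-specific model sees; production pipelines would more typically use trained classifiers or perplexity-based filters.

DOMAIN_TERMS = {"plaintiff", "defendant", "statute", "tribunal", "jurisdiction"}  # toy legal lexicon

def in_domain(document, min_hits=2):
    # Keep a document if it mentions at least `min_hits` distinct domain terms
    tokens = set(document.lower().split())
    return len(tokens & DOMAIN_TERMS) >= min_hits

corpus = [
    "The tribunal held that the statute did not apply to the defendant.",
    "Top ten pasta recipes for a quick weeknight dinner.",
]
domain_corpus = [doc for doc in corpus if in_domain(doc)]
print(domain_corpus)  # only the legal sentence survives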

Recent News:

I organise the Natural Language & Information Processing (NLIP) Seminars in the Department of Computer Science & Technology, University of Cambridge.

NLIP Seminars Michaelmas 2024

UROP Project Report 2021: Providing Automatic Feedback on Argumentation Quality to Learners of English

Multimodal Language Modelling across Languages and Cultures: Grounding Strategies for Concepts and Events

Handouts & Teaching Materials:

I delivered my first guest lecture in November 2024, for an MPhil course at the University of Cambridge taught with Prof Buttery and Dr Fermín Moscoso del Prado Martín, on Language Model Evaluation. At 22, this was a great opportunity and privilege so early in my "formal" academic career. In Lent 2025, I am the Teaching Assistant (a new role equivalent to Lead Demonstrator) for CST IA Machine Learning & Real World Data, and I am a supervisor for Machine Learning & Bayesian Inference (MLBI) [CST Part II]. I am also supervising an MPhil project on Language Model Stability.

Publications: