Cambridge University UROPs 2025: Small Language Models Research Projects
| Week | Topic | Course Materials |
|---|---|---|
| 1 (14/07) | BabyLMs and Multilingual Evaluation | Slides \| Notes \| Code<br>Required Readings: Findings of the 1st BabyLM Challenge; Findings of the 2nd BabyLM Challenge; Handout on Language Model Evaluation<br>Recommended Additional Readings: CLIMB – Curriculum Learning for Infant-inspired Model Building; Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies<br>Practical Tasks: TO DO |
| 2 (21/07) | Tokenisation and Interpretability; Bilingual BabyLM Training and Evaluation | Slides \| Notes \| Code<br>Required Readings: The Linear Representation Hypothesis and the Geometry of Large Language Models; Slides from Arthur Conmy (Google DeepMind)<br>Recommended Additional Readings: Universal Dependencies; On the Acquisition of Shared Grammatical Representations in Bilingual Language Models; MultiBLiMP 1.0<br>Practical Tasks: The TransformerLens Library |
| 3 (28/07) | Pretraining Language Models: The Pico Framework | Slides \| Notes \| Code<br>Required Readings: Pico Train Tutorial; Pico Analyze Tutorial<br>Recommended Additional Readings: TO DO<br>Practical Tasks: TO DO |
| 4 (04/08) | BabyLM Architectures & Feedback (ALTA CST) | Slides \| Notes \| Code<br>Required Readings: BabyLlama; TO DO<br>Practical Tasks: TO DO |
| 5 (11/08) | Mechanistic and Developmental Interpretability | Slides \| Notes \| Code<br>Required Readings:<br>Additional Readings:<br>Practical Tasks: Interim Project Presentation |
| 6 (18/08) | Train Your Own BabyLM From Scratch | Slides \| Notes \| Code \| Interaction Track \| Multimodal Track<br>Required Readings: TO DO<br>Practical Tasks: TO DO |
| 7 (25/08) | Small Language Models – Frontier Problems | Slides \| Notes \| Code<br>Required Readings: The Linear Representation Hypothesis and the Geometry of Large Language Models<br>Additional Readings: Slides from Arthur Conmy (Google DeepMind)<br>Practical Tasks: TO DO |
| 8 (01/09) | Small Language Models – Frontier Problems (Architectures) | Slides \| Notes \| Code<br>Required Readings: The Linear Representation Hypothesis and the Geometry of Large Language Models<br>Additional Readings: Slides from Arthur Conmy (Google DeepMind)<br>Practical Tasks: Final Project Presentations and UROP Project Reports (due Friday 5 September 2025, 5pm) |
Language Model Primers
- CS336 Language Models from Scratch
- CS224U Contextual Word Representations
- CS224U In-Context Learning
Small Language Models
- A Comprehensive Survey of Small Language Models in the Era of Large Language Models
- Small Language Models are the Future of Agentic AI
Find Out More!
Tokenisers
Non-Autoregressive Language Models
- CS224U Diffusion Models for Text
- What are Diffusion Language Models? – blog post by Xiaochen Zhu (PhD student, NLIP Group)
Useful Links (Raven Access Required)
See Dr Andrew Caines's page for lots of useful advice, particularly about access to compute resources.
Dr Russell Moore previously developed his ML Commando Course for the 2021 ALTA UROPs, which you might find helpful.