Language Acquisition & Language Modelling
Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies
Suchir Salhan 🍋🍊
Richard Diehl Martinez 🍋
Zebulon Goriely 🍋
Paula Buttery 🍋🍊
🍋 Department of Computer Science & Technology, University of Cambridge, U.K.
🍊 ALTA Institute, University of Cambridge, U.K.
arXiv pre-print; accepted (poster) at the BabyLM Shared Task, CoNLL 2024
Abstract
Curriculum Learning has been a popular strategy to improve the cognitive plausibility of Small-Scale Language Models (SSLMs) in the BabyLM Challenge. However, it has not led to considerable improvements over non-curriculum models. We assess whether theories of first language acquisition can be used to specify more fine-grained curriculum learning strategies, creating age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually. Comparing the success of three objective curricula (Growing, Inwards, and MMM) that precisely replicate the predictions of acquisition theories on a standard SSLM architecture, we find that fine-grained acquisition-inspired curricula can outperform non-curriculum baselines. The performance benefits of curriculum strategies in SSLMs can be obtained by specifying fine-grained, language-specific curricula that precisely replicate language acquisition theories.
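The curricula described in the abstract build on age-ordered corpora: Child-Directed Speech is sorted by the age of the child it was addressed to, so the model sees developmentally earlier input first. The sketch below is a minimal illustration of that ordering step only, not the authors' released implementation; the `age_ordered_batches` helper and the toy corpus are hypothetical.

```python
from typing import Iterator

# Hypothetical sketch: order a Child-Directed Speech corpus by child age
# (in months) and yield batches in curriculum order. This illustrates one
# plausible implementation, not the paper's actual code.
def age_ordered_batches(
    corpus: list[tuple[str, int]], batch_size: int = 32
) -> Iterator[list[str]]:
    """Yield batches of utterances sorted by the age of the child they
    were addressed to, so training proceeds from speech directed at
    younger children to speech directed at older children."""
    ordered = sorted(corpus, key=lambda pair: pair[1])  # ascending age
    for start in range(0, len(ordered), batch_size):
        yield [utterance for utterance, _ in ordered[start:start + batch_size]]

# Toy example: three utterances tagged with the child's age in months.
toy_corpus = [("the big dog ran", 36), ("look ball", 12), ("where did it go", 24)]
for batch in age_ordered_batches(toy_corpus, batch_size=2):
    print(batch)
```

In the paper's setup, this age ordering underlies all three curricula (Growing, Inwards, MMM), which, as the term "objective curricula" suggests, additionally vary the training objective across developmental stages rather than relying on data ordering alone.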
Contact:
- Suchir Salhan: sas245@cam.ac.uk
Salhan, S. A., Diehl Martinez, R., Goriely, Z., & Buttery, P. (2024, November). Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies. In Proceedings of the BabyLM Challenge at the 28th Conference on Computational Natural Language Learning (pp. 112-127).
@inproceedings{salhan-etal-2024-less,
title = " Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies",
author = "Salhan, Suchir and
Diehl Martinez, Richard
Goriely, Zebulon and
Buttery, Paula",
editor = "Warstadt, Alex and
Mueller, Aaron and
Choshen, Leshem and
Wilcox, Ethan and
Zhuang, Chengxu and
Ciro, Juan and
Mosquera, Rafael and
Paranjape, Bhargavi and
Williams, Adina and
Linzen, Tal and
Cotterell, Ryan",
booktitle = "Proceedings of the BabyLM Challenge at the 28th Conference on Computational Natural Language Learning",
month = nov,
year = "2024",
address = "Miami",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.conll-babylm.10",
doi = "10.18653/v1/2023.conll-babylm.10",
pages = "112--127",
}