Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages

Languages of the Indo-European family are spoken by almost half of the world’s population, but their origins and patterns of spread are disputed. Heggarty et al. present a database of 109 modern and 52 time-calibrated historical Indo-European languages, which they analyzed with models of Bayesian phylogenetic inference. Their results suggest an emergence of Indo-European languages around 8000 years before present. This is a deeper root date than previously thought, and it fits with an initial origin south of the Caucasus followed by a branch northward into the Steppe region. These findings lead to a “hybrid hypothesis” that reconciles current linguistic and ancient DNA evidence from both the eastern Fertile Crescent (as a primary source) and the steppe (as a secondary homeland).

Almost half the world’s population speaks a language of the Indo-European language family. It remains unclear, however, where this family’s common ancestral language (Proto-Indo-European) was initially spoken and when and why it spread through Eurasia. The “Steppe” hypothesis posits an expansion out of the Pontic-Caspian Steppe, no earlier than 6500 years before present (yr B.P.), and mostly with horse-based pastoralism from ~5000 yr B.P. An alternative “Anatolian” or “farming” hypothesis posits that Indo-European dispersed with agriculture out of parts of the Fertile Crescent, beginning as early as ~9500 to 8500 yr B.P. Ancient DNA (aDNA) is now bringing valuable new perspectives, but these remain only indirect interpretations of language prehistory. In this study, we tested between the time-depth predictions of the Anatolian and Steppe hypotheses, directly from language data. We report a new framework for the chronology and divergence sequence of Indo-European, using Bayesian phylogenetic methods applied to an extensive new dataset of core vocabulary across 161 Indo-European languages.

Paul Heggarty, Cormac Anderson, Matthew Scarborough, Benedict King, Remco Bouckaert, Lechosław Jocz, Martin Joachim Kümmel, Thomas Jügel, Britta Irslinger, Roland Pooth, Henrik Liljegren, Richard F. Strand, Geoffrey Haig, Martin Macák, Ronald I. Kim, Erik Anonby, Tijmen Pronk, Oleg Belyaev, Tonya Kim Dewey-Findell, Matthew Boutilier, Cassandra Freiberg, Robert Tegethoff, Matilde Serangeli, Nikos Liosis, Krzysztof Stroński, Kim Schulte, Ganesh Kumar Gupta, Wolfgang Haak, Johannes Krause, Quentin D. Atkinson, Simon J. Greenhill, Denise Kühnert, Russell D. Gray: Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages, Science 381(6656) (2023)