Santa Fe Institute

Researchers reconstruct major branches in the tree of language

September 10, 2021

The diversity of human languages can be likened to branches on a tree. If you’re reading this in English, you’re on a branch that traces back to a common ancestor shared with Scots, which in turn traces back to a more distant ancestor that also gave rise to German and Dutch. Moving further back, there’s the European branch that gave rise to Germanic, Celtic, Albanian, the Slavic languages, the Romance languages such as Italian and Spanish, Armenian, Baltic, and Hellenic Greek. Before this branch, some 5,000 years ago, there’s Indo-European, a major proto-language that split into the European branch on one side and, on the other, the Indo-Iranian ancestor of modern Persian, Nepali, Bengali, Hindi, and many more.

One of the defining goals of historical linguistics is to trace the ancestry of modern languages as far back as possible, perhaps, as some linguists hope, to a single common ancestor that would form the trunk of the metaphorical tree. But while many thrilling connections have been proposed on the basis of systematic comparisons of data from most of the world's languages, much of this work, which dates back to the 1800s, has been prone to error. Linguists are still debating the internal structure of such well-established families as Indo-European, as well as whether chronologically deeper and larger families exist at all.

To test which branches hold up under scrutiny, a team of researchers associated with the Evolution of Human Languages program is using a novel technique to comb through the data and reconstruct major branches of the linguistic tree. In two recent papers, they examine the well-studied, roughly 5,000-year-old Indo-European family and a more tenuous, older grouping known as the Altaic macrofamily, which is thought to connect the linguistic ancestors of such distant languages as Turkish, Mongolian, Korean, and Japanese.

“The deeper you want to go back in time, the less you can rely on classic methods of language comparison to find meaningful correlates,” says co-author George Starostin, an SFI external professor based at the Higher School of Economics in Moscow. He explains that one of the major challenges in comparing languages is distinguishing words that have similar sounds and meanings because they may descend from a common ancestor from words that are similar because their speakers borrowed terms from one another in the more recent past.

“We have to get to the deepest layer of language to identify its ancestry because the outer layers, they are contaminated. They get easily corrupted by replacements and borrowings,” he says.

To tap into these core layers, Starostin’s team starts with an established list of 110 core, universal concepts from human experience, including meanings such as “rock,” “fire,” “cloud,” “two,” “hand,” and “human.” Working from this list, the researchers use classic methods of linguistic reconstruction to arrive at candidate word shapes, which they then match to specific meanings on the list. The approach, dubbed “onomasiological reconstruction,” differs notably from traditional comparative linguistics: it focuses on finding which word was used to express a given meaning in the proto-language, rather than on reconstructing the phonetic shapes of words and associating them with a vague cloud of meanings.
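
The onomasiological question, which word filled each meaning slot in the proto-language, can be illustrated with a deliberately simplified sketch. It assumes that each daughter subgroup's word for a meaning has already been assigned to a cognate class (the labels "A", "B" and so on are hypothetical placeholders), and it picks, for each meaning, the class attested in the most subgroups. The function names `reconstruct_meaning` and `reconstruct_list` and the majority rule are illustrative assumptions, not the procedure used in the papers, which the article describes as applying classic methods of linguistic reconstruction.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy input: subgroup name -> meaning -> cognate-class label.
# Labels like "A" or "B" are placeholders, not real etymologies.
Lexicons = Dict[str, Dict[str, str]]


def reconstruct_meaning(meaning: str, daughters: Lexicons) -> Optional[str]:
    """Return the cognate class attested in the most subgroups for `meaning`.

    A deliberately simple majority rule (an assumption for illustration):
    if no subgroup attests the meaning, or the top two classes tie, the slot
    is left unresolved and None is returned.
    """
    counts = Counter(
        lexicon[meaning] for lexicon in daughters.values() if meaning in lexicon
    )
    ranked = counts.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no single candidate stands out
    return ranked[0][0]


def reconstruct_list(meanings: List[str], daughters: Lexicons) -> Dict[str, Optional[str]]:
    """Onomasiological pass: fill each meaning slot with one reconstructed class."""
    return {m: reconstruct_meaning(m, daughters) for m in meanings}


if __name__ == "__main__":
    toy = {
        "subgroup_1": {"fire": "A", "hand": "C", "two": "E"},
        "subgroup_2": {"fire": "A", "hand": "D", "two": "E"},
        "subgroup_3": {"fire": "B", "hand": "C", "two": "E"},
    }
    print(reconstruct_list(["fire", "hand", "two", "cloud"], toy))
    # {'fire': 'A', 'hand': 'C', 'two': 'E', 'cloud': None}
```

The output is organized by meaning rather than by word form, which is the contrast the article draws with reconstructing phonetic shapes first and attaching meanings afterwards.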

Their latest reclassification of the Indo-European family, which applies the onomasiological principle and was published in the journal Linguistics, confirmed genealogies already well documented in the literature. Similar research on the Eurasian Altaic language group, whose proto-language is estimated to date back some 8,000 years, found a positive signal of a relationship among most major branches of Altaic: Turkic, Mongolic, Tungusic, and Japanese. However, it failed to reproduce a previously published relationship between Korean and the other languages in the Altaic grouping. This could mean either that the new criteria were too strict or, less likely, that previous groupings were incorrect.
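
The Altaic paper's title names a permutation test as the tool for judging whether such a signal exceeds chance. The sketch below shows the general shape of that kind of test under loudly stated assumptions: two reconstructed lists aligned by meaning, a crude hypothetical consonant-class table (`SOUND_CLASSES`), and a "match" defined as sharing the first two consonant classes. Shuffling one list breaks the meaning alignment while keeping each list's sounds, so the fraction of shuffles that match at least as well as the real alignment estimates a p-value. The papers' actual sound classes, match criteria and thresholds are not given in this article and may differ; all names here are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical, crude consonant classes for romanized forms; vowels and any
# letters not listed here are ignored. Real studies justify their classes.
SOUND_CLASSES: Dict[str, str] = {
    ch: label
    for letters, label in [
        ("pbfv", "P"), ("td", "T"), ("sz", "S"), ("kgqx", "K"),
        ("m", "M"), ("n", "N"), ("rl", "R"),
    ]
    for ch in letters
}


def skeleton(form: str, n: int = 2) -> str:
    """Return the classes of the first `n` consonants in `form`."""
    classes = [SOUND_CLASSES[ch] for ch in form.lower() if ch in SOUND_CLASSES]
    return "".join(classes[:n])


def count_matches(list_a: Sequence[str], list_b: Sequence[str]) -> int:
    """Count meaning slots whose two forms share a full two-consonant skeleton."""
    matches = 0
    for a, b in zip(list_a, list_b):
        sa = skeleton(a)
        if len(sa) == 2 and sa == skeleton(b):
            matches += 1
    return matches


def permutation_test(
    list_a: Sequence[str], list_b: Sequence[str], n_perm: int = 10_000, seed: int = 0
) -> Tuple[int, float]:
    """Compare observed matches with matches after shuffling list_b's meanings.

    list_a[i] and list_b[i] are assumed to express the same meaning.
    Returns (observed_matches, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = count_matches(list_a, list_b)
    shuffled: List[str] = list(list_b)
    at_least_as_many = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the meaning alignment, keeps the forms
        if count_matches(list_a, shuffled) >= observed:
            at_least_as_many += 1
    # Add-one correction: the observed alignment counts as one possible ordering.
    return observed, (at_least_as_many + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented toy forms, aligned by meaning; not real reconstructions.
    proto_x = ["tana", "kolo", "pira", "miso", "sura", "neki"]
    proto_y = ["tani", "kala", "bera", "mesu", "zoru", "nago"]
    print(permutation_test(proto_x, proto_y, n_perm=2_000))
```

How strict the match rule is matters: a stricter criterion admits fewer chance look-alikes but can also miss a real yet weak relationship, which is the trade-off the article raises for the Korean result.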

As the researchers test and reconstruct the branches of human language, one of their ultimate goals is to understand the evolutionary paths languages follow over generations, much as evolutionary biologists do for living organisms.

“One great thing about historical reconstruction of languages is that it's able to bring out a lot of cultural information,” Starostin says. “Reconstructing its internal phylogeny, like we’re doing in these studies, is the initial step to a much larger procedure of trying to reconstruct a large part of the lexical stock of that language, including its cultural lexicon.”

 

Read the paper, “Permutation test applied to lexical reconstructions partially supports the Altaic linguistic macrofamily,” in Evolutionary Human Sciences (June 1, 2021).

Read the paper, “Rapid radiation of the inner Indo-European languages: an advanced approach to Indo-European lexicostatistics,” in Linguistics (June 18, 2021).




