Languages of Africa: A Data Analysis

2026
linguistics
geography
Author: Alexander Probst

Published: January 17, 2026

Executive Summary

This report analyses linguistic diversity across Africa using TidyTuesday data scraped from Wikipedia’s Languages of Africa page. The dataset contains 795 language–country observations representing 501 unique languages spoken across 51 African countries, grouped into 11 language families (after consolidating duplicates and grouping creoles).

Note on Data Structure: Each row represents a language spoken in a specific country. Languages spoken in multiple countries appear multiple times, which allows us to analyse cross-border linguistic connections.
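The headline counts follow directly from this row structure. The sketch below is a minimal illustration, assuming each row reduces to a (language, country) pair; the toy rows and the helper name `overview` are hypothetical and are not taken from the dataset or the author's code.

```python
# Toy rows in the shape described above: one (language, country) pair per row.
# Names are placeholders, not dataset values.
rows = [
    ("lang_a", "Country 1"),
    ("lang_a", "Country 2"),
    ("lang_a", "Country 3"),
    ("lang_b", "Country 1"),
    ("lang_c", "Country 4"),
]


def overview(pairs):
    """Summarise language-country pairs into headline counts."""
    return {
        "pairs": len(pairs),                                   # one per row
        "unique_languages": len({lang for lang, _ in pairs}),  # repeats collapse
        "countries": len({country for _, country in pairs}),
    }


summary = overview(rows)
print(summary)
```

Because a language spoken in several countries occupies several rows, the pair count is always at least as large as the unique-language count.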

Note on Speaker Counts: The native_speakers column captures first-language (L1) speakers only. This significantly underrepresents languages that serve as lingua francas — for example, Swahili has ~5.3 million native speakers in this dataset, but is estimated to have 100–150 million total speakers when including second-language (L2) use. Similarly, Arabic and other trade/regional languages will appear smaller than their true communicative reach. All speaker counts in this report should be interpreted with this caveat in mind.


Data Overview

| Metric | Value |
| --- | --- |
| Language–country pairs | 795 |
| Unique languages | 501 |
| Language families | 11 |
| Countries | 51 |
| Total native speakers | 895,000,802 |
| Cross-border languages | 155 (31%) |

Language Families: Niger–Congo Dominates in Diversity, Afroasiatic in Reach

The Niger–Congo family contains more than half of all African languages in this dataset, yet Afroasiatic — with far fewer languages — rivals it in total speaker count, driven largely by Arabic.

Side-by-side bar charts comparing number of languages and total speakers per language family.

Language count and total speakers by family (families with 4+ languages). Niger-Congo dominates diversity while Afroasiatic leads in speaker reach.
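A per-family comparison of this kind can be sketched as follows. This is an illustration only: it assumes columns named language, family, native_speakers, and country (only native_speakers is named in this report), and it assumes the speaker figure is a per-language total repeated on every row for that language, so each language is counted once. The toy values and the helper `family_summary` are hypothetical.

```python
from collections import defaultdict

# (language, family, native_speakers, country); placeholder values only.
rows = [
    ("lang_a", "Family X", 1_000, "Country 1"),
    ("lang_a", "Family X", 1_000, "Country 2"),
    ("lang_b", "Family X", 500, "Country 1"),
    ("lang_c", "Family Y", 9_000, "Country 3"),
]


def family_summary(records):
    """Return {family: (language_count, total_native_speakers)}."""
    speakers = {}
    family_of = {}
    # Keep one speaker figure per language so that languages listed in
    # several countries are not double counted.
    for language, family, native_speakers, _country in records:
        speakers[language] = native_speakers
        family_of[language] = family

    counts = defaultdict(int)
    totals = defaultdict(int)
    for language, family in family_of.items():
        counts[family] += 1
        totals[family] += speakers[language]
    return {family: (counts[family], totals[family]) for family in counts}


summary = family_summary(rows)
print(summary)
```

In the toy data, Family X has more languages while Family Y has more speakers, which mirrors the diversity-versus-reach contrast described above.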

Top 10 Languages by Native (L1) Speakers

Horizontal bar chart of top 10 languages by native speakers, with Arabic leading at 150M.

Top 10 African languages by native speaker count, coloured by language family.
Important: L1 vs. Total Speakers (A Critical Distinction)

This chart ranks languages by native (L1) speakers only, which can be misleading. Swahili, for instance, ranks modestly here with ~5.3M L1 speakers, but is widely considered the most spoken language in Africa when including second-language (L2) speakers, with estimates of 100–150 million total speakers. Similarly, Hausa and Amharic have substantial L2 communities not captured here.

A ranking by total speakers (L1 + L2) would look very different: Swahili would likely rise to the top of the list or close to it, and languages like Arabic and Hausa would also shift significantly.


Nearly a Quarter of Languages May Be Endangered

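Using a threshold of 10,000 speakers as a rough heuristic for potentially endangered languages (UNESCO's own vitality framework weighs factors such as intergenerational transmission rather than speaker counts alone):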

Bar chart showing four size categories: Endangered, Small, Medium, and Large, with Endangered being the second-largest group.

Distribution of languages by speaker population size. Nearly a quarter fall below the 10,000-speaker endangerment threshold.
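The binning behind such a chart can be sketched as below. Only the 10,000-speaker cutoff comes from this report; the 100,000 and 1,000,000 cutoffs separating Small, Medium, and Large are assumptions made for illustration, as is the function name `size_category`. The sample figures are native-speaker counts from the cross-border table in this report.

```python
from collections import Counter


def size_category(native_speakers,
                  endangered_below=10_000,   # threshold used in this report
                  small_below=100_000,       # assumed cutoff
                  medium_below=1_000_000):   # assumed cutoff
    """Assign a language to a size category by native speaker count."""
    if native_speakers < endangered_below:
        return "Endangered"
    if native_speakers < small_below:
        return "Small"
    if native_speakers < medium_below:
        return "Medium"
    return "Large"


# Khwe, Mampruli, Lozi, Bariba, Arabic (figures from the cross-border table).
sample = [8_000, 230_000, 725_000, 1_100_000, 150_000_000]
distribution = Counter(size_category(n) for n in sample)
print(distribution)
```

A language with exactly 10,000 speakers falls into Small here because the comparison is strict; whether the original analysis treats the boundary the same way is not stated.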

Endangerment by Language Family

Horizontal bar chart showing endangerment rates by language family.

Percentage of languages below the 10,000-speaker threshold by family (families with 3+ languages).
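A per-family endangerment rate with a minimum family size can be computed along these lines. This is a sketch under stated assumptions: the input holds one entry per language, the helper name `endangerment_rate_by_family` is hypothetical, and the toy values are placeholders rather than dataset figures.

```python
from collections import defaultdict

# (language, family, native_speakers), one entry per language; placeholder values.
languages = [
    ("lang_a", "Family X", 500),
    ("lang_b", "Family X", 20_000),
    ("lang_c", "Family X", 8_000),
    ("lang_d", "Family X", 1_000_000),
    ("lang_e", "Family Y", 100),
    ("lang_f", "Family Y", 200),
]


def endangerment_rate_by_family(records, threshold=10_000, min_languages=3):
    """Percentage of languages below the threshold, per sufficiently large family."""
    by_family = defaultdict(list)
    for _language, family, native_speakers in records:
        by_family[family].append(native_speakers)
    return {
        family: 100 * sum(n < threshold for n in counts) / len(counts)
        for family, counts in by_family.items()
        if len(counts) >= min_languages  # drop families too small for a stable rate
    }


rates = endangerment_rate_by_family(languages)
print(rates)
```

The size filter matters because a family with one or two languages would otherwise show rates of 0%, 50%, or 100% that say little about the family as a whole.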

Geographic Diversity: Cameroon Leads Africa

Map of Linguistic Diversity

Map of Africa coloured by number of distinct languages per country, with Cameroon and Congo darkest.

Choropleth map of linguistic diversity across Africa. Grey countries are absent from the dataset.

Top 15 Most Linguistically Diverse Countries

Horizontal bar chart of countries ranked by language count, Cameroon leading.

Top 15 countries by number of distinct languages.
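A ranking like this amounts to counting distinct languages per country. The sketch below assumes (language, country) pairs as input; the helper `top_countries`, the tie-breaking by country name, and the toy rows are illustrative choices and are not taken from the original analysis.

```python
from collections import defaultdict

# (language, country) pairs; placeholder values only.
rows = [
    ("lang_a", "Country 1"),
    ("lang_b", "Country 1"),
    ("lang_b", "Country 1"),  # repeated row, counted once
    ("lang_c", "Country 1"),
    ("lang_a", "Country 2"),
    ("lang_d", "Country 3"),
    ("lang_e", "Country 3"),
]


def top_countries(pairs, n=15):
    """Rank countries by number of distinct languages, ties broken by name."""
    languages = defaultdict(set)
    for language, country in pairs:
        languages[country].add(language)
    ranked = sorted(languages.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(country, len(langs)) for country, langs in ranked[:n]]


ranking = top_countries(rows, n=2)
print(ranking)
```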

Language Families by Region

Heatmap matrix with countries on the y-axis and language families on the x-axis, coloured by language count.

Heatmap of language counts by family and country for the top 8 families and 20 most diverse countries.

Niger–Congo languages dominate across West and Central Africa, while Afroasiatic languages concentrate in North and East Africa. Countries like Nigeria and Cameroon host languages from multiple distinct families.
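The cell values of such a heatmap are counts of distinct languages for each country and family combination. A minimal sketch, assuming rows of (language, family, country) with the family and country column names being assumptions, and with the helper `family_country_counts` and the toy rows being hypothetical:

```python
from collections import Counter

# (language, family, country); placeholder values only.
rows = [
    ("lang_a", "Family X", "Country 1"),
    ("lang_a", "Family X", "Country 2"),
    ("lang_b", "Family X", "Country 1"),
    ("lang_c", "Family Y", "Country 1"),
    ("lang_c", "Family Y", "Country 1"),  # duplicate row, counted once
]


def family_country_counts(records):
    """Count distinct languages for every (country, family) cell."""
    distinct = set(records)  # drop exact duplicate rows first
    return Counter((country, family) for _language, family, country in distinct)


cells = family_country_counts(rows)
print(cells[("Country 1", "Family X")])
```

A `Counter` returns 0 for combinations that never occur, which corresponds to the empty cells of the heatmap.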


Cross-Border Languages: Linguistic Bridges

Some languages are spoken across many national borders. Note that this table reflects official language status per country as recorded on Wikipedia — it understates the true reach of major lingua francas. Swahili, for example, appears in only 5 countries here but is widely spoken as a second language across much of East and Central Africa.

| Language | Countries | Native Speakers (L1) |
| --- | --- | --- |
| Arabic | 12 | 150,000,000 |
| Fulani | 10 | 40,000,000 |
| Mooré | 8 | 12,000,000 |
| Soninke | 8 | 2,300,000 |
| Gourmanché | 6 | 1,500,000 |
| Lozi | 6 | 725,000 |
| Bariba | 5 | 1,100,000 |
| Khwe | 5 | 8,000 |
| Mampruli | 5 | 230,000 |
| Portuguese | 5 | 17,000,000 |
| Swahili | 5 | 5,300,000 |

Arabic is the most widespread language, spoken across 12 countries. Fulani spans 10 countries across West Africa, while Mooré and Soninke are each found in 8 nations. In total, 155 languages (31%) cross at least one national border.
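A cross-border table and the accompanying share can be derived as sketched below. The sketch assumes rows of (language, native_speakers, country), treats the speaker figure as a per-language total, and sorts by country count and then by name, which is consistent with the ordering of the table above but is still an assumption about how it was built. The helper `cross_border_table` and the toy rows are hypothetical.

```python
# (language, native_speakers, country); placeholder values only.
rows = [
    ("lang_a", 1_000, "Country 1"),
    ("lang_a", 1_000, "Country 2"),
    ("lang_a", 1_000, "Country 3"),
    ("lang_b", 500, "Country 1"),
    ("lang_c", 9_000, "Country 2"),
    ("lang_c", 9_000, "Country 3"),
]


def cross_border_table(records):
    """Return (ranked rows for multi-country languages, their share in percent)."""
    countries = {}
    speakers = {}
    for language, native_speakers, country in records:
        countries.setdefault(language, set()).add(country)
        speakers[language] = native_speakers

    table = [
        (language, len(found), speakers[language])
        for language, found in countries.items()
        if len(found) > 1  # spoken in at least two countries
    ]
    table.sort(key=lambda row: (-row[1], row[0]))
    share = round(100 * len(table) / len(countries))
    return table, share


table, share = cross_border_table(rows)
print(table, share)
```

Applied to the report's own totals, the same arithmetic gives 155 / 501, which rounds to the 31% quoted above.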

Conclusion

Africa’s linguistic landscape reveals:

  • Extraordinary diversity: 501 languages across 11 families
  • Uneven distribution: Niger–Congo dominates in language count, but Afroasiatic rivals it in total speakers
  • Regional concentration: Cameroon, Congo, and Nigeria show the highest diversity
  • Cross-border connections: 155 languages unite people across national boundaries
  • Conservation concerns: Nearly a quarter of languages have vulnerable speaker populations

This rich linguistic heritage represents both a cultural treasure and a conservation challenge for the continent.