
Languages of Africa: A Data Analysis
Executive Summary
This report analyses linguistic diversity across Africa using TidyTuesday data scraped from Wikipedia’s Languages of Africa page. The dataset contains 795 language–country observations representing 501 unique languages spoken across 51 African countries, grouped into 11 language families (after consolidating duplicates and grouping creoles).
Note on Data Structure: Each row represents a language spoken in a specific country. Languages spoken in multiple countries appear multiple times, which allows us to analyse cross-border linguistic connections.
Note on Speaker Counts: The native_speakers column captures first-language (L1) speakers only. This significantly underrepresents languages that serve as lingua francas — for example, Swahili has ~5.3 million native speakers in this dataset, but is estimated to have 100–150 million total speakers when including second-language (L2) use. Similarly, Arabic and other trade/regional languages will appear smaller than their true communicative reach. All speaker counts in this report should be interpreted with this caveat in mind.
Data Overview
| Metric | Value |
|---|---|
| Language–country pairs | 795 |
| Unique languages | 501 |
| Language families | 11 |
| Countries | 51 |
| Total native speakers | 895,000,802 |
| Cross-border languages | 155 (31%) |
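The counts in this table follow from the one-row-per-language-country structure. The sketch below shows one way they might be derived. It assumes the column names `language` and `country` (not confirmed by the report), and the rows are placeholders rather than values from the dataset.

```python
from collections import defaultdict

# Placeholder rows: names are hypothetical, not taken from the dataset.
rows = [
    {"language": "lang_a", "country": "country_1"},
    {"language": "lang_a", "country": "country_2"},
    {"language": "lang_b", "country": "country_1"},
    {"language": "lang_c", "country": "country_3"},
]


def summarise(rows):
    """Derive overview counts from language-country rows."""
    countries_by_language = defaultdict(set)
    for row in rows:
        countries_by_language[row["language"]].add(row["country"])

    unique_languages = len(countries_by_language)
    all_countries = {c for cs in countries_by_language.values() for c in cs}
    # A language counts as cross-border if it appears in more than one country.
    cross_border = sum(1 for cs in countries_by_language.values() if len(cs) > 1)

    return {
        "pairs": len({(r["language"], r["country"]) for r in rows}),
        "unique_languages": unique_languages,
        "countries": len(all_countries),
        "cross_border": cross_border,
        "cross_border_share": cross_border / unique_languages if unique_languages else 0.0,
    }


summary = summarise(rows)
```

With the report's own figures, the cross-border share is 155 / 501, which rounds to 31%.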
Language Families: Niger–Congo Dominates in Diversity, Afroasiatic in Reach
The Niger–Congo family contains more than half of all African languages in this dataset, yet Afroasiatic — with far fewer languages — rivals it in total speaker count, driven largely by Arabic.
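The contrast between language count and speaker total can be sketched as a per-family aggregation. The sketch assumes that a language's `native_speakers` figure is repeated on each of its country rows, so it is counted once per language. It also assumes the column names `language` and `family`. The rows are placeholders, not dataset values.

```python
from collections import defaultdict

# Placeholder rows: names and numbers are hypothetical.
rows = [
    {"language": "lang_a", "family": "family_x", "country": "country_1", "native_speakers": 1000},
    {"language": "lang_a", "family": "family_x", "country": "country_2", "native_speakers": 1000},
    {"language": "lang_b", "family": "family_x", "country": "country_1", "native_speakers": 500},
    {"language": "lang_c", "family": "family_y", "country": "country_3", "native_speakers": 5000},
]


def family_totals(rows):
    """Count languages and sum native speakers per family, once per language."""
    # Assumption: the speaker figure is per language, so keep one value per language.
    per_language = {}
    for row in rows:
        per_language[row["language"]] = (row["family"], row["native_speakers"])

    totals = defaultdict(lambda: {"languages": 0, "native_speakers": 0})
    for family, speakers in per_language.values():
        totals[family]["languages"] += 1
        totals[family]["native_speakers"] += speakers
    return dict(totals)


totals = family_totals(rows)
```

In this toy input, `family_x` has more languages while `family_y` has more speakers, which mirrors the pattern described above. If the speaker figures were instead per country, they would need to be summed across rows rather than deduplicated.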
Top 10 Languages by Native (L1) Speakers

This chart ranks languages by native (L1) speakers only, which can be misleading. Swahili, for instance, ranks modestly here with ~5.3M L1 speakers, but is widely considered the most spoken language in Africa when second-language (L2) speakers are included, with estimates of 100–150 million total speakers. Similarly, Hausa and Amharic have substantial L2 communities not captured here.
A ranking by total speakers (L1 + L2) would look very different — Swahili would likely top the list, and languages like Arabic and Hausa would also shift significantly.
Nearly a Quarter of Languages May Be Endangered
Languages are classed as potentially endangered using UNESCO’s threshold of 10,000 speakers.
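A minimal sketch of this thresholding step follows. It assumes that "below the threshold" means strictly fewer than 10,000 native speakers, that the figure is repeated on each of a language's rows, and that languages with no recorded figure are left out of the share. None of these details is stated in the report, and the rows are placeholders.

```python
THRESHOLD = 10_000  # cutoff quoted in the report

# Placeholder rows: names and numbers are hypothetical.
rows = [
    {"language": "lang_a", "country": "country_1", "native_speakers": 8_000},
    {"language": "lang_a", "country": "country_2", "native_speakers": 8_000},
    {"language": "lang_b", "country": "country_1", "native_speakers": 10_000},
    {"language": "lang_c", "country": "country_3", "native_speakers": 250_000},
    {"language": "lang_d", "country": "country_3", "native_speakers": None},
]


def potentially_endangered(rows, threshold=THRESHOLD):
    """Return the flagged languages and their share of languages with a known count."""
    speakers = {}
    for row in rows:
        # Assumption: skip rows without a speaker count rather than treating them as zero.
        if row["native_speakers"] is not None:
            speakers[row["language"]] = row["native_speakers"]

    flagged = sorted(lang for lang, n in speakers.items() if n < threshold)
    share = len(flagged) / len(speakers) if speakers else 0.0
    return flagged, share


flagged, share = potentially_endangered(rows)
```

Whether a language sitting exactly at the cutoff is flagged, and how missing counts are handled, can both shift the resulting share, so these choices are worth stating alongside any headline figure.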

Endangerment by Language Family

Geographic Diversity: Cameroon Leads Africa
Map of Linguistic Diversity

Top 15 Most Linguistically Diverse Countries

Language Families by Region

Niger–Congo languages dominate across West and Central Africa, while Afroasiatic languages concentrate in North and East Africa. Countries like Nigeria and Cameroon host languages from multiple distinct families.
Cross-Border Languages: Linguistic Bridges
Some languages are spoken across many national borders. Note that this table reflects official language status per country as recorded on Wikipedia — it understates the true reach of major lingua francas. Swahili, for example, appears in only 5 countries here but is widely spoken as a second language across much of East and Central Africa.
| Language | Countries | Native Speakers (L1) |
|---|---|---|
| Arabic | 12 | 150,000,000 |
| Fulani | 10 | 40,000,000 |
| Mooré | 8 | 12,000,000 |
| Soninke | 8 | 2,300,000 |
| Gourmanché | 6 | 1,500,000 |
| Lozi | 6 | 725,000 |
| Bariba | 5 | 1,100,000 |
| Khwe | 5 | 8,000 |
| Mampruli | 5 | 230,000 |
| Portuguese | 5 | 17,000,000 |
| Swahili | 5 | 5,300,000 |
Arabic is the most widespread language, spoken across 12 countries. Fulani spans 10 countries across West Africa, while Mooré and Soninke are each found in 8 nations. In total, 155 languages (31%) cross at least one national border.
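A ranking like the one in the table could be produced by counting distinct countries per language. The sketch below assumes the column names `language` and `country` and uses placeholder rows. The helper name `rank_by_country_count` is hypothetical.

```python
from collections import defaultdict

# Placeholder rows: names are hypothetical; the last row is a deliberate duplicate.
rows = [
    {"language": "lang_a", "country": "country_1"},
    {"language": "lang_a", "country": "country_2"},
    {"language": "lang_a", "country": "country_3"},
    {"language": "lang_b", "country": "country_1"},
    {"language": "lang_b", "country": "country_2"},
    {"language": "lang_c", "country": "country_3"},
    {"language": "lang_b", "country": "country_2"},
]


def rank_by_country_count(rows, min_countries=2):
    """Rank languages by the number of distinct countries they appear in."""
    countries = defaultdict(set)
    for row in rows:
        countries[row["language"]].add(row["country"])

    ranked = [
        (language, len(cs))
        for language, cs in countries.items()
        if len(cs) >= min_countries
    ]
    # Most countries first; ties are broken alphabetically by language name.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


ranking = rank_by_country_count(rows)
```

Using a set per language means a duplicated language-country row does not inflate the count.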
Conclusion
Africa’s linguistic landscape reveals:
- Extraordinary diversity — 501 languages across 11 families
- Uneven distribution — Niger–Congo dominates in language count, but Afroasiatic rivals it in total speakers
- Regional concentration — Cameroon, Congo, and Nigeria show the highest diversity
- Cross-border connections — 155 languages unite people across national boundaries
- Conservation concerns — Nearly a quarter of languages have vulnerable speaker populations
This rich linguistic heritage represents both a cultural treasure and a conservation challenge for the continent.