AI, Society & Your Future

⏱ About 20 min · 20 XP

The Global AI Divide

The digital divide — the gap between those with access to information technology and those without — has been a central concern of technology policy since the 1990s. AI is creating a new and more complex version of that divide. The global AI divide is not simply about who has an internet connection; it encompasses who benefits from AI, who bears its costs, who shapes its design, and who is subject to its decisions without meaningful recourse. Understanding this multidimensional divide is essential to reasoning about AI as a force in the world.

Dimensions of the AI Divide

Researchers at the Oxford Internet Institute and the Partnership on AI have proposed thinking about the AI divide along several distinct dimensions, each with different causes and remedies.

Capability divide: who can build frontier AI systems? This divide is extremely concentrated. The compute, talent, and capital required to train frontier models (the kind that power large language models, image generators, or protein-structure predictors) are accessible to perhaps a few dozen organizations worldwide, nearly all headquartered in the US, China, or the UK. The gap between frontier AI labs and everyone else is widening: the training cost of frontier models doubles roughly every nine months.

Access divide: who can use existing AI tools? This divide is narrower and closing faster. A large language model accessible via a browser can reach anyone with a smartphone and a data plan. But 2.6 billion people worldwide lack internet access entirely, and another billion have access too slow or expensive for real-time AI applications. AI tools are overwhelmingly designed for English-language users: as of 2024, roughly 90% of AI training data is estimated to be in English or a small set of Western European languages, leaving speakers of the world's other 7,000-plus languages drastically underserved.

Benefit divide: who captures the value that AI creates? As explored in Lesson 3, AI-generated productivity gains flow primarily to AI platform owners. Within countries, AI is widening inequality between high-skill and low-skill workers. Across countries, it is widening inequality between AI-producing and AI-consuming nations.

Harm divide: who bears the costs and risks of AI? Communities in the Global South are disproportionately subject to AI-driven decisions: by humanitarian organizations using AI for refugee triage, by banks using AI-based credit scoring built on biased historical data, and by content moderation systems that handle non-English languages poorly. These same communities have the least power to contest, appeal, or reshape those systems.
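To get a feel for what the nine-month doubling time cited for frontier training costs would imply if it held steady, the compounding can be sketched in a few lines. The function name and the choice of time horizons are illustrative assumptions; only the nine-month figure comes from the lesson.

```python
def cost_multiplier(months: float, doubling_months: float = 9.0) -> float:
    """Factor by which a cost grows after `months`, assuming a fixed doubling time."""
    return 2 ** (months / doubling_months)


# Illustrative horizons only; the nine-month default is the lesson's estimate.
for years in (1, 3, 5):
    print(f"after {years} year(s): x{cost_multiplier(12 * years):.1f}")
```

Under this assumption, three years contain four doublings, so costs grow 16-fold. This is why an organization that cannot keep pace for even a few years falls far behind.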

Compounding Disadvantage

The four dimensions of the AI divide reinforce each other. Communities that lack AI capability also tend to lack the political power to contest AI harms. Communities whose languages are underrepresented in training data receive lower-quality AI outputs and cannot effectively appeal decisions made by systems that misunderstand their context. The divide is not just a gap — it is a self-reinforcing structural inequality.

The Language Gap

Language is one of the most concrete and measurable dimensions of the AI divide. Large language models learn from text, and the internet's text is radically skewed: English dominates, followed by a small cluster of European and East Asian languages.

A speaker of Yoruba (more than 40 million speakers in West Africa), Amharic (35 million speakers in Ethiopia), or Tagalog (90 million speakers in the Philippines) interacts with AI systems that were largely not trained on their language. When these systems are used for translation, summarization, legal information, or medical advice, the quality of output is significantly lower than for English users. The gap is not trivial: research has shown that GPT-4's performance on reasoning tasks in low-resource languages can be 20 to 40 percentage points below its English-language performance.

This language gap has real consequences. A Somali refugee processed by an AI-assisted screening system may have their case assessed by a system that handles their language through rough machine translation, introducing errors at a critical juncture. A farmer in Bangladesh seeking AI agricultural advice may receive information calibrated to Western farming contexts rather than local crop varieties and climate patterns.

Efforts to close the language gap include multilingual text models (trained jointly across many languages), massively multilingual speech models (Meta's MMS, Google's USM), regional initiatives such as the Masakhane project's African-language NLP work, and community-led data collection to build training corpora for underrepresented languages.
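Performance gaps like the one cited for low-resource languages are stated in percentage points, which is not the same as a relative drop. A minimal sketch of the difference, using made-up accuracy figures (placeholders, not measurements of any real model) and hypothetical function names:

```python
def gap_in_points(english_acc: float, other_acc: float) -> float:
    """Absolute gap between two accuracies, in percentage points."""
    return (english_acc - other_acc) * 100


def relative_drop_pct(english_acc: float, other_acc: float) -> float:
    """Gap expressed as a percentage of the English-language accuracy."""
    return (english_acc - other_acc) / english_acc * 100


# Placeholder accuracies chosen only to illustrate the arithmetic.
english, other = 0.80, 0.50
print(f"{gap_in_points(english, other):.1f} percentage points")
print(f"{relative_drop_pct(english, other):.1f}% relative drop")
```

With these placeholder numbers, a 30-point gap corresponds to a 37.5% relative drop, so the same gap sounds larger or smaller depending on how it is reported.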

Match each dimension of the global AI divide to the most accurate description of what it measures.

Terms

Capability divide
Access divide
Benefit divide
Harm divide

Definitions

The gap in who captures the economic value that AI-driven productivity creates
The gap in who bears the costs and risks of AI decisions, with the least power to contest them
The gap between who can build and train frontier AI systems versus everyone else
The gap between who can use existing AI tools, shaped by connectivity and language support


Structural Causes and Proposed Remedies

The global AI divide has structural causes that market forces alone will not fix. AI development is subject to economies of scale and network effects: the more data and compute a model is trained on, the better it performs, which attracts more users, which generates more data and revenue, which funds more compute. This creates a virtuous cycle for incumbent leaders and a vicious cycle for latecomers.

Proposed remedies operate at multiple levels.

At the infrastructure level, international development organizations and some governments are investing in undersea cable networks, satellite internet access (including SpaceX Starlink and Amazon Kuiper), and local data center capacity to narrow the connectivity component of the access divide.

At the data level, initiatives to digitize and make available text in underrepresented languages, including Wikipedia expansions, Common Crawl augmentation, and community-led transcription projects, address the language gap.

At the capability level, proposals for an international AI development fund (analogous to CERN for physics) would pool resources for foundational AI research in the global public interest.

At the governance level, mechanisms that give developing countries a meaningful voice in AI standards and governance, rather than simply receiving rules made elsewhere, are increasingly recognized as necessary. The UN's High-level Advisory Body on AI and the Global Partnership on AI have begun this work, though critics argue that without genuine resource transfers, representation without power changes little.
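The feedback loop described at the start of this section (better models attract more users, whose data and revenue fund better models) can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: the function name, the starting resources, the fixed unit of new revenue per round, and the `returns_to_scale` exponent that stands in for economies of scale. It is not a model of any real market.

```python
def simulate_leader_share(resources, returns_to_scale, rounds):
    """Return the leader's share of total resources after each round.

    Each round, one unit of new revenue is split in proportion to
    resources ** returns_to_scale. An exponent above 1 means larger
    players convert resources into revenue more than proportionally.
    """
    current = list(resources)
    history = []
    for _ in range(rounds):
        weights = [x ** returns_to_scale for x in current]
        total_weight = sum(weights)
        current = [x + w / total_weight for x, w in zip(current, weights)]
        history.append(max(current) / sum(current))
    return history


# Hypothetical starting point: the incumbent has twice the latecomer's resources.
with_scale = simulate_leader_share([2.0, 1.0], returns_to_scale=2.0, rounds=10)
proportional = simulate_leader_share([2.0, 1.0], returns_to_scale=1.0, rounds=10)
print(f"leader share with increasing returns: {with_scale[-1]:.2f}")
print(f"leader share with proportional returns: {proportional[-1]:.2f}")
```

In this sketch the leader's share keeps growing whenever the exponent is above 1 and stays constant when it equals 1. That mirrors the lesson's point: with increasing returns the gap widens on its own, so closing it requires an outside intervention rather than waiting for the market to correct.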

A large language model performs 35 percentage points worse on legal reasoning tasks in Swahili than in English. Which dimension of the global AI divide does this most directly illustrate?

Why does the AI divide tend to be self-reinforcing rather than naturally correcting over time?

Language Divide Investigation

This activity requires access to a multilingual AI chatbot (such as ChatGPT or Gemini).

  1. Choose a language spoken by at least 20 million people that is not English, French, Spanish, Mandarin, or German. Good choices: Swahili, Hausa, Yoruba, Amharic, Bengali, Tagalog, Burmese, or Khmer.
  2. Ask the AI chatbot the same three questions in English and then in your chosen language (you can use a translation tool to prepare the non-English questions). Choose questions that require reasoning, not just factual recall.
  3. Compare the quality of the responses. Are they equally detailed? Equally accurate? Does the AI switch to English mid-response? Does it confuse vocabulary or produce grammatical errors?
  4. Reflect in writing (300 words): What did you find? What are the real-world consequences if someone relies on this AI for medical, legal, or educational guidance in the language you tested? What structural changes would be needed to close the gap you observed?
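One optional way to keep the comparison of responses systematic is to rate every answer on the same small rubric and average the differences. The sketch below assumes a hypothetical three-criterion rubric scored from 1 to 5; the criteria names and all scores are placeholders to be replaced with your own ratings.

```python
from statistics import mean

# Hypothetical rubric; each criterion is scored from 1 (poor) to 5 (excellent).
CRITERIA = ("detail", "accuracy", "fluency")


def average_gap(english_scores, other_scores):
    """Mean score difference per criterion (English minus the other language)."""
    return {
        criterion: mean(
            e[criterion] - o[criterion]
            for e, o in zip(english_scores, other_scores)
        )
        for criterion in CRITERIA
    }


# Placeholder ratings for three questions; not real results.
english_scores = [
    {"detail": 5, "accuracy": 5, "fluency": 5},
    {"detail": 4, "accuracy": 5, "fluency": 5},
    {"detail": 5, "accuracy": 4, "fluency": 5},
]
other_scores = [
    {"detail": 3, "accuracy": 4, "fluency": 3},
    {"detail": 3, "accuracy": 3, "fluency": 4},
    {"detail": 2, "accuracy": 3, "fluency": 3},
]

for criterion, gap in average_gap(english_scores, other_scores).items():
    print(f"{criterion}: {gap:.2f} points lower on average")
```

Scoring each criterion separately makes it easier to say in the written reflection whether the gap you observed is mainly about accuracy, level of detail, or language quality.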