BharatGen is a sovereign AI ecosystem built by IIT Bombay's Department of Computer Science and Engineering, led by Prof. Ganesh Ramakrishnan, in partnership with a consortium of 9 premier Indian academic institutions. It covers all 22 scheduled Indian languages across text, speech, and document understanding. It is backed by India's Department of Science and Technology and the IndiaAI Mission with ₹988.6 crore in funding.

Is BharatGen open source?

BharatGen releases a subset of models, weights, data, and training recipes as open source. Advanced resources are selectively shared with government and trusted partners. The stated goal is to democratize AI innovation for Indian developers and researchers.

Which languages does BharatGen support?

BharatGen covers all 22 scheduled Indian languages as defined in the Eighth Schedule of the Indian Constitution, including Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Odia, Punjabi, and others.

Where was BharatGen announced?

BharatGen was officially presented at Bharat Innovates 2026, held in Nice, France, June 14–16, 2026, under the hashtag #BharatInnovates2026. The announcement was made by IIT Bombay on June 15, 2026.

BharatGen: India's Sovereign AI for 22 Languages — IIT Bombay 2026 | explainx.ai Blog

What models are included in BharatGen?

BharatGen includes four model families: Param2 (foundational text model with reasoning, coding, and tool calling across 22 Indian languages), Shrutam2 (automatic speech recognition across Indian languages), Sooktam2 (text-to-speech with zero-shot voice cloning across Indian languages), and Patram (document vision model for Indian-specific documentation).

India's Sovereign AI Goes Public

On June 15, 2026, at Bharat Innovates 2026 in Nice, France, IIT Bombay formally presented BharatGen to the world — a sovereign AI ecosystem built for India's 1.4 billion people across all 22 scheduled languages.

The announcement, which drew 118,900 views on X within hours, represents the culmination of a multi-year national effort: 9 premier academic institutions, 60+ researchers, engineers, and linguists, backed by India's Department of Science and Technology, the IndiaAI Mission, and ₹988.6 crore in funding.

BharatGen is not a single model. It is a four-family ecosystem covering the full stack of language interaction — text, speech, and documents — in every officially scheduled Indian language.

The Four Model Families

Param2 — Foundational Text Model

The cornerstone of BharatGen. Param2 is a foundational large language model that works across all 22 scheduled Indian languages with:

Reasoning capabilities (multi-step problem solving)
Coding support
Tool calling for agentic use cases

Param2 is built to handle not just translation but genuine native-language understanding — the cultural nuances, idioms, and domain knowledge that Western models trained primarily on English-language data routinely miss in Indian language contexts.

The use cases highlighted span governance, healthcare, education, insurance, finance, and cultural preservation — domains where the language gap between frontier AI and India's actual population has been most acute.

Shrutam2 — Multilingual Speech Recognition

Automatic speech recognition across Indian languages. Many regions of India remain predominantly oral cultures: literacy rates vary significantly, and for hundreds of millions of users, voice is the primary interaction modality.

Shrutam2 addresses the specific challenges of Indian speech recognition: phonetic complexity, tonal variations, code-switching (mixing multiple languages within a single utterance), and the acoustic diversity across India's geographic spread.

Sooktam2 — Text-to-Speech with Voice Cloning

Text-to-speech synthesis across Indian languages, with a notable capability: zero-shot voice cloning. The model can reproduce a target speaker's voice characteristics without fine-tuning on that speaker's data — enabling personalised speech synthesis for applications ranging from accessibility tools to personalised education.

Zero-shot voice cloning in multilingual Indian language contexts is technically demanding — Indian language prosody and phonology differ substantially from the Latin-script languages where most voice cloning research has been conducted. This is a meaningful technical achievement.

Patram — Document Vision Model

A vision-language model specifically designed for understanding Indian documents. This is a more specialised challenge than it might appear: Indian documentation includes documents in multiple scripts (Devanagari, Tamil script, Bengali script, Telugu script, and others), mixed-language content, handwritten text, and domain-specific formats used in Indian governance, legal, and financial systems that generic document AI models handle poorly.

Patram is positioned as infrastructure for digitising and understanding the enormous volume of India's existing document corpus — from government records to land registry documents to healthcare records.
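The mixed-script problem described above can be made concrete with a small sketch. The Python below is an illustration only, not Patram's method: it tallies which writing systems appear in a piece of already-extracted text by reading the first word of each character's Unicode name. The function name script_counts and the sample string are hypothetical, and the name-prefix trick is a rough heuristic that works for scripts such as Devanagari, Tamil, Bengali, Telugu, and Latin but not for every Unicode character.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters and combining marks per script (heuristic, illustrative only)."""
    counts = Counter()
    for ch in text:
        # Keep letters (L*) and combining marks (M*); skip spaces, digits, punctuation.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            # Assumption: the name begins with the script, e.g. "DEVANAGARI LETTER KA".
            counts[name.split()[0]] += 1
    return counts


# Hypothetical sample mixing Hindi (Devanagari), Tamil, and English (Latin).
sample = "नमस्ते வணக்கம் hello"
print(script_counts(sample))
```

A real document pipeline would have to work from pixels rather than clean text, and would also need to handle handwriting and layout, which this sketch does not attempt.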

The Dataset: India's Largest Open AI Corpus

Underlying all four model families is what BharatGen describes as the world's largest dataset of its kind focused on underrepresented Indian data:

Text, speech, and images tied to Indian languages, culture, history, and philosophy
15,000+ hours of annotated voice data across 22 Indian languages
Secure corpus kept under version control for reproducibility
Coverage of rural dialects and urban contexts

The dataset itself is a significant contribution independent of the models. India's AI development has been constrained by the absence of high-quality, culturally representative training data in Indian languages. BharatGen's dataset, released partially as open source, eases that constraint for the entire research community.

Why Sovereign AI Matters for India

The framing of BharatGen as "sovereign AI" is deliberate and politically significant. Three concerns motivate it:

1. Data sovereignty. When Indian citizens interact with AI models trained primarily on Western data and hosted on Western infrastructure, their data flows through systems India does not control. A sovereign AI ecosystem keeps that data — and the value derived from it — within Indian institutions.

2. Cultural representation. AI systems trained predominantly on English-language data encode cultural assumptions that may not apply to Indian contexts. Legal norms, medical practices, educational conventions, and social structures differ — and AI systems that don't understand those differences produce worse outcomes for Indian users even when translated into Indian languages.

3. Capability independence. India's experience with the US export ban on Fable 5 illustrates the vulnerability of depending on foreign-controlled frontier AI. A domestically developed and controlled AI ecosystem provides resilience against access restrictions.

These are not abstract concerns. They map directly onto BharatGen's target domains: governance, healthcare, and education are areas where the Indian state cannot afford dependency on AI infrastructure it does not control.

The Institutional Architecture

What distinguishes BharatGen from previous Indian AI initiatives is its institutional depth. The project is structured as:

Lead institution: IIT Bombay, Department of Computer Science and Engineering
Leadership: Prof. Ganesh Ramakrishnan (academic lead), Rishi Bal (CEO), Dr. Maneesh Singh (VP, Machine Learning)
Consortium: 9 premier Indian academic institutions
Team: 60+ researchers, engineers, linguists
Funding: DST + IndiaAI Mission, ₹988.6 crore secured

The involvement of 9 institutions rather than a single lab signals an attempt to build durable infrastructure rather than a one-time research project. The presence of a CEO suggests commercialisation is a design goal, not an afterthought.

What India's AI Ecosystem Gets

BharatGen's launch changes the landscape for Indian AI development in several ways:

For developers: Open-source model weights for the text, speech recognition, and text-to-speech models, with training recipes, so that Indian developers can build on BharatGen without rebuilding from scratch.

For enterprises: Production-ready models for governance, healthcare, and finance domains in all 22 scheduled Indian languages, with IIT Bombay's research backing.

For researchers: The dataset corpus and benchmarks tailored to Indian language performance — enabling rigorous evaluation of Indian language AI that previous infrastructure did not support.

For policymakers: An Indian-controlled AI stack that can be deployed in sensitive domains without foreign data dependencies.

Where BharatGen Fits Globally

BharatGen is the most comprehensive Indian-language AI initiative to date, but it exists in a global context where language-specific sovereign AI is becoming a policy priority across multiple countries. France has Mistral (and its own language concerns, as illustrated by the Le Chaton Fat phenomenon), China has its domestic model ecosystem, and now India has BharatGen.

The pattern suggests we are entering a period of AI multipolarity — not a single global frontier model, but multiple national or regional AI ecosystems serving their own populations and regulatory contexts. BharatGen is India's entry into that multipolar world.

For Indian AI-native companies building on top of frontier models, BharatGen creates a new option: build on infrastructure that understands Indian languages natively, is controlled domestically, and does not expose user data to foreign jurisdictions.
