How are Indian firms training LLMs?


UPSC Prelims + Mains Study Note | GS-III: Science & Technology / Economy


1. At a Glance


2. Why in the News


3. Background & Evolution

Year | Milestone
2023 | Global LLM race accelerates post-ChatGPT; India lacks domestic GPU compute and curated Indian-language data
Mar 2024 | Union Cabinet approves IndiaAI Mission: ₹10,372 crore outlay over multiple years under MeitY [S2]
Mid-2024 | IndiaAI Compute Portal goes live; empanels AI service providers to offer shared GPU access at subsidised rates [S1]
Late 2024 | Common compute capacity reaches 18,693 GPUs; 14 AI service providers empanelled; data centres in Mumbai, Navi Mumbai, Hyderabad, Bengaluru, Noida, Jamnagar [S1]
Apr 2025 | Sarvam AI selected via competitive tender under IndiaAI Mission for sovereign LLM mandate [S3]
Feb 2026 | Sarvam 30B and 105B released at AI Impact Summit; trained on IndiaAI Mission compute; use Mixture of Experts (MoE) architecture [S4]
2026 | Common compute capacity crosses 38,000 GPUs + 1,050 TPUs; 20,000 additional GPUs under procurement [S1]

4. Core Static Facts

Implementing Body
  • Ministry of Electronics and Information Technology (MeitY): nodal ministry for IndiaAI Mission [S2]
  • IndiaAI (dedicated implementation entity under MeitY): manages compute portal, model development calls, and startup support [S2]

Budget
  • ₹10,372 crore (≈ $1.1 billion) approved by Union Cabinet, March 2024 [S2]

Compute Infrastructure
  • 38,000+ GPUs empanelled across 14 AI service providers [S1]
  • 1,050 TPUs also available [S1]
  • Subsidised rate: ₹65–₹100/hour vs global market rate >₹200/hour [S1]
  • Data centre locations: Mumbai, Navi Mumbai, Hyderabad, Bengaluru, Noida, Jamnagar [S1]
  • Eligible users: startups, researchers, academic institutions, government organisations [S1]
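To make the subsidy concrete, the rates above can be turned into a rough cost comparison. This is a minimal sketch, not an official calculator: it assumes the quoted rates are per GPU per hour, treats ₹200 as a lower bound for the market rate (the note says "more than ₹200"), and uses a hypothetical job of 512 GPUs running for 30 days. The function and constant names are illustrative only.

```python
# Rates in rupees, taken from the note; assumed here to be per GPU per hour.
SUBSIDISED_LOW = 65
SUBSIDISED_HIGH = 100
MARKET_FLOOR = 200  # "more than ₹200" in the note, so this is a lower bound


def training_cost(gpus: int, hours: float, rate: float) -> float:
    """Total rupee cost of running `gpus` GPUs for `hours` hours at `rate`."""
    return gpus * hours * rate


def minimum_saving_fraction(subsidised_rate: float,
                            market_rate: float = MARKET_FLOOR) -> float:
    """Share of the bill saved, relative to the lowest plausible market rate."""
    return 1 - subsidised_rate / market_rate


# Hypothetical job: 512 GPUs for 30 days (not a figure from the sources).
gpus, hours = 512, 24 * 30

for label, rate in [("subsidised (low)", SUBSIDISED_LOW),
                    ("subsidised (high)", SUBSIDISED_HIGH),
                    ("market (floor)", MARKET_FLOOR)]:
    crore = training_cost(gpus, hours, rate) / 1e7  # 1 crore = 10 million
    print(f"{label:18s} ₹{crore:.2f} crore")
```

Even at the upper subsidised rate of ₹100, the bill is at most half of what the same job would cost at the market floor, which is the core economic argument for shared compute.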

Sarvam AI Models (Feb 2026)
  • Sarvam 30B: 32-billion-parameter Mixture of Experts (MoE) model, 65K context window; speed-optimised (comparable to Gemini Flash-Lite / GPT-5 mini tier) [S3][S4]
  • Sarvam 105B: 106-billion-parameter MoE model, 128K context window; built for complex reasoning tasks (comparable to Gemini Pro / GPT-5 tier); wins ~90% of Indian-language benchmark comparisons vs GPT-4 [S3][S4]
  • Both models: open-sourced, trained 100% on IndiaAI Mission infrastructure [S3]

Key Terminology
  • LLM (Large Language Model): neural network trained on billions of text tokens; the foundation for generative AI
  • Parameter: numerical weight in a neural network; more parameters ≈ greater model capacity
  • Token: smallest unit of text processed by an LLM; Indian-language text requires more tokens than equivalent English, raising inference cost [S4]
  • MoE (Mixture of Experts): architecture in which only a subset of model parameters ("experts") is activated per inference pass, lowering compute cost without a proportional loss in capability [S4]
  • GPU (Graphics Processing Unit): primary hardware for LLM training/inference
  • TPU (Tensor Processing Unit): Google-designed chip for ML workloads; available on IndiaAI portal
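The MoE definition above is easier to remember with a toy example. The sketch below shows generic top-k routing, one common way of choosing which experts run. It is not Sarvam's actual implementation: the experts are stand-in functions, the router scores are passed in by hand (in a real model they come from a small learned layer), and all names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_scores, k=2):
    """Run only the selected experts; the others are never called."""
    chosen = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in chosen)
    return output, [i for i, _ in chosen]


# Toy setup (assumption): four "experts" that just scale their input.
experts = [lambda x, m=m: m * x for m in (1, 2, 3, 4)]
scores = [0.1, 2.0, -1.0, 1.5]  # hand-written router scores for one token

output, used = moe_layer(1.0, experts, scores, k=2)
print("experts used:", used, "of", len(experts))
```

Only two of the four experts do any work for this token, which is the sense in which MoE lowers compute per inference pass while the full set of parameters still exists in the model.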


5. Multi-Dimensional Analysis

Economic

Scientific / Technological

Geopolitical / Strategic

Administrative / Governance

Ethical / Governance

Social


6. Recent Developments (Last 12–18 Months)


7. Prelims Hooks

  1. The IndiaAI Mission was approved by Union Cabinet in March 2024 with an outlay of ₹10,372 crore. [S2]
  2. Nodal ministry for IndiaAI Mission: Ministry of Electronics and Information Technology (MeitY). [S2]
  3. IndiaAI common compute portal provides GPUs at ₹65–₹100/hour vs global rates exceeding ₹200/hour. [S1]
  4. As of 2026, IndiaAI has empanelled 14 AI service providers offering 38,000+ GPUs and 1,050 TPUs. [S1]
  5. Sarvam AI is the Bengaluru-based startup selected by Government of India as the sovereign LLM developer under IndiaAI Mission. [S3]
  6. Sarvam's Sarvam 30B has 32 billion parameters with a 65K context window; Sarvam 105B has 106 billion parameters with a 128K context window. [S3]
  7. Both Sarvam models use Mixture of Experts (MoE) architecture — key reason they are less compute-intensive than comparable dense models. [S4]
  8. In MoE architecture, only a subset of parameters (experts) is activated per inference, reducing computational cost. [S4]
  9. Sarvam 105B wins approximately 90% of comparisons against GPT-4 on Indian-language benchmarks. [S3]
  10. Both Sarvam LLMs were trained entirely on IndiaAI Mission infrastructure and are open-sourced. [S3]
  11. IndiaAI data centres are located in: Mumbai, Navi Mumbai, Hyderabad, Bengaluru, Noida, and Jamnagar. [S1]
  12. Indian-language text costs more in LLM inference because it requires more tokens — including translation overhead to English. [S4]
  13. 500 AI Data Labs announced in September 2025 with ₹988 crore investment to expand India's AI data infrastructure. [S5]
  14. The AI Impact Summit where Sarvam models were unveiled was held at Bharat Mandapam, New Delhi, on 19 February 2026. [S4]
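Hook 12 (more tokens means higher inference cost) can be checked with simple arithmetic. The sketch below is illustrative only: the tokens-per-word ratios and the price per 1,000 tokens are hypothetical placeholders, not measured values for any real tokenizer or published prices.

```python
def inference_cost(words: int, tokens_per_word: float,
                   price_per_1k_tokens: float):
    """Return (token count, cost) for a text of `words` words."""
    tokens = round(words * tokens_per_word)
    cost = tokens * price_per_1k_tokens / 1000
    return tokens, cost


# All three values below are assumptions chosen for illustration.
ENGLISH_TOKENS_PER_WORD = 1.5
INDIC_TOKENS_PER_WORD = 4.0
PRICE_PER_1K_TOKENS = 0.5  # rupees

english = inference_cost(1000, ENGLISH_TOKENS_PER_WORD, PRICE_PER_1K_TOKENS)
indic = inference_cost(1000, INDIC_TOKENS_PER_WORD, PRICE_PER_1K_TOKENS)

print("English:", english)
print("Indian-language:", indic)
print("Cost ratio:", indic[1] / english[1])
```

The point to retain for the exam is the mechanism: pricing is per token, so a language that splits into more tokens per word costs proportionally more for the same content.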

8. Mains Relevance

GS Papers:
  • GS-III: Science & Technology (AI/ML); Indian Economy (Start-up ecosystem, Digital Infrastructure)
  • GS-II: Government Policies and Interventions; e-Governance

Syllabus Headings:
  • GS-III: "Awareness in IT, Space, Computers, Robotics, Nano-technology, Bio-technology and issues relating to Intellectual Property Rights"
  • GS-III: "Indian Economy and issues relating to growth, development and employment" (Digital Economy)
  • GS-II: "Government policies and interventions for development in various sectors and issues arising out of their design and implementation"

Plausible Mains Questions:
  1. "Critically examine the challenges faced by Indian firms in training Large Language Models domestically, and evaluate the role of the IndiaAI Mission in addressing them." (GS-III, 15 marks)
  2. "The development of a sovereign LLM is not merely a technological achievement but a matter of strategic autonomy. Discuss in the context of India's IndiaAI Mission." (GS-III/GS-II, 15 marks)
  3. "What is the Mixture of Experts (MoE) architecture, and why is it particularly significant for AI development in resource-constrained economies like India?" (GS-III, 10 marks)


9. Related Topics to Study Next

Topic | Connection
IndiaAI Mission (full scope) | Parent policy; covers Data Management, Application Development, FutureSkills, not just compute
National Data Governance Framework | Governs the Indian-language data that feeds LLM training
Digital India Programme | Foundational infrastructure (broadband, DigiLocker, UPI) on which AI deployment scales
Semiconductor Mission (India) | Long-term solution to GPU import dependency; chip fabrication is the upstream bottleneck
Global AI Governance (G20, UN AI Panel) | India's position on AI regulation, safety standards, and sovereign AI norms
Open-Source vs Proprietary AI | Policy debate on whether government-funded models should be open-sourced; directly relevant to Sarvam case
National Language Technology Mission | Predecessor initiative for Indian-language NLP; provides historical context
Start-up India & Deep-Tech Policy | Sarvam AI's growth is embedded in this broader start-up ecosystem support framework

10. Common Errors / Trap Areas

  1. Wrong ministry: IndiaAI Mission is under MeitY, NOT NITI Aayog (NITI Aayog authored earlier AI strategy papers but does not implement the Mission).
  2. Parameter confusion: Sarvam 30B is described as "35 billion parameters" in some reports and "32 billion" in others — the MoE model has 32B active parameters but 35B total; exam questions will likely use the model name (30B/105B) rather than exact count.
  3. MoE ≠ smaller model: Sarvam 105B has 106B total parameters — it is NOT a small model; MoE makes it cheaper to run because only a fraction of parameters activate per token, not because it has fewer parameters overall.
  4. Confusing Sarvam AI with other Indian AI initiatives: Krutrim (Ola), Hanooman (SML India), BharatGPT are separate LLM projects — Sarvam is the one with the government sovereign LLM mandate.
  5. Year of IndiaAI Mission approval: March 2024, not 2023 (the National AI Strategy/NITI Aayog paper was 2018; these are different documents separated by 6 years).
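Trap 3 turns on the difference between total and active parameters, which a short calculation makes explicit. In the sketch below the split between shared and expert parameters, the number of experts, and the top-k value are all invented for illustration; they are chosen only so that the total matches the 106B figure quoted in this note, and they do not describe Sarvam's real architecture.

```python
def total_params(shared_b: float, n_experts: int, per_expert_b: float) -> float:
    """All parameters stored in the model, in billions."""
    return shared_b + n_experts * per_expert_b


def active_params(shared_b: float, top_k: int, per_expert_b: float) -> float:
    """Parameters actually used for one token, in billions."""
    return shared_b + top_k * per_expert_b


# Hypothetical configuration (assumption, not from the sources).
SHARED_B = 10        # attention, embeddings, etc.
N_EXPERTS = 64
PER_EXPERT_B = 1.5
TOP_K = 4            # experts activated per token

total = total_params(SHARED_B, N_EXPERTS, PER_EXPERT_B)
active = active_params(SHARED_B, TOP_K, PER_EXPERT_B)

print(f"total:  {total:.0f}B")
print(f"active: {active:.0f}B ({active / total:.0%} of total)")
```

The model in this toy configuration is still a 106B-parameter model on disk; it is cheaper per token only because a fraction of those parameters is exercised each time.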

11. Sources
