Shubham's blog

Building a Voice-First Hindi Tutor: Technical Lessons from Helping Diaspora Kids Talk to Their Grandparents

TL;DR: My nephew in Singapore was losing his Hindi. Video calls with Dadi were getting awkward. I built an AI tutor so diaspora kids (ages 5-10) can practice real conversations daily. Went through 3 LLM providers, 4 STT experiments, and learned that building for children is 10x harder than adults. Here's the technical journey.


It Started at a Family Dinner

Singapore, 2024. My nephew, surrounded by Mandarin at school, was struggling to talk to his grandmother on video calls. The typical pattern: wave awkwardly, say "नमस्ते Dadi," then silence, with his mom translating everything.

The gap was obvious: Traditional apps teach vocabulary. Video calls happen weekly at best. But conversation—real, natural conversation—needs daily practice with a patient companion who lets him talk about dinosaurs and cartoons in Hindi without judgment.

So between jobs, I spent two weeks building the first version of what he needed. Naive as I was, I thought it would be straightforward.

It wasn't.

This is the technical story of building a voice-first Hindi conversation tutor for diaspora kids trying to hold onto their heritage language.


The Real Problem: Not Just Latency, But Context

There are dozens of storytelling apps in Hindi. Plenty of reading apps like Google's Read Along and Kutuki. But zero conversational practice apps that actually work for kids speaking Hindi in a diaspora context.

Why is diaspora different?

Code-switching is the norm:

"मैं school जा रहा हूं" (I'm going to school)
"मुझे dinosaurs बहुत पसंद हैं" (I really like dinosaurs)

Cultural vocabulary matters:

English is dominant:

Building for this context meant rethinking everything about the standard STT → LLM → TTS pipeline.


Architecture Evolution: 3 LLM Providers, 4 STT Attempts

The Naive Start (August 2024)

ElevenLabs STT → OpenAI GPT-4 → ElevenLabs TTS

Hypothesis: Use the best-in-class for each component.

Reality:

Lesson 1: "Best-in-class" for English ≠ best-in-class for Indic languages.

The Hindi-Specialized Stack (September 2024)

Sarvam STT → OpenAI GPT-4 → ElevenLabs TTS

Why Sarvam? An Indian startup specializing in Hindi ASR; it handled children's speech patterns and conversational context noticeably better.

Why keep ElevenLabs TTS? We tried Sarvam's TTS: robotic, with 1s+ latency. For kids, voice quality is non-negotiable. They need a warm, engaging voice that feels like talking to a real person, not a robot.

Result: Better accuracy, still too slow (3-4s latency).

The Speed Obsession (September 2024)

Sarvam STT → Groq Llama 3.1 8B → ElevenLabs TTS

Breakthrough insight: For "talk to Dadi about your day" conversations, GPT-4 is overkill.

Optimizations:

  1. Smaller, faster model (Llama 3.1 8B)
  2. Parallel API calls (evaluation + response generation)

Side effect: cost dropped 10x.
import asyncio

# Sequential execution (SLOW)
transcription = await sarvam_stt(audio)       # 800ms
evaluation = await evaluate(transcription)    # 1200ms
response = await generate_response(...)       # 1500ms
audio = await elevenlabs_tts(response)        # 800ms
# Total: 4300ms

# Parallel execution (FAST)
transcription = await sarvam_stt(audio)       # 800ms

# These run simultaneously
evaluation, response = await asyncio.gather(
    evaluate(transcription),                  # 1200ms
    generate_response(...),                   # 1500ms
)
audio = await elevenlabs_tts(response)        # 800ms
# Total: 3100ms (28% faster)

Result: 44% total latency reduction (parallel calls plus the faster model). Kids stayed engaged.

Trade-off: Slightly less nuanced responses, but 5-year-olds didn't notice.

Current Stack (December 2025 - Present)

Sarvam STT → Google Gemini 2.0 Flash Lite → ElevenLabs TTS

Why switch from Groq to Gemini?

The winning pipeline:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=2) as executor:
    eval_future = executor.submit(
        evaluate_response,  # Grammar + context check
        user_text, tutor_question
    )
    conv_future = executor.submit(
        generate_response,  # Actual conversation
        conversation_history, child_name
    )

    evaluation = eval_future.result()
    conversation = conv_future.result()

This parallel execution saves 1-2 seconds per turn. For kids, that's the difference between "this is fun!" and "I'm bored."


The STT Nightmare: Why Indic Languages Are Still Hard

Experiment 1: Chromium Native ASR 🚫

Hypothesis: Use browser's built-in speech recognition. Zero API costs, instant feedback with live transcription.

Reality with kids learning Hindi:

Live transcription updates as the model corrects itself:

First pass: "मैं school जा रहा"
Second pass: "मैं स्कूल..."
Final: "मैं school जा रहा हूं"

Kids thought the first transcription was correct. They'd stop speaking, confused why it was changing.

Verdict: Optimizing for perceived latency ≠ optimizing for learning clarity.

Experiment 2: Whisper (The Hype vs Reality) 🚫

Everyone said "just use Whisper for multilingual ASR."

For diaspora kids speaking Hindi:

Example failure:

Child says: "मैं आज अपने friend के साथ park गया"
Whisper: "main aaj apne friend ke saath park gaya" (all Roman script)
Needed: "मैं आज अपने friend के साथ park गया" (mixed script)

Lesson 2: Cutting-edge ≠ good for your specific use case.

Experiment 3: Google Cloud STT (The Dark Horse) ✅

I almost skipped it. Google doesn't market it aggressively. Seemed old-school compared to Whisper.

Surprise: It was perfect for diaspora kids.

Why it worked:

Example: Child says, with a 2s pause mid-sentence: "मुझे... cricket खेलना बहुत पसंद है"
Google STT: correctly transcribes with the pause intact
Whisper: often truncates or misses the second half

Trade-off: Doesn't correct pronunciation (but that was never the goal—conversation practice is the goal).

Current Setup: Google Cloud Primary, Sarvam Fallback

def transcribe_audio(audio_bytes, provider='google'):
    try:
        if provider == 'google':
            # Primary: Google Cloud STT (context-aware + decent in Hindi)
            return google_stt.transcribe(audio_bytes)
    except Exception as e:
        logger.warning(f"Google Cloud STT failed: {e}")
    # Fallback: Sarvam
    return sarvam.transcribe(audio_bytes)

Why dual-provider?


Voice UX: Designing for Heritage Language Learners

Challenge: These Aren't Native Speakers

Most voice apps assume fluent users. Diaspora kids:

Decision 1: Manual Recording > Voice Activity Detection

Conventional wisdom: Use VAD for seamless conversation.

Reality with 5-10 year olds:

Our solution: Big, obvious record button with visual feedback.

// Simple is better for kids
recordButton.addEventListener('click', () => {
    if (isRecording) {
        stopRecording();
        processingIndicator.show();
    } else {
        startRecording();
        animatedMic.start();
    }
});

Visual cues matter:

Result: Kids understand the turn-taking model. Zero confusion.

Decision 2: The 10-Sentence Conversation Structure

Problem discovered during testing: Kids got tired after 15+ exchanges. Conversations dragged. They'd leave mid-conversation.

Goal: Keep them wanting more, not exhausted.

Solution: Structured phases with automatic wrap-up:

def get_phase_instruction(sentences_count, is_farewell=False):
    if is_farewell:
        return "Give warm goodbye with homework for parents"
    
    if sentences_count >= 10:
        return "Wrap up conversation with encouragement"
    
    if sentences_count == 9:
        return "Start transitioning to conclusion"
    
    return ""  # Continue naturally

The conversation arc:

Why this works:

Farewell handling: If kid says "bye" or "अलविदा" at sentence 5:

if is_farewell:
    return immediate_warm_goodbye()  # Don't force them to continue
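The `is_farewell` flag itself can come from a simple keyword check. A minimal sketch (the keyword list here is illustrative, not our production list):

```python
# Minimal farewell detector: matches common goodbye words in either
# script. Keyword list is illustrative only.
FAREWELL_WORDS = {
    "bye", "goodbye", "bye-bye",
    "टाटा",           # "ta-ta"
    "अलविदा",         # "alvida" (goodbye)
    "फिर मिलेंगे",     # "phir milenge" (see you again)
}

def is_farewell_turn(user_text: str) -> bool:
    text = user_text.lower().strip()
    # Split on spaces after stripping common punctuation
    tokens = set(text.replace("!", " ").replace("।", " ").split())
    return any(
        (" " in word and word in text) or word in tokens
        for word in FAREWELL_WORDS
    )
```

In production a check like this would gate the immediate_warm_goodbye() path above.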

Decision 3: Streaming with Typewriter Effect

Old approach: Wait for full response → display all at once → play audio

Better approach: Stream text as it generates, audio comes after

// Text appears immediately, audio later
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
    const {done, value} = await reader.read();
    if (done) break;
    
    const text = decoder.decode(value);
    displayTextTypewriter(text);  // Char-by-char animation
}

// Audio completes in background
await playAudio(audioBytes);

Why it works:

Actual latency breakdown:

User stops speaking
↓
500ms: Processing audio
↓
800ms: STT transcription
↓
0ms: Start streaming text (parallel LLM + TTS)
↓
700ms: LLM generates full response
↓
600ms: TTS completes audio
↓
Total: 2.6s but feels like 1.3s due to streaming

The Latency Battle: India + Diaspora Reality

The Mystery: Why Is Production 3x Slower?

Local testing (MacBook, Bangalore): 1.5-2s latency ✅
Production (Heroku, USA): 4-6s latency 🚫

Initial reaction: "Is Heroku throttling me?"

Actual root causes:

1. Geographic API distribution

Sarvam API: Mumbai servers
OpenAI API: US West servers  
ElevenLabs API: US East servers

Round-trip time for API chain:
- Bangalore → Mumbai: 30ms
- Mumbai → US West: 200ms
- US West → US East: 70ms
- US East → User: variable

Production total: +400ms just in network hops

2. Heroku cold starts (free tier pain)

First request after 30min inactivity: 8-10s
Subsequent requests: 4-6s

3. Sequential API calls in initial architecture

Solutions Implemented

Optimization 1: Parallel API execution

# Impact: -28% latency

# Before
transcription = await stt(audio)          # 800ms
evaluation = await evaluate(text)         # 1200ms
response = await generate(context)        # 1500ms
audio = await tts(response)               # 800ms
# Total: 4300ms

# After
transcription = await stt(audio)          # 800ms

# Run simultaneously (takes 1500ms, not 2700ms)
evaluation, response = await asyncio.gather(
    evaluate(text),                       # 1200ms
    generate(context)                     # 1500ms
)

audio = await tts(response)               # 800ms
# Total: 3100ms

Optimization 2: Keep Heroku warm

# Heroku free dynos sleep after 30 minutes of inactivity,
# so an external monitor keeps the app warm.
@app.route('/health')
def health_check():
    return {'status': 'healthy', 'timestamp': time.time()}

# External service: UptimeRobot pings /health every 5min

Optimization 3: Redis session caching

# Before: DB hit every request (100ms)
# After: Redis hit (5ms)

def get_session_store():
    if redis_url:
        try:
            return RedisSessionStore(redis_url)
        except Exception:
            return FileSessionStore()  # Fallback
    return FileSessionStore()

The Provider Speed Race

We tried every major LLM provider:

| Provider | Model | Latency | Quality | Cost/1M | Verdict |
|----------|-------|---------|---------|---------|---------|
| OpenAI | GPT-4 | 1500ms | Excellent | $15 | Slow + expensive |
| OpenAI | GPT-4o-mini | 800ms | Very good | $0.15 | Good but costly |
| Groq | Llama 3.1 8B | 600ms | Good | $0.05 | Fast! |
| Groq | Llama 3.3 70B | 900ms | Excellent | $0.59 | Quality jump |
| Google | Gemini 2.0 Flash | 700ms | Excellent | $0.075 | Winner |

Why Gemini won:

Lesson 3: For kid-focused conversations in Hindi, model quality matters less than you think. Speed + cultural context matters more.


UI/UX: Why Building for Kids Is 10x Harder

The "This Is Boring" Problem

First version was built like a normal app:

Kids' reaction: 😐 Left within 2 minutes.

The Airplane Insight

Flying Singapore → Bangalore, I watched the in-flight entertainment system. When you switch to "Kids Mode":

Adult mode: Clean, text-heavy, functional
Kids mode:

Aha moment: Kids need constant positive reinforcement or they disengage.

What Actually Works for 5-10 Year Olds

1. Visual reward system (The Star Economy)

// Points for every interaction
const POINTS = {
    sentence: 10,           // Every sentence in Hindi
    quality_bonus: 20,      // Every 5 good responses
    completion: 50,         // Finish conversation
    milestone: 30           // Special achievements
};

// Update with animation
function updateRewards(points) {
    animateNumberChange(starsElement, currentPoints, newPoints);
    playSound('chime.mp3');
    
    // Milestone celebration
    if (goodResponses % 5 === 0) {
        showCelebration();  // Confetti + special animation
        playSound('applause.mp3');
    }
}

Why it works: Kids are motivated by immediate, visual progress. Stars accumulate visibly. They count them proudly.

2. Positive-only feedback (Never punish)

def get_feedback_type(grammar_score, context_score):
    if grammar_score >= 7 and context_score >= 7:
        return "green"  # "Great Hindi!" ✅
    
    if grammar_score >= 5 or context_score >= 5:
        return "amber"  # "Try saying..." 🔄
    
    # NEVER red/negative
    return "amber"  # Always give them a path forward

Green bubble: "बहुत अच्छा! That was great Hindi!"
Amber bubble: "Good! You can also say: [correction]"
Red bubble: ❌ Never. Kids shut down.

3. Minimal header during learning

Before (distracting):

[App Title] [Dashboard] [Nav] [Sentences: 5] [Points: 30] [Profile ▼]

After (focused):

[← Back]                              [⭐ 47] [🐶]

Result: 60% longer session times. Kids stayed in the conversation.

4. Fun animal avatars (kid-tested)

Instead of initials or photos:

const avatars = ['🐶', '🐱', '🐼', '🦊', '🐨', '🐯'];

// Rotate through on each visit
const avatar = avatars[sessionCount % avatars.length];

Kids love their animal identity. They ask "which animal am I today?"

5. Celebration animations at milestones

// Every 5 good responses
if (goodResponseCount % 5 === 0) {
    showFullScreenCelebration({
        confetti: true,
        message: "Amazing! 5 great sentences!",
        sound: 'applause.mp3',
        stars: +20
    });
}

Before celebration system: 4-minute average session
After celebration system: 12-minute average session

Kids kept talking to hit the next milestone.


The Grammar vs Context Challenge

Why Standard Grammar Checking Fails

Naive approach:

def evaluate(user_response):
    prompt = f"Rate the grammar of this Hindi sentence (1-10): {user_response}"
    return llm(prompt)

Problem: No context means wrong evaluations.

Example:

Tutor: "तुम्हें कौन सा खाना पसंद है?" (What food do you like?)
Kid: "हाँ" (Yes)
Grammar check: 10/10 ✅
Context check: ❌ Completely wrong answer

Better: Contextual Evaluation

def evaluate_response(user_response, tutor_question, conversation_history):
    prompt = f"""
    You're evaluating a child (age 5-10) learning Hindi.
    
    Tutor asked: "{tutor_question}"
    Child said: "{user_response}"
    Previous context: {conversation_history[-3:]}
    
    Evaluate:
    1. Grammar (1-10): Are sentences grammatically correct?
    2. Context (1-10): Does response make sense for the question?
    3. Code-switching: Count English words used
    4. Encouragement: What to say to motivate them?
    5. Correction: If needed, suggest better phrasing
    
    Return JSON:
    {{
        "grammar_score": int,
        "context_score": int,
        "english_word_count": int,
        "feedback_type": "green" | "amber",
        "encouragement": str,
        "corrected_response": str | null
    }}
    """
    
    return gemini_json_mode(prompt)

Better evaluation:

{
    "grammar_score": 8,
    "context_score": 4,
    "english_word_count": 0,
    "feedback_type": "amber",
    "encouragement": "Good try! But let's answer the question.",
    "corrected_response": "मुझे पिज़्ज़ा पसंद है"
}

(The corrected_response means "I like pizza.")
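However the model is invoked, the JSON it returns should be validated before it drives the feedback bubbles. A defensive-parsing sketch (field names follow the prompt above; the defaults are our assumptions):

```python
import json

def parse_evaluation(raw: str) -> dict:
    """Clamp scores and default to 'amber' so a malformed model reply
    never produces harsh or missing feedback in the UI."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        data = {}

    def clamp(value, lo=1, hi=10, default=5):
        try:
            return max(lo, min(hi, int(value)))
        except (TypeError, ValueError):
            return default

    return {
        "grammar_score": clamp(data.get("grammar_score")),
        "context_score": clamp(data.get("context_score")),
        "english_word_count": clamp(data.get("english_word_count"), 0, 100, 0),
        "feedback_type": data["feedback_type"]
            if data.get("feedback_type") in ("green", "amber") else "amber",
        "encouragement": data.get("encouragement") or "बहुत अच्छा!",
        "corrected_response": data.get("corrected_response"),
    }
```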

The Code-Switching Dilemma

Diaspora kids naturally code-switch:

"मैं school जा रहा हूं और मेरे friends के साथ lunch खाऊंगा" (I'm going to school and will eat lunch with my friends)

Design decision we debated:

  1. Strict: Mark as incorrect, force Hindi-only
  2. Lenient: Accept it completely
  3. Guiding: Accept but gently suggest alternatives

We chose #3 (guiding):

def handle_code_switching(english_word_count, user_response):
    if english_word_count == 0:
        return {
            "feedback": "बहुत अच्छा! Pure Hindi!",
            "type": "green",
            "bonus_points": 5
        }
    
    elif english_word_count <= 2:
        return {
            "feedback": "Good! Next time try: 'स्कूल' instead of 'school'",
            "type": "green",  # Still positive
            "suggestion": get_hindi_alternatives(user_response)
        }
    
    else:  # 3+ English words
        return {
            "feedback": "Let's try that again in more Hindi",
            "type": "amber",
            "corrected_response": translate_to_hindi(user_response)
        }

Why this works:

Real example:

Kid: "मैं park में जाकर मेरे dost के साथ football खेला"

Response: "Great sentence! ⭐ You can also say:
'मैं पार्क में जाकर मेरे दोस्त के साथ फुटबॉल खेला'
Next time try using Hindi words for 'park', 'dost', and 'football'!"

Type: Green (still rewarded)
Points: +10
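The english_word_count these branches key on can come from a cheap script heuristic: tokens written in Latin script count as English, Devanagari tokens as Hindi. A crude sketch (real tokenization is messier):

```python
import re

LATIN = re.compile(r"[A-Za-z]")
DEVANAGARI = re.compile(r"[\u0900-\u097F]")

def count_english_words(text: str) -> int:
    """Count tokens that contain Latin letters and no Devanagari."""
    return sum(
        1 for token in text.split()
        if LATIN.search(token) and not DEVANAGARI.search(token)
    )
```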

Gamification: What Works vs What Doesn't

❌ What Didn't Work

1. Story Co-Creation

Initial idea: "Let's create a Panchatantra story together!"

Problems:

Example failure:

Tutor: "एक कौवा प्यासा था। अब क्या हुआ?" (A crow was thirsty. What happened next?)
Kid: "...um... I don't know... पानी?" (water?)
Tutor: "हाँ! कहाँ पानी मिला?" (Yes! Where did he find water?)
Kid: [long pause] "I don't know" [exits]

2. Free-Form Conversation

Initial idea: "Talk about whatever you want!"

Problems:

3. Correction Pop-ups (Too Harsh)

Initial design: Immediately show corrections after mistakes

Problem: Felt like a test, not practice. Kids became self-conscious.

✅ What Actually Works

1. Structured conversation types

Each topic has clear scope and goals:

"My Family" (मेरा परिवार):

"Food Talk" (खाने की बातें):

Why structured works:

2. Milestone-based rewards

REWARD_STRUCTURE = {
    'sentence': 10,              # Base points
    'quality_milestone': 20,     # Every 5 good responses
    'completion': 50,            # Finish conversation
    'streak': 30,                # Multiple days in a row
}

def calculate_reward(metrics):
    points = REWARD_STRUCTURE['sentence']
    
    if metrics['good_response_count'] % 5 == 0:
        points += REWARD_STRUCTURE['quality_milestone']
        trigger_celebration()  # Visual reward
    
    return points

Result: Kids chase milestones ("just 2 more sentences till the celebration!")

3. Parent analytics (hidden motivator)

Dashboard shows parents:

Why it matters: Parents encourage kids when they see progress. Social proof works even for kids.

Example parent reaction:

"Beta, you had 4 conversations this week! That's amazing! And look, 70% were perfect Hindi sentences!"


Technical Decisions Worth Discussing

HTTP + SSE > WebSockets (For Turn-Taking)

Everyone: "Voice apps need WebSockets for real-time!"

Our reasoning: Kids do turn-taking, not simultaneous conversation.

# Simple HTTP endpoint
@app.route('/api/process_audio_stream', methods=['POST'])
def process_audio():
    audio = request.files['audio']
    
    def generate_response():
        # Streaming via SSE
        for chunk in stream_conversation(audio):
            yield f"data: {json.dumps(chunk)}\n\n"
    
    return Response(
        generate_response(),
        mimetype='text/event-stream'
    )

Benefits:

Trade-off: Can't do simultaneous input/output (but we don't need it)
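On the receiving end, each event is a `data: <json>` line followed by a blank line. A minimal parser for that framing (sketch; handles only the single-line JSON events this endpoint emits):

```python
import json

def parse_sse(body: str) -> list:
    """Extract the JSON chunks from a buffered text/event-stream body."""
    chunks = []
    for line in body.splitlines():
        if line.startswith("data: "):
            chunks.append(json.loads(line[len("data: "):]))
    return chunks
```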

Dual-Layer Session Management

def get_session_store():
    """
    Production: Redis (24hr TTL)
    Fallback: FileSessionStore
    Persistence: PostgreSQL (forever)
    """
    if redis_url:
        try:
            store = RedisSessionStore(redis_url)
            store.redis.ping()
            return store
        except Exception:
            logger.warning("Redis failed, using FileStore")
            return FileSessionStore()
    return FileSessionStore()

Why three layers?

| Layer | Purpose | TTL | Use case |
|-------|---------|-----|----------|
| Redis | Fast access | 24 hours | Active conversation state |
| FileStore | Dev + fallback | 24 hours | Local dev, Redis failure |
| PostgreSQL | History | Forever | Parent dashboard, analytics |

Principle: Resilience > performance optimization. If Redis fails at 2am, conversations continue with FileStore.

SQLite → PostgreSQL (Ship Fast, Scale Later)

# Works in both dev and prod
database_url = os.getenv(
    'DATABASE_URL',
    'sqlite:///hindi_tutor.db'  # Dev fallback
)

# Heroku gives postgres:// URL, SQLAlchemy needs postgresql://
if database_url.startswith('postgres://'):
    database_url = database_url.replace('postgres://', 'postgresql://', 1)

Development: SQLite (zero setup, file-based)
Production: PostgreSQL (Heroku managed)

Same ORM, different backend. Ship fast locally, scale when needed.


The iOS Audio Hell

This deserves its own post.

The Problem: Safari Audio Just Doesn't Work

Issues discovered:

  1. First audio playback: Silent (AudioContext requires user gesture)
  2. Volume controls: Don't work (Web Audio API needed)
  3. Recording: Random lag (WebKit MediaRecorder quirks)
  4. OAuth redirects: Break audio state entirely

Standard audio approach:

// This FAILS on iOS
const audio = new Audio(audioUrl);
audio.play();  // Silent on first try

The Solution (After Pulling Hair Out)

Step 1: Unlock AudioContext on ANY user interaction

let audioContext = null;
let audioUnlocked = false;

function unlockAudio() {
    if (audioUnlocked) return;
    
    audioContext = new (window.AudioContext || window.webkitAudioContext)();
    
    // Create dummy sound to "unlock" audio
    audioContext.resume().then(() => {
        const buffer = audioContext.createBuffer(1, 1, 22050);
        const source = audioContext.createBufferSource();
        source.buffer = buffer;
        source.connect(audioContext.destination);
        source.start(0);
        
        audioUnlocked = true;
        console.log('iOS audio unlocked');
    });
}

// Critical: Call on FIRST user interaction
document.addEventListener('click', unlockAudio, { once: true });
document.addEventListener('touchstart', unlockAudio, { once: true });

Step 2: Use Web Audio API for all playback

async function playAudioiOS(audioData) {
    if (!audioUnlocked) {
        console.error('Audio not unlocked yet');
        return;
    }
    
    // Decode audio data
    const arrayBuffer = await audioData.arrayBuffer();
    const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
    
    // Create source
    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    
    // Volume control (finally works!)
    const gainNode = audioContext.createGain();
    gainNode.gain.value = volumeSlider.value;
    
    // Connect: source → gain → destination
    source.connect(gainNode);
    gainNode.connect(audioContext.destination);
    
    // Play
    source.start(0);
}

Step 3: Handle OAuth redirects

// Before OAuth redirect, save audio state
function preserveAudioState() {
    sessionStorage.setItem('audioUnlocked', audioUnlocked);
    sessionStorage.setItem('volume', volumeSlider.value);
}

// After OAuth callback, restore
function restoreAudioState() {
    const wasUnlocked = sessionStorage.getItem('audioUnlocked') === 'true';
    volumeSlider.value = sessionStorage.getItem('volume') || volumeSlider.value;
    if (wasUnlocked) {
        // The page reloaded, so the unlocked AudioContext is gone.
        // Re-arm the one-time listeners; the next tap re-unlocks audio.
        audioUnlocked = false;
        document.addEventListener('click', unlockAudio, { once: true });
        document.addEventListener('touchstart', unlockAudio, { once: true });
    }
}

Lesson 4: Mobile web audio in 2025 is still a mess. Test on actual iPhones, not just simulators.


Measuring Success: Metrics That Matter

Latency Breakdown (Current)

User stops speaking
↓
Audio processing: 50ms
↓
STT (Google Cloud): 800ms
↓
[Parallel execution starts]
├─ Evaluation (Gemini): 600ms
└─ Response (Gemini): 400ms
↓
TTS (ElevenLabs): 300ms
↓
Network overhead: 200ms
↓
Total: ~1.95 seconds ✅

Target: < 2 seconds (kids stay engaged)
Achieved: 1.7s average, 1.0s best case


Key Lessons for Voice AI Builders

1. Don't Trust English-Optimized Benchmarks

Takeaway: Benchmark on YOUR use case, not general leaderboards.
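A minimal version of that advice: time each provider over your own clips and score against your own reference transcripts. Sketch below, with stub provider functions and a toy token-recall metric standing in for real APIs and WER:

```python
import time

def token_recall(reference: str, hypothesis: str) -> float:
    """Toy accuracy: fraction of reference tokens recovered (not real WER)."""
    ref, hyp = set(reference.split()), set(hypothesis.split())
    return len(ref & hyp) / len(ref) if ref else 0.0

def benchmark(providers: dict, samples: list) -> dict:
    """providers: name -> transcribe(audio) fn; samples: (audio, reference)."""
    results = {}
    for name, transcribe in providers.items():
        latencies, recalls = [], []
        for audio, reference in samples:
            start = time.perf_counter()
            hypothesis = transcribe(audio)
            latencies.append(time.perf_counter() - start)
            recalls.append(token_recall(reference, hypothesis))
        results[name] = {
            "avg_latency_s": sum(latencies) / len(latencies),
            "avg_recall": sum(recalls) / len(recalls),
        }
    return results
```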

2. Optimize for Perceived Latency

Actual latency: 2.6s
Perceived latency: ~1.3s (via streaming text)

Techniques:

3. Kids Are a Different Species

What works for adults:

What works for kids:

4. For Indic Languages, Go Specialized

Winners:

Losers:

5. Voice UX ≠ Chat UX

Voice-specific needs:


What's Next for the Product

Short-term (Next 3 Months)

1. Age-specific models

2. Phoneme practice module

3. Parent dashboard v2

Long-term Vision

1. Sibling mode

2. Heritage language expansion

3. Cultural curriculum

Goal: Become the default tool for diaspora families keeping heritage languages alive.


Open Questions (Still Figuring Out)

1. How do we measure actual learning vs engagement?

2. Optimal conversation length by age?

3. Should we enforce Hindi-only or accept code-switching?

4. How to prevent reward gaming?


Overall summary

Time invested: 4 months of focused work
Lines of code: ~15,000
Git commits: 220+
LLM providers tried: 3 (OpenAI, Groq, Gemini)
STT providers tried: 4 (ElevenLabs, Whisper, Sarvam, Google)
UI redesigns: 6 major iterations

Biggest lesson:

Building for kids is humbling. They're brutally honest users—if it's boring, they leave. If it's too slow, they get frustrated. If it doesn't work, they try once and never return.

But when a 6-year-old says "मुझे यह बहुत पसंद है!" (I really like this!) after finishing a conversation?

When a parent messages: "She asked to practice before calling Dadi. First time ever."

When you see a family preserving their heritage language across continents?

Worth every commit. Worth every latency optimization. Worth every iOS audio bug.


Join the Journey

We're currently building with 100 founding families for free early access. If you're a diaspora parent trying to keep Hindi alive, we'd love your feedback.

Built with:

Open questions for the community:

Building something in voice AI or heritage language tech? Let's chat. This space needs more builders.


P.S. If you're building for kids, test with real kids early. They will humble you in ways adults never will.