
Built an AI that explains math visually instead of just answering

Claw Learn is an AI-powered visual math tutor that combines the ElevenLabs Speech Engine with a custom canvas renderer to turn math questions into live animated explanations with synchronized narration. Users can ask questions by voice or text and watch the animation generate in real time.


Key points

  • Claw Learn transforms math questions into visual animated explanations with real-time voice interaction. The project is built on Next.js 16 and uses ElevenLabs WebRTC for low-latency voice I/O.
  • Supports multiple AI providers (Gemini, OpenAI, Ollama) and offers detailed deployment guides.
  • The canvas renderer supports over 30 visual element types to dynamically generate custom teaching scenes.



Talk to it. Watch it teach.

Claw Learn is an AI-powered visual math tutor with a real-time voice interface — powered by the ElevenLabs Speech Engine. Ask any math or physics question by voice or text, and watch a synchronized animated explanation generate live in the browser.

Live Demo · Report a Bug · Request a Feature

What is Claw Learn?

Claw Learn combines the ElevenLabs Speech Engine with an AI scene planner and a custom canvas renderer to turn math questions into live animated explanations with synchronized narration.

The Speech Engine is the core of the experience — it handles both voice input and audio output over WebRTC, so you can speak your question, interrupt mid-explanation, and ask follow-ups without ever touching a keyboard. When the Speech Engine isn't configured, the app falls back to REST TTS and browser-based speech recognition.

No slides. No textbooks. No pre-recorded videos. Every explanation is generated fresh for your exact question.

You: "Why does the derivative represent slope?"

App:

  → ElevenLabs Speech Engine captures your voice over WebRTC
  → AI generates a 10-scene visual teaching plan
  → Canvas renders: axes, parabola, tangent line, slope formula
  → Speech Engine narrates each scene in sync with the animation
  → Interrupt at any time to ask a follow-up; just speak
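The hand-off between the AI planner and the canvas in the flow above can be sketched as a validation step on the model's JSON reply. This is a minimal sketch, not the project's code: the `ScenePlan`, `Scene`, and `VisualElement` shapes, the field names `title`, `narration`, and `elements`, and the `parseScenePlan` helper are all assumptions (the real types live in `types/scene.ts` and may differ).

```typescript
// Hypothetical scene-plan shape; the real types in types/scene.ts may differ.
type VisualElement = { type: string; [key: string]: unknown };

interface Scene {
  narration: string;         // text spoken while the scene is on screen
  elements: VisualElement[]; // what the canvas draws for this scene
}

interface ScenePlan {
  title: string;
  scenes: Scene[];
}

// A few element types from the Visual Element Reference; the full list is longer.
const KNOWN_TYPES = new Set(["axes", "graph", "tangent", "point", "formula", "vector"]);

// Parse the model's JSON reply and reject anything the renderer could not draw.
function parseScenePlan(raw: string): ScenePlan {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("plan must be an object");
  const { title, scenes } = data as { title?: unknown; scenes?: unknown };
  if (typeof title !== "string") throw new Error("plan.title must be a string");
  if (!Array.isArray(scenes) || scenes.length === 0) {
    throw new Error("plan.scenes must be a non-empty array");
  }

  const checked = scenes.map((scene, i): Scene => {
    const { narration, elements } = (scene ?? {}) as { narration?: unknown; elements?: unknown };
    if (typeof narration !== "string") throw new Error(`scene ${i}: narration must be a string`);
    if (!Array.isArray(elements)) throw new Error(`scene ${i}: elements must be an array`);
    for (const el of elements) {
      const type = (el as { type?: unknown } | null)?.type;
      if (typeof type !== "string" || !KNOWN_TYPES.has(type)) {
        throw new Error(`scene ${i}: unknown element type ${String(type)}`);
      }
    }
    return { narration, elements: elements as VisualElement[] };
  });
  return { title, scenes: checked };
}
```

Validating before rendering means a malformed or unexpected reply fails with a readable error instead of a half-drawn scene.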

Demo

Add a GIF or screenshot here

Try these questions:

"How does matrix multiplication work?"

"Explain the Fourier transform visually"

"What is integration and why does it find area?"

"Show me Euler's formula e^(iπ) + 1 = 0"

"How does gravity create orbits?"

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | Next.js 16 (App Router, Turbopack) |
| UI | React 19, Tailwind CSS v4, Framer Motion |
| AI | Any OpenAI-compatible API (Gemini, OpenAI, Ollama, etc.) |
| Voice I/O | ElevenLabs Speech Engine (WebRTC) |
| TTS fallback | ElevenLabs REST API |
| STT fallback | Web Speech API |
| Animations | Custom 2D Canvas renderer |
| Language | TypeScript 5 |
| Deployment | Vercel |

Getting Started

Prerequisites

Node.js 20.9+ (Next.js 16 no longer supports Node.js 18)

An OpenAI-compatible API key — Gemini (free at aistudio.google.com), OpenAI, or any compatible provider

Google OAuth credentials — required for login (console.cloud.google.com)

Upstash Redis — recommended for rate limiting (console.upstash.com, free tier)

ElevenLabs — optional, free tier at elevenlabs.io

  1. Clone and install

    git clone https://github.com/arzumanabbasov/claw-learn.git
    cd claw-learn
    npm install

  2. Configure environment variables

    cp .env.local.example .env.local

Open .env.local and fill in your keys:

    # ── AI Provider (required) ────────────────────────────────────────────────────
    OPENAI_API_KEY=your_api_key_here
    OPENAI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
    OPENAI_MODEL=gemini-2.5-flash

    # ── Auth (required) ───────────────────────────────────────────────────────────
    # Generate a secret: openssl rand -base64 32
    AUTH_SECRET=your_auth_secret_here

    # Google OAuth: https://console.cloud.google.com/
    GOOGLE_CLIENT_ID=your_google_client_id
    GOOGLE_CLIENT_SECRET=your_google_client_secret

    # ── Rate limiting: Upstash Redis (recommended) ────────────────────────────────
    # Without these, rate limiting falls back to in-memory (resets on server restart)
    # Create a free Redis DB at https://console.upstash.com/
    UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
    UPSTASH_REDIS_REST_TOKEN=your_token_here

    # ── ElevenLabs Voice (optional) ───────────────────────────────────────────────
    ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
    ELEVENLABS_VOICE_ID=pNInz6obpgDQGcFmaJgB

    # Speech Engine: full WebRTC voice I/O
    # Create an agent at https://elevenlabs.io/app/conversational-ai
    ELEVENLABS_SPEECH_ENGINE_ID=agent_xxxxxxxxxxxxxxxxxxxx

    # ── Security ──────────────────────────────────────────────────────────────────
    ALLOWED_ORIGIN=https://your-domain.com

  3. Run

    npm run dev

Open http://localhost:3000.

Authentication

Claw Learn uses NextAuth.js v5 with Google OAuth. All routes require a valid session — unauthenticated users are redirected to /login.

Setup:

  1. Go to console.cloud.google.com → APIs & Services → Credentials

  2. Create an OAuth 2.0 Client ID (Web application)

  3. Add your domain to Authorized JavaScript origins and https://your-domain.com/api/auth/callback/google to Authorized redirect URIs

  4. Copy the Client ID and Secret into your env vars

  5. Generate AUTH_SECRET with openssl rand -base64 32

For local dev, add http://localhost:3000 as an authorized origin and http://localhost:3000/api/auth/callback/google as a redirect URI.
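The "all routes require a valid session" rule can be sketched as a small pure function that a request handler might call. This is an illustration only: `resolveRedirect` and `PUBLIC_PREFIXES` are hypothetical names, and treating `/login` and `/api/auth` as the only public paths is an assumption based on the redirect target and the OAuth callback URL mentioned above.

```typescript
// Assumption: the login page and the auth endpoints must stay reachable
// without a session, otherwise sign-in could never complete.
const PUBLIC_PREFIXES = ["/login", "/api/auth"];

// Returns the path to redirect to, or null when the request may proceed.
function resolveRedirect(pathname: string, hasSession: boolean): string | null {
  const isPublic = PUBLIC_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/"),
  );
  if (!hasSession && !isPublic) return "/login";
  return null;
}
```

Matching on `prefix + "/"` rather than a bare `startsWith(prefix)` keeps a path such as `/loginx` from being treated as public.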

Rate Limiting

Each authenticated user gets 3 questions per day, tracked by their Google user ID and reset at UTC midnight.

Rate limiting uses Upstash Redis in production — an atomic INCR with a TTL set to the end of the current UTC day. This is serverless-safe and works across all Vercel edge instances.

Without Upstash credentials, the app falls back to an in-memory store that resets whenever the server restarts (fine for local dev, not suitable for production).
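The increment-then-expire pattern described above can be sketched as follows. This is a minimal sketch under stated assumptions: `CounterStore`, `MemoryStore`, `consumeQuestion`, and the `ratelimit:` key format are hypothetical, and the store is synchronous for brevity (a real Redis client would be asynchronous). Only the limit of 3 and the UTC-midnight reset come from the README.

```typescript
const DAILY_LIMIT = 3; // from the README: 3 questions per user per day

// Minimal counter interface with the semantics of Redis INCR and EXPIRE.
interface CounterStore {
  incr(key: string): number; // returns the value after incrementing
  expire(key: string, seconds: number): void;
}

// In-memory stand-in for local development (state is lost on restart).
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  incr(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
  expire(_key: string, _seconds: number): void {
    // A real store would drop the key after `seconds`; omitted in this sketch.
  }
}

// Seconds from `now` until the next 00:00:00 UTC.
function secondsUntilUtcMidnight(now: Date): number {
  const nextMidnight = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.ceil((nextMidnight - now.getTime()) / 1000);
}

// Count one question against the user's daily quota.
function consumeQuestion(store: CounterStore, userId: string, now: Date) {
  const day = now.toISOString().slice(0, 10); // UTC date, e.g. "2025-01-01"
  const key = `ratelimit:${userId}:${day}`;
  const used = store.incr(key);
  // Set the TTL only on the first hit so later hits do not extend the window.
  if (used === 1) store.expire(key, secondsUntilUtcMidnight(now));
  return { allowed: used <= DAILY_LIMIT, remaining: Math.max(0, DAILY_LIMIT - used) };
}
```

Because the counter is incremented before it is checked, two concurrent requests can never both read a stale count and both pass.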

Setup:

  1. Create a free Redis database at console.upstash.com

  2. Copy the REST URL and token into UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN

The remaining question count is shown in the top bar as a live badge and resets automatically each day.

Claw Learn uses the OpenAI-compatible API format, so it works with any provider that supports it.

Gemini (default)

    OPENAI_API_KEY=your_gemini_api_key
    OPENAI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
    OPENAI_MODEL=gemini-2.5-flash

OpenAI

    OPENAI_API_KEY=your_openai_api_key
    OPENAI_BASE_URL=https://api.openai.com/v1
    OPENAI_MODEL=gpt-4o

Ollama (local)

    OPENAI_API_KEY=ollama
    OPENAI_BASE_URL=http://localhost:11434/v1
    OPENAI_MODEL=llama3.1
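The three configurations differ only in key, base URL, and model name because every OpenAI-compatible provider exposes the same `/chat/completions` route under its base URL. A minimal sketch of building such a request is shown here; `ProviderConfig` and `buildChatRequest` are hypothetical names, and the system prompt text is a placeholder, not the project's actual prompt.

```typescript
interface ProviderConfig {
  apiKey: string;  // OPENAI_API_KEY
  baseUrl: string; // OPENAI_BASE_URL
  model: string;   // OPENAI_MODEL
}

// Build the URL and fetch options for a chat completion request.
function buildChatRequest(cfg: ProviderConfig, question: string) {
  // Tolerate a trailing slash on the configured base URL.
  const url = `${cfg.baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const init = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body: JSON.stringify({
      model: cfg.model,
      messages: [
        // Placeholder prompt; the project's real system prompt is not shown in the README.
        { role: "system", content: "Reply with a JSON scene plan." },
        { role: "user", content: question },
      ],
    }),
  };
  return { url, init };
}
```

The result can be passed straight to `fetch(url, init)`; switching providers then only means changing the three environment variables.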

Voice Modes

Mode 1 — ElevenLabs Speech Engine (recommended)

The Speech Engine connects via WebRTC to an ElevenLabs Conversational AI agent and is the primary voice interface for Claw Learn. It handles both input and output in a single low-latency connection:

  • Voice input: speak your questions naturally, no typing needed

  • Streaming TTS: audio streams directly from ElevenLabs as each scene plays

  • Interruption: speak mid-explanation to redirect or ask a follow-up

  • Lower latency: WebRTC is significantly faster than the REST fallback

Setup:

  1. Go to elevenlabs.io/app/conversational-ai

  2. Create a new agent

  3. Set the system prompt to: "You are a math narration voice. Read exactly what the user sends you as clear, natural narration."

  4. Copy the Agent ID and set it as ELEVENLABS_SPEECH_ENGINE_ID in your .env.local

The Voice button in the top bar connects and disconnects the Speech Engine. When connected, a pulsing green indicator shows the session is live.

Mode 2 — REST TTS fallback

When ELEVENLABS_API_KEY is set but no Speech Engine is configured, each scene's narration is sent to the ElevenLabs REST API and played back as audio. No voice input in this mode.
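A per-scene narration request in this mode might be built as sketched here. The endpoint path and the `xi-api-key` header follow the public ElevenLabs text-to-speech REST API; `TtsConfig`, `buildTtsRequest`, and the choice of `eleven_multilingual_v2` as the model are assumptions for illustration, not taken from the project.

```typescript
interface TtsConfig {
  apiKey: string;  // ELEVENLABS_API_KEY, kept on the server
  voiceId: string; // ELEVENLABS_VOICE_ID
}

// Build the URL and fetch options for one scene's narration.
function buildTtsRequest(cfg: TtsConfig, narration: string) {
  const url = `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(cfg.voiceId)}`;
  const init = {
    method: "POST",
    headers: {
      "xi-api-key": cfg.apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg",
    },
    // Assumption: the model id is illustrative; use whichever model your account supports.
    body: JSON.stringify({ text: narration, model_id: "eleven_multilingual_v2" }),
  };
  return { url, init };
}
```

A server route would call `fetch(url, init)` and stream the audio bytes back to the browser, so the API key never leaves the server.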

Mode 3 — No voice

The app works fully without any ElevenLabs configuration — text input and silent animations only.
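The choice between the three modes can be sketched as a function of the environment. `VoiceMode` and `selectVoiceMode` are hypothetical names, and requiring the API key for Speech Engine mode is an assumption (the server presumably needs it to request a WebRTC conversation token); the README does not state this explicitly.

```typescript
type VoiceMode = "speech-engine" | "rest-tts" | "none";

// Pick the richest voice mode the current configuration supports.
function selectVoiceMode(env: Record<string, string | undefined>): VoiceMode {
  const hasKey = Boolean(env.ELEVENLABS_API_KEY);
  const hasEngine = Boolean(env.ELEVENLABS_SPEECH_ENGINE_ID);
  // Assumption: Speech Engine mode needs both the agent ID and the API key.
  if (hasKey && hasEngine) return "speech-engine";
  if (hasKey) return "rest-tts";
  return "none";
}
```

On a server this would typically be called as `selectVoiceMode(process.env)`.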

Deployment

Vercel (recommended)

npx vercel

Set these environment variables in the Vercel dashboard under Settings → Environment Variables:

| Variable | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | ✅ | API key for your AI provider |
| OPENAI_BASE_URL | ✅ | Base URL of the OpenAI-compatible endpoint |
| OPENAI_MODEL | ✅ | Model name to use |
| AUTH_SECRET | ✅ | NextAuth secret (openssl rand -base64 32) |
| GOOGLE_CLIENT_ID | ✅ | Google OAuth client ID |
| GOOGLE_CLIENT_SECRET | ✅ | Google OAuth client secret |
| UPSTASH_REDIS_REST_URL | Recommended | Upstash Redis URL for persistent rate limiting |
| UPSTASH_REDIS_REST_TOKEN | Recommended | Upstash Redis token |
| ELEVENLABS_API_KEY | Optional | ElevenLabs REST TTS fallback |
| ELEVENLABS_VOICE_ID | Optional | Override default voice |
| ELEVENLABS_SPEECH_ENGINE_ID | Recommended | WebRTC voice agent ID |
| ALLOWED_ORIGIN | Recommended | Your production domain for CORS |
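A startup check against the variables listed above could look like the following sketch. The `checkEnv` helper and its return shape are hypothetical; the variable names and their required or recommended status come from the list above.

```typescript
const REQUIRED = [
  "OPENAI_API_KEY",
  "OPENAI_BASE_URL",
  "OPENAI_MODEL",
  "AUTH_SECRET",
  "GOOGLE_CLIENT_ID",
  "GOOGLE_CLIENT_SECRET",
] as const;

const RECOMMENDED = [
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "ELEVENLABS_SPEECH_ENGINE_ID",
  "ALLOWED_ORIGIN",
] as const;

// Report which variables are unset or blank, split by severity.
function checkEnv(env: Record<string, string | undefined>) {
  const isUnset = (name: string) => !env[name]?.trim();
  return {
    missing: REQUIRED.filter(isUnset),        // should block startup
    recommended: RECOMMENDED.filter(isUnset), // worth a warning only
  };
}
```

Failing fast on `missing` at boot gives a clearer error than a provider or OAuth call failing on the first request.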

The vercel.json in the repo is pre-configured.

Self-hosted

    npm run build
    npm start

Requires Node.js 20.9+ (the minimum version Next.js 16 supports) and the environment variables above.

Visual Element Reference

The canvas renderer supports 30+ element types:

| Type | Description |
| --- | --- |
| axes | Coordinate axes with grid and tick labels |
| graph | Function curve (JS math expression) |
| tangent | Tangent line to a curve at a point |
| secant | Secant line between two points |
| shaded_area | Filled area under a curve |
| point | Dot with optional label |
| vector | Arrow with label |
| matrix | Matrix grid with brackets and highlights |
| formula | Math text in a pill box |
| histogram | Bar chart for distributions |
| pie_chart | Proportions and compositions |
| bar_chart | Categorical comparisons |
| line_chart | Discrete data series |
| scatter_plot | Correlation with optional regression line |
| wave | Propagating sine/cosine wave |
| axes_3d | Isometric 3D axes |
| complex_plane | Re/Im axes with unit circle |
| riemann_sum | Rectangles approximating an integral |
| slope_field | Directional arrows for dy/dx |
| parametric_curve | x(t), y(t) traced as t varies |
| polygon | Arbitrary shape from vertices |
| angle_arc | Label an angle between two rays |
| spring | Physics spring between two points |
| brace | Curly brace annotation |
| table | Data table with headers |
| highlight_region | Shaded overlay |

Coordinate system: origin at center, x right, y up. Typical visible range: x ∈ [-6, 6], y ∈ [-4, 4].
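Mapping this coordinate system to canvas pixels means shifting the origin to the center and flipping the y axis, since canvas y grows downward. The sketch below assumes the typical visible range fills the canvas exactly; `toCanvas` and the two range constants are hypothetical names, and the real renderer may scale differently.

```typescript
const X_HALF_RANGE = 6; // visible x runs from -6 to 6
const Y_HALF_RANGE = 4; // visible y runs from -4 to 4

// Convert math coordinates (origin at center, y up) to canvas pixels
// (origin at top-left, y down).
function toCanvas(x: number, y: number, width: number, height: number) {
  const px = ((x + X_HALF_RANGE) / (2 * X_HALF_RANGE)) * width;
  const py = ((Y_HALF_RANGE - y) / (2 * Y_HALF_RANGE)) * height;
  return { px, py };
}
```

On a 1200 by 800 canvas, for example, the math origin maps to pixel (600, 400) and the top-left corner corresponds to the math point (-6, 4).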

Project Structure

    clawlearn/
    ├── app/
    │   ├── api/
    │   │   ├── explain/route.ts        # POST: AI scene plan generation
    │   │   ├── narrate/route.ts        # POST: ElevenLabs REST TTS
    │   │   └── speech-engine/token/    # GET: WebRTC conversation token
    │   ├── page.tsx                    # Root: landing ↔ tutor router
    │   ├── layout.tsx                  # Fonts, metadata, global CSS
    │   └── globals.css                 # Design tokens, animations
    │
    ├── components/
    │   ├── LandingPage.tsx             # Marketing page
    │   ├── TutorApp.tsx                # App shell
    │   ├── AnimationCanvas.tsx         # Canvas + scene sequencer
    │   ├── ConversationPanel.tsx       # Chat history
    │   ├── QuestionInput.tsx           # Input bar
    │   └── NarrationSubtitle.tsx       # Subtitle below canvas
    │
    ├── hooks/
    │   ├── useTutor.ts                 # Core orchestration
    │   ├── useSpeechEngine.ts          # ElevenLabs Speech Engine (WebRTC)
    │   └── useVoice.ts                 # Web Speech API fallback
    │
    ├── lib/
    │   ├── openai.ts                   # OpenAI-compatible client + system prompt
    │   ├── animationEngine.ts          # Canvas renderer (30+ elements)
    │   ├── elevenlabs.ts               # ElevenLabs REST helpers
    │   └── voiceRecognition.ts         # Web Speech API wrapper
    │
    ├── types/
    │   └── scene.ts                    # Scene plan TypeScript types
    │
    ├── .env.local.example              # Environment variable template
    ├── CONTRIBUTING.md                 # Contribution guide
    ├── LICENSE                         # MIT
    └── vercel.json                     # Vercel deployment config

Security

  • API keys are server-side only and never exposed to the browser

  • Input is length-limited and validated on every API route

  • CORS is locked to ALLOWED_ORIGIN in production

  • The canvas renderer uses a safe recursive-descent math parser, with no eval or new Function

  • Security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy) are set on all responses

See SECURITY.md for the full security policy and how to report vulnerabilities.
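To illustrate why a recursive-descent parser avoids the risks of `eval`, here is a minimal sketch of one for expressions in `x`. It is not the project's parser: the grammar, the `evaluate` function, and the whitelisted function and constant names are assumptions chosen for the example.

```typescript
// Grammar (a common textbook layout, not necessarily the project's):
//   expr  := term (("+" | "-") term)*
//   term  := unary (("*" | "/") unary)*
//   unary := "-" unary | power
//   power := atom ("^" unary)?
//   atom  := number | "x" | constant | func "(" expr ")" | "(" expr ")"

// Only names in these maps can ever be evaluated.
const FUNCS = new Map<string, (v: number) => number>([
  ["sin", Math.sin], ["cos", Math.cos], ["tan", Math.tan],
  ["sqrt", Math.sqrt], ["exp", Math.exp], ["ln", Math.log], ["abs", Math.abs],
]);
const CONSTS = new Map<string, number>([["pi", Math.PI], ["e", Math.E]]);

function evaluate(source: string, x: number): number {
  const tokens: string[] = source.match(/\d+(?:\.\d+)?|[a-z]+|[-+*\/^()]/gi) ?? [];
  // Reject input containing characters the tokenizer skipped.
  if (tokens.join("") !== source.replace(/\s+/g, "")) throw new Error("unexpected character");
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const expect = (t: string) => {
    if (next() !== t) throw new Error(`expected "${t}"`);
  };

  function expr(): number {
    let value = term();
    while (peek() === "+" || peek() === "-") {
      value = next() === "+" ? value + term() : value - term();
    }
    return value;
  }

  function term(): number {
    let value = unary();
    while (peek() === "*" || peek() === "/") {
      value = next() === "*" ? value * unary() : value / unary();
    }
    return value;
  }

  function unary(): number {
    if (peek() === "-") {
      next();
      return -unary();
    }
    return power();
  }

  function power(): number {
    const base = atom();
    if (peek() === "^") {
      next();
      return Math.pow(base, unary()); // right-associative: 2^3^2 = 2^(3^2)
    }
    return base;
  }

  function atom(): number {
    const tok = next();
    if (tok === undefined) throw new Error("unexpected end of input");
    if (tok === "(") {
      const inner = expr();
      expect(")");
      return inner;
    }
    if (/^\d/.test(tok)) return Number(tok);
    const name = tok.toLowerCase();
    if (name === "x") return x;
    const constant = CONSTS.get(name);
    if (constant !== undefined) return constant;
    const fn = FUNCS.get(name);
    if (fn) {
      expect("(");
      const arg = expr();
      expect(")");
      return fn(arg);
    }
    throw new Error(`unknown identifier "${tok}"`);
  }

  const result = expr();
  if (pos < tokens.length) throw new Error(`unexpected token "${tokens[pos]}"`);
  return result;
}
```

Because identifiers are looked up in fixed maps rather than executed, an input such as `alert(1)` or `constructor` fails with an error instead of reaching any JavaScript object.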

Known Limitations

No persistence — conversation history is in-memory, cleared on pag
