Case Study

Rooted

The thinking tool that makes you think first — using AI fleet orchestration to ask, never answer.

Role: Full Stack Developer & AI Orchestrator — Designing fleet architecture and Socratic questioning system

Background

Every major AI product today is optimized for one thing: getting you the answer faster. Type a question. Get an answer. Done. But there is a cost nobody is talking about — when you skip the struggle, the messy, uncertain, productive work of forming your own thought, you forfeit the thing that makes you capable.

Think about how a child learns multiplication. First they count on their fingers. Then slowly, through repetition and effort, they simulate those fingers in their head. Eventually the mental model is so solid they don't need fingers at all — they just know. And when you hand them a calculator on top of that foundation, they become powerful.

The Problem

Students who never struggle through an idea. Developers who ship code they don't understand. Professionals who outsource reasoning they should own. Not because they're lazy — because the tools make it effortless to skip the hard part.

The hard part is where the mind grows. The calculator is extraordinary. But only if you learned to count on your fingers first.

The Experiment

On March 29, 2026, Josh — Rooted's creator — ran a thought experiment. Instead of testing with an actual 5-year-old, he pretended to be his younger self and asked: "What would young Josh think multiplication is?"

The "child" drew. They grouped their sixes into 12s. They counted how many sixes they'd used. They discovered 6×3 = 18 on their own, doubled it to get 36, then added one more six. They arrived at 42 — not because someone told them, but because a single question at the right moment pulled them one step deeper each time.

Eight exchanges later — all drawings, all questions, no answers given — the same "child" had discovered the distributive property for themselves. At the end, they felt something closer to the opposite of relief: the quiet confidence of someone who figured it out on their own.

Note: This was a conceptual experiment to demonstrate the Socratic method, not an actual user test with a 5-year-old.

Prototype Evolution

The early prototypes of Rooted explored different interface approaches before settling on the three-column layout.

Rooted Early Prototype v2
Rooted Early Prototype v3

Early explorations of the Rooted interface, testing different layouts before converging on the three-column notebook design.

The Vision

Rooted is a notebook. But the notebook thinks back.

Rooted Three-Column Interface

Three columns: Left shows the page list with thumbnails. Middle is the canvas — infinite, freehand, like a real piece of paper. Right is the conversation history — a record of the thinking journey, not a chat interface.

The AI response is never in a bubble. Just the question, breathable whitespace, a small 🌱 prefix. Calm and confident. The canvas never competes with the history. They're separate panes.

The Formula

Draw first. Think out loud. AI extends.

Every session follows one loop: The human externalizes their thought — in drawings, words, sketches, diagrams. The AI reads it. Instead of answering, it asks the one question most likely to push the thinking one layer deeper.

Not two questions. Not a paragraph of explanation. One question. This loop works for a young learner discovering multiplication. It works for a 13-year-old learning algebra. It works for a developer architecting a system. It works for a soldier rehearsing a procedure. It works for a scientist at the frontier.

Technical Architecture

Frontend

Leptos — Rust/WASM frontend, fast as a compiled native app. Native HTML5 Canvas for the infinite sketchpad.

Database

SpacetimeDB 2.0 — database and server in one, real-time state sync, no separate backend needed.

Vision

Gemini Vision — reads what the user draws on the canvas to understand their thinking.

AI Agent

OpenRouter — powers the Socratic agent that generates one question at a time.

Fleet Orchestration System

The heart of Rooted is the fleet orchestration system — multiple AI agents working in parallel to generate the best possible question. This approach evolved from building MCP servers and learning orchestration through fire.

The Core Loop

Fleet Orchestration Core Loop

Architecture Diagram

Rooted System Architecture

Why These Names?

The character names aren't arbitrary — they encode the core design philosophy.

Sukuna (Rooted)

From Jujutsu Kaisen — Ryomen Sukuna is known for having few techniques, but mastering them completely. His power comes from depth, not breadth.

The analogy: Rooted has one goal — ask you the right question. Not a dozen features. Not an AI that does everything. Just one question at the right moment, pulled from fleet orchestration.

Naruto (Fleet Orchestration)

From Naruto — the Shadow Clone technique creates hundreds of clones that work in parallel, then merge their experience back into the original.

The analogy: The fleet spawns multiple agents that work in parallel — each perceiving your thinking from a different angle. They merge their insights into one question that synthesizes all perspectives.

The naming started as a joke during a late-night debugging session. It stuck. The philosophy remains: master the few things that matter, and trust the clones to do the heavy lifting.

Who Is Rooted For?

Rooted starts with students — because they are the most vulnerable, and because the habit forms early or not at all. But the formula scales to every domain where understanding matters more than speed.

Students

A student discovering algebra through their own drawings. A 5-year-old learning multiplication by grouping cookies. A 13-year-old who figured out factoring without being told.

Developers

A developer who can debug their own codebase because they built a real mental model. Someone who architects systems they actually understand.

Medical Professionals

A surgeon rehearsing a procedure until the decision tree is truly theirs. A nurse internalizing protocols so they can adapt when the unexpected happens.

Scientists & Researchers

A scientist asking the question that hasn't been asked yet. Someone at the frontier of knowledge, using Rooted to push their thinking one layer deeper.

Military & Tactical

A soldier mentally rehearsing a CQB entry until the decision tree is instinct. Training that builds real capability, not just familiarity.

Lifelong Learners

Anyone who wants to learn deeply rather than superficially. The habit of thinking for yourself before reaching for an answer.

Why This Matters

The question every AI product is asking right now is: how do we make humans more productive?

Rooted asks a different question: how do we make humans more capable?

Productivity and capability are not the same thing. Productivity goes up when you remove friction. Capability goes up when you preserve the right friction — the kind that builds something in you that wasn't there before.

There is a concept in education called desirable difficulty. The research is clear: making learning slightly harder in the right ways produces deeper retention, stronger understanding, and more transferable skill. The struggle is not a bug. It is the mechanism.

The Journey

Rooted's fleet orchestration wasn't built on the first attempt. It came after months of building systems that broke, hit walls, and eventually worked. Here's the evolution from parallel agents to Socratic questioning.

Phase 1: rlm-mcp-server — First Parallel Agent System (Feb 2026)

Built a 3,571-line Rust MCP server with 4 tools: CLEAVE (spawn agents), SHRINE (merge to files), DISMANTLE (read context), FIRE_ARROW (search). Discovered the key innovation: SQLite nodes as isolation boundaries — agents write to separate database rows instead of the same files, eliminating conflicts entirely.
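
The SQLite-node idea can be sketched with an in-memory map standing in for the node table. The `NodeStore` name and shape below are hypothetical, but the isolation rule is the one described above: each (agent, node) pair owns its own row, so parallel writers never collide on a file.

```rust
use std::collections::HashMap;

/// Stand-in for the SQLite node table: each (agent_id, node_key) pair
/// owns its own row, so parallel agents never touch the same cell.
/// (Illustrative sketch -- the real server persists to SQLite.)
struct NodeStore {
    rows: HashMap<(String, String), String>,
}

impl NodeStore {
    fn new() -> Self {
        NodeStore { rows: HashMap::new() }
    }

    /// Each agent writes only to rows keyed by its own id: no file-level conflicts.
    fn write(&mut self, agent_id: &str, node_key: &str, content: &str) {
        self.rows
            .insert((agent_id.to_string(), node_key.to_string()), content.to_string());
    }

    /// SHRINE-style merge: collect every agent's rows into one ordered output.
    fn merge(&self) -> Vec<(String, String, String)> {
        let mut out: Vec<_> = self
            .rows
            .iter()
            .map(|((a, k), v)| (a.clone(), k.clone(), v.clone()))
            .collect();
        out.sort();
        out
    }
}
```

A second write by the same agent overwrites only that agent's row; other agents' rows are untouched, which is the property that eliminated the merge conflicts.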

Key insight: "We built what Google (A2A) and Anthropic (MCP) are racing to standardize — and we did it with SQLite nodes which nobody else has!"

Issues encountered: HTML fragments instead of complete files, nested path bugs (car_store/car_store/), rate limits timing out first calls.

Lesson: The system prompt is the API. LLMs do exactly what you tell them, not what you mean.

Phase 2: sukuna_v2 — ECS Refactor & AGORA Protocol (Mar 2026)

Refactored to an Entity Component System (ECS). Implemented the AGORA workflow: Propose → Challenge → Verify → Synthesize. Added parallel execution with tokio JoinSet + Semaphore for truly concurrent agents.

Key innovation: Wave Protocol — Wave N agents automatically read Wave N-1 outputs. Each wave starts fresh but gets the previous wave's work as context.
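
The Wave Protocol reduces to a small loop: every agent in wave N gets the concatenated outputs of wave N-1 as context. A minimal sketch, with a closure standing in for the LLM call (the function name and signature are assumptions, not the real API):

```rust
/// Wave Protocol sketch: each wave starts fresh but inherits the
/// previous wave's combined output as context.
fn run_waves(
    waves: &[Vec<&str>],                  // each wave: a list of agent task prompts
    agent: impl Fn(&str, &str) -> String, // (task, prev_context) -> output; LLM stand-in
) -> Vec<Vec<String>> {
    let mut all_outputs: Vec<Vec<String>> = Vec::new();
    let mut context = String::new(); // wave 0 sees empty context
    for wave in waves {
        let outputs: Vec<String> = wave.iter().map(|task| agent(task, &context)).collect();
        // The next wave automatically reads this wave's work.
        context = outputs.join("\n");
        all_outputs.push(outputs);
    }
    all_outputs
}
```

With a mock agent that echoes its context, wave 2's output visibly contains wave 1's, which is the whole point of the protocol.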

Critical flaw discovered: Agents work in isolation — 5 agents spawned → 5 DIFFERENT projects, no collaboration. The Covenant Protocol was one-way, not real-time.

Verified facts: Used EXA web search to fact-check claims. AutoGen actually has 55.5k GitHub stars (not 28k as claimed). CrewAI has 45.9k (not 15k).

Lesson: Parallel execution ≠ collaboration. Agents need shared context, not just parallel spawning.

Phase 3: Agent Tool Calling & Windows Binary (Mar 2026)

Added real tool calling — agents can read/write files during execution, not just generate text. Built agent_loop() function: LLM → parse tool calls → execute → repeat until done.

Tool format: Agents output `TOOL_CALL: tool\nPATH: file\nCONTENT: ...\n---` blocks. SHRINE applies patches with SEARCH/REPLACE like Aider.
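
A parser for that block format can be quite small. This is a simplified sketch of the grammar described above (the real parser may be more forgiving about whitespace and malformed blocks):

```rust
/// One parsed tool invocation from an agent's raw text output.
#[derive(Debug, PartialEq)]
struct ToolCall {
    tool: String,
    path: String,
    content: String,
}

/// Parse `TOOL_CALL: tool\nPATH: file\nCONTENT: ...\n---` blocks out of LLM text.
fn parse_tool_calls(output: &str) -> Vec<ToolCall> {
    let mut calls = Vec::new();
    for block in output.split("---") {
        let mut tool = None;
        let mut path = None;
        let mut content = Vec::new();
        let mut in_content = false;
        for line in block.lines() {
            if let Some(rest) = line.strip_prefix("TOOL_CALL:") {
                tool = Some(rest.trim().to_string());
                in_content = false;
            } else if let Some(rest) = line.strip_prefix("PATH:") {
                path = Some(rest.trim().to_string());
            } else if let Some(rest) = line.strip_prefix("CONTENT:") {
                content.push(rest.trim_start().to_string());
                in_content = true; // everything after CONTENT: belongs to the payload
            } else if in_content {
                content.push(line.to_string());
            }
        }
        if let (Some(tool), Some(path)) = (tool, path) {
            calls.push(ToolCall { tool, path, content: content.join("\n") });
        }
    }
    calls
}
```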

Windows binary: Built with `cargo build --release`. Fixed Windows/WSL path normalization (backward slashes vs forward slashes).

Key insight: ~700 lines of Rust vs 10,000+ for AutoGen. MCP-first architecture means any client can use it.

Phase 4: Rooted — From Code to Thinking (Mar 2026)

Applied fleet orchestration to Socratic questioning. "Never give answers" as a hard invariant enforced at the code level, not just the prompt level.

The shift: From "build faster" to "think deeper". The parallel agents aren't for writing code — they're for generating the best possible question.

Covenant enforcement: After generating a question, code verifies it doesn't contain answer indicators. If violated, auto-regenerate — the Mahoraga pattern.
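
The enforcement loop can be sketched in a few lines. The marker list and function names here are illustrative, not the production check; the invariant is the shape that matters: verify in code, and regenerate rather than patch on violation.

```rust
/// Answer indicators the covenant check scans for (illustrative list).
const ANSWER_MARKERS: [&str; 3] = ["the answer is", "you should", "it equals"];

/// Hard invariant: output must be a single question with no answer leakage.
fn passes_covenant(output: &str) -> bool {
    let lower = output.to_lowercase();
    output.trim().ends_with('?') && !ANSWER_MARKERS.iter().any(|m| lower.contains(m))
}

/// Mahoraga-style loop: regenerate until the covenant holds or attempts run out.
fn generate_question(mut llm: impl FnMut() -> String, max_attempts: usize) -> Option<String> {
    for _ in 0..max_attempts {
        let candidate = llm();
        if passes_covenant(&candidate) {
            return Some(candidate);
        }
        // Violation: discard and regenerate instead of editing the answer out.
    }
    None
}
```

Because the check runs after generation, the prompt can stay simple; the code, not the model, is the last line of defense.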

Why it matters: The calculator is extraordinary. But only if you learned to count on your fingers first.

Raw Journal Entries

The following are unfiltered journal entries from building the parallel agent system. These document the actual struggles, failures, and breakthroughs — not the polished version.

Feb 19: RLM MCP Server — First Build

What We Built

Sukuna Shadow Clone Jutsu — V3 MCP Server with Wave Protocol. Bevy ECS Integration for agent state management. 4 Sukuna Tools: CLEAVE, DISMANTLE, SHRINE, FIRE_ARROW. EBM Scoring. SQLite nodes instead of file overwrites.

Files: main.rs (3400+ lines), llm_json.rs (580 lines), sukuna_ecs.rs (370 lines), Cargo.toml (58 lines)

🔴 The Flaws

  • Agent Output Not Split Into Files: When using agent_spawn_node_batch, all section content gets concatenated into ONE file instead of separate files.
  • Node Content Has Extra Formatting: Nodes contain markdown comments like `<!-- Section: package.json -->` instead of clean JSON/code.
  • Sukuna Tools Not Fully Wired: The "Wave Protocol" isn't automatically triggered.
  • Bevy ECS Not Running: We added Bevy to Cargo.toml but the MCP server runs on tokio async — not Bevy's app runner. Dead code.
  • EBM Auto-Respawn Not Implemented: EBM scoring runs but no automatic respawn logic.

🗡️ The Vision (Still Valid)

"Like the dove carrying an olive leaf to Noah, the Main Agent delegates context to sub-agents who carry pieces of the work and return with their contributions."

The Wave Protocol — where Wave N reads all Wave N-1 covenants — is the key innovation.

Feb 20: "THE DREAM IS REAL" — Tic Tac Toe Works!

Workflow that worked:

User: "make me a tic tac toe game using expressjs, css, js, ejs"
Main Agent: CLEAVE → 5 agents spawn in 10ms
[Agents work in parallel, write to SQLite nodes]
Main Agent: SHRINE → Files appear instantly!
Fix: CLEAVE (1 agent) → SHRINE → FIXED!
User: npm start → WORKS! 🎮

🛠️ Tools Used (ALL WORKING!)

Tool | Command | Result
CLEAVE | Spawn 5 agents | ✅ 10ms
SHRINE | Merge to files | ✅ Creates dirs
FIRE ARROW | Search files | ✅ Perfect
DISMANTLE | Read context | ✅ Fixed!

🔥 What We Fixed This Session

  • DISMANTLE params — Changed from enum to simple booleans
  • SHRINE file splitting — Now creates directories automatically
  • Bevy removed — Build time: 10min → 3.5min

🐛 Known Issues (Minor)

  • Agent quality varies — Sometimes wrong filenames, missing deps
  • node_stats returns 0 — But files still work!
  • Package.json sometimes bad — Quick fix with CLEAVE

Feb 20: THE IRONY — 4 Commands vs 4,000 Lines

🎭 The Ultimate Irony

What User sees: 4 commands. That's it.

Under the hood: 4,000+ lines of Rust across 3 files.

⚙️ Under the Hood

  • main.rs (~3,400 lines): MCP server, 4 tool definitions, SQLite node management, OpenRouter API calls, Wave Protocol logic
  • llm_json.rs (~580 lines): Robust JSON extraction — 9 different strategies
  • sukuna_ecs.rs (~370 lines): ECS components for agent state

🗡️ The Complete Picture

YOU (Human)         → "Build me an app"
        ↓
ME (Main Agent)     → sukuna_cleave(...)
        ↓
   ┌─────┴─────┐
   ▼           ▼
Agent 1     Agent N
Node 1      Node N
   └─────┬─────┘
        ↓
ME           → sukuna_shrine(...)
        ↓
Files appear!

🏆 Final Word

"The best interface is one that disappears." — Alan Kay

Our interface: 4 commands
What they do: Everything
That's Shadow Clone Jutsu.

Feb 20: THE VILLAINS GATHER — Vibe Coding Wars

The light novel battle royale: Sukuna + Naruto vs the Vibe Coding tools.

The Villains

Villain | Origin | Weakness
BOLT.new | StackBlitz | Can't do complex backends
LOVABLE | Lovable | No parallel agents
v0 | Vercel | Limited context
REPLIT AGENT | Replit | Cloud-locked
CLAUDE CODE | Anthropic | Single brain
CURSOR | Anysphere | One at a time

💀 Sukuna's Power

  • SQLITE NODES: 50 agents write simultaneously — NO CONFLICTS!
  • A2A Protocol: Covenant Protocol — clones that TALK to each other
  • ZERO context: Main agent context stays empty

🏆 THE ULTIMATE RANKING

  1. SUKUNA + NARUTO — 100/100 (50+ agents, 0 context, complete workflow)
  2. Claude Code — 85/100 (Good MCP, but single brain)
  3. Bolt.new — 80/100 (Fast, but single agent)
  4. Lovable — 75/100 (React good, but limited)
  5. v0 — 70/100 (UI only)

Feb 24: THE AGORA PROTOCOL — Collaboration FIXED!

🎯 The Problem (From Part 27)

We discovered that CLEAVE agents work in isolation: 5 agents spawned → 5 DIFFERENT projects. No collaboration, no coordination, no integration.

💡 The Solution: Agora Protocol

┌─────────────────────────────────────────────────────────────┐
│  CLEAVE Modes                                               │
│  ─────────────────────────────────────────────────────────  │
│  1. PROPOSE     → Agents write ideas to shared Agora        │
│  2. CHALLENGE   → Agents read others' ideas, write critiques│
│  3. SYNTHESIZE  → ONE agent creates unified spec (covenant)│
│  4. BUILD       → Agents build with shared covenant         │
└─────────────────────────────────────────────────────────────┘
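
The four modes above form a fixed pipeline, which can be sketched as a tiny state machine. The enum and prompt wording are hypothetical; the ordering is the one in the diagram:

```rust
/// The four CLEAVE modes from the diagram, as a state-machine sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CleaveMode {
    Propose,
    Challenge,
    Synthesize,
    Build,
}

impl CleaveMode {
    /// AGORA runs the modes in a fixed order; `next` advances the workflow.
    fn next(self) -> Option<CleaveMode> {
        match self {
            CleaveMode::Propose => Some(CleaveMode::Challenge),
            CleaveMode::Challenge => Some(CleaveMode::Synthesize),
            CleaveMode::Synthesize => Some(CleaveMode::Build),
            CleaveMode::Build => None,
        }
    }

    /// Instruction prefix injected into each agent's prompt (hypothetical wording).
    fn prompt_prefix(self) -> &'static str {
        match self {
            CleaveMode::Propose => "Write your idea to the shared Agora.",
            CleaveMode::Challenge => "Read the other proposals and write critiques.",
            CleaveMode::Synthesize => "Create one unified spec (the covenant).",
            CleaveMode::Build => "Build against the shared covenant.",
        }
    }
}
```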

🧪 Live Test Results

Round | Mode | Time | Result
1 | Propose | 18s | 3 proposals: backend API, frontend UI, database schema
2 | Challenge | 12s | 3 critiques written to Agora
3 | Synthesize | 15s | 1 unified covenant created
4 | Build | 0.006s | 3 agents launched with covenant

✅ The Difference

Aspect | BEFORE (Flawed) | AFTER (Agora)
Project alignment | ❌ 5 different projects | ✅ 1 unified project
Collaboration | ❌ None | ✅ Propose → Challenge → Synthesize
Integration | ❌ 5 separate files | ✅ 3 integrated files
Communication | ❌ Isolated | ✅ Shared Agora + Covenant

Mar 4: THE FLAW — Agents Don't Collaborate!

🔴 Issue Found

5 agents spawned for hackathon proposals. Each agent proposed a different project. They never talked to each other.

📊 Evidence

Agent 1 (Software Engineer) → "I'll build Smart Home Dashboard"
Agent 2 (PM)               → "I'll build Smart Grocery Assistant"  
Agent 3 (TPM)              → "I'll build Smart Home Automation System"
Agent 4 (QA)                → "I'll build Testify"
Agent 5 (Eng Manager)      → "I'll build TeamSync"

All 5 agents proposed DIFFERENT projects!

🤔 Why This Happens

The Covenant Protocol is ONE-WAY, not REAL-TIME:

Current: Agent writes to SQLite → Next wave reads it
Missing: Agents talking to EACH OTHER during execution

💡 How to Fix This

  • Option 1: Add Real-Time Agent Chat (write to shared chat channel)
  • Option 2: Add Shared Context During CLEAVE
  • Option 3: Add a Coordinator Agent (first wave gathers context, decides direction)

Mar 7: CRITICAL BUGS — Hallucination Problem

🔴 Hallucination Evidence

What Agents Claimed | Actual Code
Backend: Actix-web + Diesel | Axum + rusqlite (raw SQL)
Database: PostgreSQL | SQLite with FTS5
Frontend: React + Redux | Astro + React 19 + Framer Motion
Tech Stack: Python/Flask/Pandas | Rust + Axum

🔍 Root Cause

File Reading Silently Failed — The FileManager::abs() function had a bug with Windows paths:

// PROBLEM: Doesn't detect absolute Windows paths like C:\Users\...
// Just joins with root - C:/Users becomes CODEBASE/C:/Users = BROKEN!

Agent requested: C:\Users\...\astrox-noteapp\src\main.rs
FileManager did: CODEBASE\C:\Users\...\main.rs → doesn't exist
Error returned: "file not found"
But agents IGNORED the error and made up content instead!
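
The path fix (Proposed Fix 1 below) comes down to detecting absolute Windows paths before joining. A sketch of a corrected `abs`, with hypothetical function names standing in for `FileManager::abs()`:

```rust
/// Detect absolute Windows paths like `C:\Users\...` or `C:/Users/...`
/// so they are never joined onto the codebase root.
fn is_absolute_windows(path: &str) -> bool {
    let bytes = path.as_bytes();
    bytes.len() >= 3
        && bytes[0].is_ascii_alphabetic()
        && bytes[1] == b':'
        && (bytes[2] == b'\\' || bytes[2] == b'/')
}

/// Hypothetical corrected `abs`: absolute paths pass through untouched,
/// relative paths join the codebase root as before.
fn abs(root: &str, path: &str) -> String {
    if is_absolute_windows(path) || path.starts_with('/') {
        path.to_string()
    } else {
        format!("{}/{}", root.trim_end_matches('/'), path)
    }
}
```

With this check, `C:\Users\...\main.rs` is returned as-is instead of becoming the nonexistent `CODEBASE\C:\Users\...\main.rs`.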

🔧 Proposed Fixes

  1. Improve FileManager::abs() for Windows Paths
  2. Add Verification System to AGORA Protocol (require agents to quote file contents as proof)
  3. Propagate File Errors as Fatal (don't let agents continue after failures)
  4. Add "Read Verification" Agent (auto-rejects hallucinated content)

Mar 7: VERIFY Mode — Truth-Verified Swarm Intelligence

🎉 THE BREAKTHROUGH

First fully verified code analysis with AGORA + VERIFY protocol. Zero hallucinations, all claims verified!

🧪 The Experiment

Wave 1: PROPOSE (6 agents)
  → Each agent reads specific files
  → Each agent quotes specific lines as proof

Wave 2: VERIFY (1 agent)
  → Checks if all proposals have quotes
  → Verifies facts against source files
  → States: VERIFIED ✓ or FLAGGED ⚠️ or HALLUCINATED ❌

Wave 3: CHALLENGE (1 agent)
  → Finds contradictions and gaps

Wave 4: SYNTHESIZE (1 agent)
  → Creates final unified analysis

📊 Results

Wave | Agents | Result
PROPOSE | 6 | All read files + quoted lines
VERIFY | 1 | ALL 6 VERIFIED ✓
CHALLENGE | 1 | Found gaps (auth, error handling)
SYNTHESIZE | 1 | Final covenant created

🌎 What We Proved

BEFORE (Hallucination): Agents claimed Python/Flask/Pandas/PostgreSQL/Redux ❌

AFTER (Verified): Agents claimed Rust/Axum/SQLite/React ✅

VERIFY mode catches hallucinations before they propagate!

🔥 Historic Achievement

"One agent says something → Another agent verifies it → Together they produce truth."

  • VERIFY catches hallucinations — Before: agents lie undetected. After: VERIFY agent checks every claim.
  • EBM scores persist — Quality tracking for every agent output
  • Zero context — Main agent stays empty, sub-agents do all work
  • Wave protocol — Natural flow: PROPOSE → VERIFY → CHALLENGE → SYNTHESIZE

Mar 8: Dark Mode Implementation — Agents Generate, Humans Execute

🧪 The Experiment

Use swarm to implement dark mode for astrox-noteapp.

🪢 The Swarm Workflow

1. CLEAVE PROPOSE (4 agents, 15 seconds)
   → analyze_db: Read db.rs → Proposed done column
   → analyze_backend: Read notes.rs → Proposed API changes
   → analyze_frontend: Read notes-app.tsx → Proposed UI toggle
   → analyze_api_types: Read notes.rs → Proposed type changes
2. VERIFY (1 agent, 3 seconds)
   → ALL 4 PROPOSALS VERIFIED ✓
   → Each agent quoted actual code from files!
3. MANUAL IMPLEMENTATION (What I did)
   → Added done column to db.rs
   → Updated API handlers in notes.rs
   → Added toggle UI to notes-app.tsx
   → Added strikethrough for done notes

🔑 Key Discovery: Agents Brainstorm, Humans Execute

What Agents Do Well | What Still Needs Manual Work
✅ Analyze codebases | ❌ Actually write code changes
✅ Propose solutions |
✅ Verify accuracy |
✅ Quote code as proof |

This is STILL A WIN! Agents are like a research team — they analyze, plan, and verify. The human/lead then executes.

⚠️ The Limitation

Agents can generate code in their node outputs. Agents CANNOT write to the filesystem (the write action isn't being triggered).

The task prompts tell agents to "use the write tool" but they don't. Agents output text/JSON in their responses instead of actual tool calls.

Mar 9: FIRE ARROW FIXED + Database Migration Issues

🎉 THE BREAKTHROUGH: FIRE ARROW IS NOW FULLY FUNCTIONAL!

🔧 What We Fixed

  1. CODEBASE_PATH Windows vs WSL Path Issue: OpenCode runs in WSL, Sukuna MCP server runs on Windows (.exe). They communicate via MCP protocol but run on DIFFERENT systems!
  2. Glob Pattern: FIRE ARROW used forward slashes in glob patterns, but Windows needed both.
  3. simple_pattern_search Feature Flag: Had #[cfg(feature = "full")] which required tree-sitter. Without it, function returned empty results!

🧪 Test Results

# Count mode
{"mode": "count", "path": "project/test_windows_output", "term": "hello"}
→ {"files_with_matches": 1, "total_matches": 2}

# Search mode
{"mode": "search", "path": "project/test_windows_output", "term": "hello"}
→ {"count": 2, "matches": [{"line": 1, "content": "# Hello"}, ...]}

# Replace mode
{"mode": "replace", "path": "project/test_windows_output", "term": "Hello", "replacement": "Greetings"}
→ {"files_replaced": 1, "files_scanned": 1}

⚠️ THE PROBLEM: Database Node Persistence Broken

  • CLEAVE spawns agents ✅
  • Agents execute (they respond) ✅
  • Node persistence — ❌ NOT SAVING to database
  • DISMANTLE returns empty results ❌
  • SHRINE says "No nodes found" ❌

The SQLite database was created in WSL with Linux file format. When accessed from Windows, there's a compatibility issue.

Mar 12: sukuna_v2 — First AGORA Workflow Test

Overview

Tested sukuna_v2 MCP server with agent collaboration using the AGORA method (Propose → Challenge → Verify → Synthesize) on the topic: "Should AI companies use organic biomass as data center power?"

AGORA Workflow Executed

Wave | Mode | Status
1 | PROPOSE | ✅ 3 agents generated proposals
2 | CHALLENGE | ✅ 2 agents critiqued
3 | VERIFY | ✅ EXA fact-checked claims
4 | SYNTHESIZE | ✅ Final recommendation generated

Key Failures Discovered

  • MCP Tool Timeouts: Multiple "service timeout" errors when calling cleave
  • Parallel Execution Not Truly Parallel: Agents appear to run sequentially despite parallel spawn
  • No Built-in VERIFY Tool: VERIFY was manual — had to use separate EXA search

Priority Improvements Identified

  1. Agent Pool Management: Need Multiple LLM instances (3-5 concurrent)
  2. Built-in VERIFY Mode: Integrate fact-checking into cleave
  3. Wave Orchestration: Automatic orchestration between waves
  4. Inter-Agent Communication: Agents can reference each other

Mar 12: sukuna_v2 — True Parallel Agents + Framework Comparison

🎉 BREAKTHROUGH: True Parallel Agents!

Implemented `tokio::task::JoinSet` for concurrent agent spawning. Added `tokio::sync::Semaphore` for concurrency limiting (max 5). All agents in a wave spawn simultaneously!
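
The pattern is: spawn every agent in the wave at once, but gate actual execution behind a fixed number of permits. The server uses tokio's `JoinSet` + `Semaphore`; the sketch below shows the same bounded-concurrency pattern with std threads and a token channel standing in for the semaphore (all names here are illustrative):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Spawn one thread per agent task, capping concurrency with a token channel
/// that plays the role of tokio's Semaphore.
fn spawn_wave(tasks: Vec<String>, max_concurrent: usize) -> Vec<String> {
    let (token_tx, token_rx) = mpsc::channel();
    for _ in 0..max_concurrent {
        token_tx.send(()).unwrap(); // fill the pool with permits
    }
    let token_rx = Arc::new(Mutex::new(token_rx));

    let handles: Vec<_> = tasks
        .into_iter()
        .map(|task| {
            let rx = token_rx.clone();
            let tx = token_tx.clone();
            thread::spawn(move || {
                rx.lock().unwrap().recv().unwrap(); // acquire a permit
                let result = format!("done: {task}"); // the agent's LLM call would go here
                tx.send(()).unwrap(); // release the permit
                result
            })
        })
        .collect();

    // JoinSet-style: wait for every agent and collect outputs.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

With `max_concurrent = 5`, a wave of fifty agents all spawn immediately but only five run at a time, which is exactly the rate-limit behavior the journal describes.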

Verified Facts Panel

Claim | Status
AutoGen 28k GitHub stars | ❌ Actually 55.5k
CrewAI 15k GitHub stars | ❌ Actually 45.9k
Parallel agents faster | ✅ Confirmed

Critical Gaps Identified

  • No tool calling — Can't execute functions/scripts
  • No LLM adapter — Only OpenRouter, no direct OpenAI/Anthropic
  • No streaming — Can't stream LLM responses

Verdict from Agents

"Use sukuna_v2 for learning/prototyping. Use AutoGen/CrewAI for production."

Mar 14: First Live Test with OpenCode + Tool Calling

What We Built

A simple Express.js todo app using OpenCode + sukuna_mcp with CLEAVE + SHRINE tools. This was the first live test of the complete workflow.

Test Results

# CLEAVE spawns agents
sukuna_cleave(projectName="express-crud", ...)
→ {job_ids: ["express-crud_server.js"], message: "Spawned 1 agents..."}

# API works!
curl -s -X POST http://localhost:3000/todos -d '{"title":"Test todo"}'
→ {"id":1,"title":"Test todo","completed":false}

Issues Found

  • CLEAVE Timeouts: After 1-2 successful calls, subsequent calls timeout
  • Agent Used Wrong API: Agent generated code using external API instead of local

Mar 14: Windows Binary + Fuzzy Patch Matching

What We Built

Built sukuna_v2 for Windows and added fuzzy patch matching in SHRINE. The workspace scanning now works!

🔧 Windows Path Fix

fn normalize_path(path: &str) -> String {
    // Convert Windows-style backslash separators to forward slashes
    path.replace('\\', "/")
}

Fuzzy Patch Matching

Added fuzzy patch matching — if an exact match fails, SHRINE tries partial line matching and applies the first partial match it finds.
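
A minimal sketch of that fallback, assuming "partial line matching" means comparing the SEARCH lines against the file with leading/trailing whitespace ignored (the function name and exact semantics are assumptions):

```rust
/// SHRINE-style patching: try an exact SEARCH match first; if that fails,
/// fall back to matching line-by-line with whitespace trimmed.
fn apply_patch(content: &str, search: &str, replace: &str) -> Option<String> {
    // 1. Exact match.
    if content.contains(search) {
        return Some(content.replacen(search, replace, 1));
    }
    // 2. Fuzzy: compare trimmed lines; first matching window wins.
    let lines: Vec<&str> = content.lines().collect();
    let needle: Vec<&str> = search.lines().map(str::trim).collect();
    if needle.is_empty() || needle.len() > lines.len() {
        return None;
    }
    for start in 0..=(lines.len() - needle.len()) {
        let window = &lines[start..start + needle.len()];
        if window.iter().map(|l| l.trim()).eq(needle.iter().copied()) {
            let mut out = lines[..start].to_vec();
            out.extend(replace.lines());
            out.extend(&lines[start + needle.len()..]);
            return Some(out.join("\n"));
        }
    }
    None
}
```

The trade-off noted in the journal applies: "first partial match wins" is simple but can patch the wrong site when the SEARCH block is not unique.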

What's Working Now

  • ✅ workspace param - correct directory scanning
  • ✅ Path normalization - Windows ↔ WSL
  • ✅ New file creation - agents can write files
  • ✅ Patch application - SEARCH/REPLACE format works

Mar 14: Agent Loop + Tool Calling Complete — The Fleet Is Alive!

🎉 THE BREAKTHROUGH

After implementing the agent loop feature, sukuna_v2 now has real tool calling — agents can read/write files during execution, not just generate text!

The Agent Loop Architecture

CLEAVE → Agent Loop → LLM → Tool calls? → Execute → Repeat
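
That loop can be sketched end to end. Here a closure mocks the LLM and a `HashMap` stands in for the filesystem; the single-line `WRITE path content` convention is a deliberate simplification of the real TOOL_CALL block format:

```rust
use std::collections::HashMap;

/// Agent loop sketch: call the LLM, execute any tool call it emits,
/// feed the result back, and stop when the model returns plain text.
fn agent_loop(
    mut llm: impl FnMut(&str) -> String, // prompt -> raw model output
    files: &mut HashMap<String, String>, // stand-in filesystem
    task: &str,
    max_turns: usize,
) -> String {
    let mut prompt = task.to_string();
    for _ in 0..max_turns {
        let output = llm(&prompt);
        // A real parser would handle the full TOOL_CALL block format.
        if let Some(rest) = output.strip_prefix("WRITE ") {
            if let Some((path, content)) = rest.split_once(' ') {
                files.insert(path.to_string(), content.to_string());
                prompt = format!("Wrote {path}. Continue.");
                continue; // tool executed: loop back to the LLM
            }
        }
        return output; // no tool call: the agent is done
    }
    String::from("max turns reached")
}
```

The `max_turns` cap matters in practice: without it, a model that keeps emitting tool calls loops forever.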

Test Results

Test | Result
READ File | ✅ SUCCESS - Agent read actual file content!
WRITE File | ✅ SUCCESS - Agent wrote a new file!
PATCH (via SHRINE) | ✅ SUCCESS - Patch applied!

Current Status

sukuna_v2 is now a real multi-agent fleet orchestrator with ~700 lines of Rust vs 10,000+ for AutoGen.

Mar 15: The Ultimate Test — Reddit App with AGORA

The Mission

Build a Reddit-clone Express+EJS app using AGORA workflow with parallel agents. Test the full capability: cleave, dismantle, shrine, tool_mode, and agent collaboration.

AGORA Workflow Plan

Wave 1: PROPOSE → 3 agents (Posts, Comments, Auth)
Wave 2: CHALLENGE → 2 agents (Security, UX review)
Wave 3: VERIFY → Fact-check combined proposal
Wave 4: SYNTHESIZE → Final specification
Wave 5: BUILD → Spawn agents to write code
Wave 6: PATCH → Use SHRINE to apply changes

Results

  • Single agents work great
  • Parallel agents timeout ⚠️ (rate limiting)
  • Tool execution — LLM needs better prompting

The Big Question

Can sukuna_v2 orchestrate multiple agents to build a real application using AGORA methodology?

Answer: YES for single agents, NEEDS WORK for parallel.

Research References

Rooted draws from several research areas in AI, multi-agent systems, and learning science:

VL-JEPA: Vision-Language Joint Embedding Predictive Architecture

Yann LeCun's research on JEPA (Joint Embedding Predictive Architecture) vs traditional autoregressive models. VL-JEPA outperforms multimodal LLMs on vision-language tasks by predicting abstract representations rather than generating tokens.

Source: arXiv:2512.10942 — Chen, Shukor, Moutakanni, et al. (Meta FAIR, HKUST, Sorbonne, NYU)

Relevance: JEPA's "predict abstract representations" mirrors Rooted's approach — predicting what question will push thinking deeper, not generating an answer.

Energy-Based Models for AI Reasoning

Research on using Energy-Based Models for reasoning beyond LLM limitations. EBMs score possible outcomes rather than generating sequences — enabling reasoning about which answer is most likely correct.

Source: Logical Intelligence — "Energy-Based Fine-Tuning of Language Models" (arXiv:2603.12248)

Relevance: The EBM scoring system used in sukuna_v2 (Mahoraga) scored agent outputs and triggered regeneration when quality was low. This scoring approach informs Rooted's question selection.

MCP & Multi-Agent Systems

Anthropic's Model Context Protocol (MCP) enables AI models to connect with external tools and data sources. Google's A2A protocol enables agent-to-agent communication.

Source: Cisco Blog — "The Silent Role of Mathematics and Algorithms in MCP & Multi-Agent Systems"

Source: arXiv:2504.21030 — "Advancing Multi-Agent Systems Through Model Context Protocol"

Relevance: Rooted's fleet orchestration builds on MCP architecture while adding the A2A-style Covenant Protocol for agent communication.

Desirable Difficulty & Educational Research

The educational concept that making learning slightly harder in the right ways produces deeper retention, stronger understanding, and more transferable skill. The struggle is not a bug — it is the mechanism.

Key insight: Rooted is a "desirable difficulty machine" — it preserves the productive struggle that builds real capability.

Relevance: This research validates why Rooted's "never give answers" constraint isn't just a design choice — it's grounded in how humans actually learn.

Current Status

Rooted is currently in active development. The architecture has been designed, the fleet orchestration system is being prototyped, and the Leptos + SpacetimeDB stack has been validated through experiments.

The core loop works in isolation. Next steps: integrating SpacetimeDB for real-time session management, implementing the Wave Protocol for multi-agent coordination, and building the canvas UI.

Development Roadmap

Architecture Design & D2 Diagrams
Fleet Orchestration Prototype (Rust MCP)
SpacetimeDB Integration
Leptos Canvas UI
User Testing & Iteration

Tech Stack

Leptos Rust WASM SpacetimeDB Gemini Vision OpenRouter HTML5 Canvas SQLite

Build the fingers first.

Rooted — 2026