LocalGPT Gen
LocalGPT Gen is a built-in world generation mode. You type natural language, and the AI builds explorable worlds — geometry, materials, lighting, behaviors, audio, and camera. All inside the same single Rust binary, powered by Bevy.
Demo Videos
Installation
# Install the standalone Gen binary
cargo install localgpt-gen
# Or from a source checkout
cargo install --path crates/gen
Usage
# Interactive mode — type prompts in the terminal
localgpt-gen
# Start with an initial prompt
localgpt-gen "create a heart outline with spheres and cubes"
# Load an existing glTF/GLB scene
localgpt-gen --scene ./scene.glb
# Verbose logging
localgpt-gen --verbose
# Combine options
localgpt-gen -v -s ./scene.glb "add warm lighting"
# Custom agent ID (default: "gen")
localgpt-gen --agent my-gen-agent
The agent receives your prompt and iteratively builds a world — spawning shapes, adjusting materials, positioning the camera, and taking screenshots to course-correct. Type /quit or /exit in the terminal to close.
Three Ways to Use Gen (with Bevy Window)
All three modes open a Bevy 3D window where you watch worlds being built in real-time. They differ in who drives the AI and whether you need an API key.
| Interactive (API) | Interactive (CLI Backend) | MCP Server (External App) | |
|---|---|---|---|
| Command | localgpt-gen | localgpt-gen | localgpt-gen mcp-server |
| Who builds the world | LocalGPT's built-in agent | External CLI (Claude CLI, Gemini CLI, Codex) via MCP relay | External app (Claude Desktop, Codex Desktop, VS Code, Zed, Cursor) |
| LLM provider | API key (Anthropic, OpenAI, Ollama, etc.) | CLI subprocess (claude, gemini, codex) | Whatever the external app uses |
| Requires API key? | Yes (or Ollama for local) | No — uses the CLI's own auth | No — the external app handles auth |
| Who manages conversation? | LocalGPT agent loop | The CLI backend (autonomous) | The external app |
| Tool execution | In-process (GenBridge) | CLI → MCP relay → GenBridge | MCP stdio → GenBridge |
| Memory system | Full (MEMORY.md, daily logs, search) | Full (via MCP relay) | Full (via MCP tools) |
| Best for | Direct control, fast iteration | Using Claude/Gemini/Codex without API keys | Editors, desktop apps, multi-tool workflows |
Mode 1: Interactive with API Key
The default. LocalGPT's own agent calls gen tools directly — fastest response, tightest feedback loop.
# Uses your configured model (e.g., claude-sonnet-4-6 via Anthropic API)
localgpt-gen
localgpt-gen "build a castle on a hill"
Set your model in config.toml:
[agent]
default_model = "claude-sonnet-4-6" # or "gpt-4o", "ollama/llama3", etc.
[providers.anthropic]
api_key = "${ANTHROPIC_API_KEY}"
Mode 2: Interactive with CLI Backend (no API key)
Use Claude CLI, Gemini CLI, or Codex as the LLM — they handle auth through their own login. LocalGPT starts an MCP relay so tool calls go to your existing Bevy window.
# Set model to a CLI backend in config.toml
localgpt-gen # with default_model = "claude-cli/opus"
You'll see:
MCP relay active on port 9878 (CLI backends will use this window)
CLI backend detected (claude-cli/opus). Gen tools will route to this window via MCP relay.
The CLI backend runs autonomously — it decides which tools to call and builds the scene. You watch it happen in the Bevy window and can type follow-up prompts.
How it works under the hood:
You type prompt
→ LocalGPT sends to Claude CLI subprocess
→ Claude CLI spawns `localgpt-gen mcp-server --connect`
→ Connects to MCP relay (TCP :9878)
→ Tool calls go to existing Bevy window
→ You see the world being built
See CLI Mode (MCP Relay) for setup and troubleshooting.
Mode 3: MCP Server (external app drives the window)
LocalGPT Gen runs as a tool server. An external app is the orchestrator — it spawns the Bevy window and drives scene building via MCP.
Supported apps include:
- Desktop apps — Claude Desktop, Codex Desktop
- CLI tools — Claude CLI, Gemini CLI, Codex CLI (running directly, not as a LocalGPT backend)
- Editors — VS Code Copilot, Zed, Cursor, Windsurf
localgpt-gen mcp-server
Configure the app to connect (example .mcp.json):
{
"mcpServers": {
"localgpt-gen": {
"command": "localgpt-gen",
"args": ["mcp-server"]
}
}
}
LocalGPT doesn't run its own agent loop — it's purely a tool server. The external app manages the conversation and decides which tools to call.
Mode 2 = you run localgpt-gen and it uses Claude CLI/Gemini CLI/Codex as its LLM backend (the CLI is a subprocess of LocalGPT).
Mode 3 = you run Claude CLI/Gemini CLI/Codex directly and it uses localgpt-gen mcp-server as a tool (LocalGPT is a subprocess of the CLI).
The difference is who is the parent process. In Mode 2, LocalGPT is in charge. In Mode 3, the external app is in charge.
See MCP Server for all supported apps and configuration.
Headless Mode (no window)
Separate from the three modes above, headless mode generates worlds without opening a Bevy window — for batch runs, CI pipelines, and overnight experiment queues.
localgpt-gen headless --prompt "Build a cozy cabin in a snowy forest"
Combined with the memory system, the AI learns your creative style across sessions and applies it automatically. Queue multiple experiments via HEARTBEAT.md or MCP tools, and browse results in the in-app gallery.
See Headless Mode & Experiment Queue for full details.
Features
- Tools — 32 core tools plus 50+ MCP-only tools for characters, interactions, terrain, UI, physics, worldgen, and experiments
- WorldGen Pipeline — Structured world generation: blockout → navmesh → three-tier placement → evaluation
- Behaviors — Data-driven animations (orbit, spin, bounce, etc.)
- Audio — Procedural environmental audio with spatial emitters
- World Skills — Save and load complete worlds as reusable skills
- Export — glTF/GLB (Blender, Unity, Unreal), HTML (browser-viewable with audio + behaviors), screenshots
- MCP Server — Use gen tools from Claude Desktop, VS Code, Zed, Cursor, and other MCP clients
- CLI Mode — MCP relay for Claude CLI, Gemini CLI, and Codex (no API key needed)
- Headless Mode — Batch generation, experiment queue, and creative memory
- External Services — Optional local services for NPC brains, depth preview, and 3D asset generation
- Undo/Redo — Full undo/redo support for all scene edits with persistence
- Streaming Chat — Real-time tool call display and streaming responses
Templates
Jumpstart your project with ready-to-customize world templates:
- Fantasy — Medieval Village, Enchanted Forest, Japanese Temple, Cozy Farm, Winter Wonderland
- Sci-Fi — Space Station, Underwater World, Alien World
- Horror — Haunted House, Backrooms
- Urban — Cyberpunk City, Modern City
Current Limitations
- Visual output depends on the LLM's spatial reasoning ability
- Requires a GPU-capable display for rendering
Showcase
- proofof.video — Video gallery comparing world generations across different models using the same or similar prompts