Introduction
Bonsai Backend is a powerful, extensible backend service for building conversational AI applications. It provides a complete platform for designing, deploying, and managing AI-powered voice and text conversations at scale.
Overview
Bonsai Backend provides:
- REST API — Manage projects, agents, stages, classifiers, knowledge bases, and more
- WebSocket API — Real-time bidirectional communication for live conversational AI sessions with streaming audio and text support
- Authentication — JWT-based authentication with role-based permissions and API keys
- Multi-Provider Support — Integrate with OpenAI, Anthropic, Google Gemini, Azure, ElevenLabs, Deepgram, Cartesia, and more
- Conversation Design — Build complex multi-stage flows with classifiers, context transformers, knowledge bases, and custom actions
- Scripting & Extensibility — Execute custom JavaScript in a sandboxed environment, call external webhooks, and integrate tools
How It Works
At a high level, Bonsai Backend lets you design Projects — self-contained conversational AI experiences. Each project is composed of Stages (conversation phases), Agents (AI personalities with voice settings), Classifiers (intent detectors), Knowledge (FAQ data), and Actions (behaviors triggered by user input).
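To make the composition concrete, here is a minimal sketch of how a project and its parts might be modeled. Every type and field name below is a hypothetical illustration chosen for this example, not the actual Bonsai Backend schema; see the Core Concepts section for the real entity definitions.

```typescript
// Hypothetical shapes for illustration only; the real schema may differ.
interface Stage { id: string; name: string; prompt: string }        // conversation phase
interface Agent { id: string; name: string; voice: string }         // AI personality with voice settings
interface Classifier { id: string; name: string; intents: string[] } // intent detector
interface KnowledgeItem { question: string; answer: string }        // FAQ data
interface Action { id: string; name: string; triggerIntent: string } // behavior triggered by user input

// A project is the self-contained container for all of the above.
interface Project {
  id: string;
  name: string;
  stages: Stage[];
  agents: Agent[];
  classifiers: Classifier[];
  knowledge: KnowledgeItem[];
  actions: Action[];
}

// Example data (invented for this sketch).
const demoProject: Project = {
  id: "proj-1",
  name: "Support Demo",
  stages: [
    { id: "s1", name: "Greeting", prompt: "Greet the caller." },
    { id: "s2", name: "Troubleshooting", prompt: "Help resolve the issue." },
  ],
  agents: [{ id: "a1", name: "Ava", voice: "warm-female" }],
  classifiers: [{ id: "c1", name: "Support intents", intents: ["request_refund"] }],
  knowledge: [{ question: "What are your hours?", answer: "9am to 5pm on weekdays." }],
  actions: [{ id: "act1", name: "Start refund", triggerIntent: "request_refund" }],
};

// Summarize how many entities of each kind a project contains.
function describeProject(p: Project): string {
  return (
    `${p.name}: stages=${p.stages.length} agents=${p.agents.length} ` +
    `classifiers=${p.classifiers.length} knowledge=${p.knowledge.length} ` +
    `actions=${p.actions.length}`
  );
}
```

The flat arrays are a simplification: in practice, entities such as actions may be scoped to individual stages rather than to the project as a whole.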
When an end user connects via WebSocket and starts a conversation, the system:
1. Transcribes the user's voice input (ASR) or accepts text
2. Classifies user intent using LLM-powered classifiers
3. Populates stage variables via context transformers (data extraction, prompt fragments, flow-control flags)
4. Executes matching actions and their effects (scripts, webhooks, tools, stage navigation)
5. Generates an AI response using the stage prompt and conversation history
6. Synthesizes the response as audio (TTS) and streams it back to the client
All of this happens in real time, with text and audio streamed incrementally to the client.
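The steps above can be sketched as a single per-turn function. This is a simplified illustration under stated assumptions: `handleTurn`, `TurnSteps`, and the stub implementations are hypothetical names invented here, not Bonsai Backend APIs. The sketch is also synchronous for readability, whereas the real server streams text and audio incrementally.

```typescript
// Hypothetical stand-ins for illustration; none of these names come from Bonsai Backend.
type UserInput =
  | { kind: "text"; text: string }
  | { kind: "audio"; samples: Uint8Array };

interface TurnSteps {
  transcribe(audio: Uint8Array): string;                 // ASR
  classify(text: string): string[];                      // intent classification
  transform(text: string): Record<string, string>;       // context transformers
  runActions(intents: string[]): string[];               // actions and their effects
  generate(text: string, variables: Record<string, string>, history: string[]): string;
  synthesize(text: string): Uint8Array[];                // TTS, returned as chunks
}

interface TurnResult {
  transcript: string;
  intents: string[];
  variables: Record<string, string>;
  executedActions: string[];
  responseText: string;
  audioChunks: Uint8Array[];
}

// Run one conversational turn through the six stages, in order.
function handleTurn(input: UserInput, history: string[], steps: TurnSteps): TurnResult {
  const transcript = input.kind === "audio" ? steps.transcribe(input.samples) : input.text;
  const intents = steps.classify(transcript);
  const variables = steps.transform(transcript);
  const executedActions = steps.runActions(intents);
  const responseText = steps.generate(transcript, variables, history);
  const audioChunks = steps.synthesize(responseText);
  return { transcript, intents, variables, executedActions, responseText, audioChunks };
}

// Stub steps so the sketch runs without any provider.
const stubSteps: TurnSteps = {
  transcribe: () => "hello",
  classify: (text) => (text.toLowerCase().includes("refund") ? ["request_refund"] : []),
  transform: (text) => ({ lastUtterance: text }),
  runActions: (intents) => intents.map((intent) => `action:${intent}`),
  generate: (text, _variables, history) => `Turn ${history.length + 1}: you said "${text}"`,
  // One fake (silent) audio chunk per word, to mimic incremental delivery.
  synthesize: (text) => text.split(" ").map((word) => new Uint8Array(word.length)),
};

const turn = handleTurn({ kind: "text", text: "I want a refund" }, [], stubSteps);
console.log(turn.intents, turn.responseText, turn.audioChunks.length);
```

Passing the steps in as an interface keeps the sketch independent of any particular LLM, ASR, or TTS provider, which loosely mirrors the multi-provider design described in the overview.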
Guide Contents
This guide covers:
| Section | Description |
|---|---|
| Installation | Setting up and running the server |
| Configuration | Environment variables and server settings |
| Core Concepts | Architecture overview and entity relationships |
| APIs | REST API, WebSocket API, and schema endpoints |
| Projects | Top-level container for conversational experiences |
| Stages | Conversation phases and flow control |
| Agents | AI personality and voice configuration |
| Actions & Effects | Behaviors triggered by user input |
| Classifiers | LLM-powered intent classification |
| Context Transformers | LLM-powered variable population: data extraction, prompt fragments, flow control |
| Tools | LLM-powered callable tools |
| Knowledge Base | FAQ categories and items |
| Sample Copies | Pre-written scripted responses with variant selection and classifier-driven matching |
| Global Actions | Reusable cross-stage action definitions |
| Guardrails | Content safety classifiers and moderation |
| Providers | LLM, TTS, ASR, and Storage provider integrations |
| Users | End-user profiles and lifecycle |
| Environments | Remote instance connections for migration |
| Conversations | Conversation lifecycle, states, and events |
| Content Moderation | Input screening and safety policies |
| WebSocket Channel | Real-time communication protocol reference |
| WebRTC Channel | Lower-latency WebRTC DataChannel protocol reference |
| Authentication | Operator auth, API keys, and RBAC |
| Templating | Handlebars templates in prompts |
| Scripting | Sandboxed JavaScript execution in effects |
| Issues | Bug tracking linked to conversations |
| Audit Logs | Change tracking and compliance |
Quick Start
See the Installation guide to get up and running.