Introduction

Bonsai Backend is an extensible backend service for building conversational AI applications. It provides a platform for designing, deploying, and managing AI-powered voice and text conversations at scale.

Overview

Bonsai Backend provides:

REST API — Manage projects, agents, stages, classifiers, knowledge bases, and more
WebSocket API — Real-time bidirectional communication for live conversational AI sessions with streaming audio and text support
Authentication — JWT-based authentication with role-based permissions and API keys
Multi-Provider Support — Integrate with OpenAI, Anthropic, Google Gemini, Azure, ElevenLabs, Deepgram, Cartesia, and more
Conversation Design — Build complex multi-stage flows with classifiers, context transformers, knowledge bases, and custom actions
Scripting & Extensibility — Execute custom JavaScript in a sandboxed environment, call external webhooks, and integrate tools

How It Works

At a high level, Bonsai Backend lets you design Projects — self-contained conversational AI experiences. Each project is composed of Stages (conversation phases), Agents (AI personalities with voice settings), Classifiers (intent detectors), Knowledge (FAQ data), and Actions (behaviors triggered by user input).
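
To make the composition described above concrete, here is a minimal TypeScript sketch of how a project and its parts could be modeled. Every interface and field name here (`Project`, `Stage`, `voice`, `describeProject`, and so on) is an assumption made for illustration, not the actual Bonsai Backend schema; see the Core Concepts section for the real entity relationships.

```typescript
// Hypothetical shapes for illustration only; not the actual Bonsai Backend schema.
interface Stage { id: string; name: string; prompt: string }          // a conversation phase
interface Agent { id: string; name: string; voice: string }           // AI personality with voice settings
interface Classifier { id: string; name: string }                     // intent detector
interface KnowledgeItem { question: string; answer: string }          // FAQ data
interface Action { id: string; name: string }                         // behavior triggered by user input

// A project is the self-contained container for all of the above.
interface Project {
  id: string;
  name: string;
  stages: Stage[];
  agents: Agent[];
  classifiers: Classifier[];
  knowledge: KnowledgeItem[];
  actions: Action[];
}

// Small helper that summarizes what a project contains.
function describeProject(project: Project): string {
  return (
    `${project.name}: ${project.stages.length} stages, ` +
    `${project.agents.length} agents, ` +
    `${project.classifiers.length} classifiers, ` +
    `${project.knowledge.length} knowledge items, ` +
    `${project.actions.length} actions`
  );
}
```

The sketch keeps every entity as a flat list on the project because that is all the description above states; how stages, agents, and actions reference one another is not assumed here.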

When an end user connects via WebSocket and starts a conversation, the system:

Transcribes the user's voice input (ASR) or accepts text
Classifies user intent using LLM-powered classifiers
Populates stage variables via context transformers (data extraction, prompt fragments, flow-control flags)
Executes matching actions and their effects (scripts, webhooks, tools, stage navigation)
Generates an AI response using the stage prompt and conversation history
Synthesizes the response as audio (TTS) and streams it back to the client

All of this happens in real time, with text and audio streamed incrementally to the client.
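
The per-turn flow listed above can be sketched as an async generator that runs each step in order and yields text and audio as soon as they are available. This is an illustrative outline only: `handleTurn`, `TurnDeps`, and every method on it are hypothetical names that stand in for real provider calls, and running the steps strictly one after another is an assumption, not a statement about how the server schedules them.

```typescript
// Hypothetical sketch of one conversation turn; none of these names are Bonsai Backend APIs.
type UserInput =
  | { kind: "audio"; data: Uint8Array }
  | { kind: "text"; text: string };

type OutEvent =
  | { type: "text"; text: string }
  | { type: "audio"; data: Uint8Array };

type Vars = Record<string, unknown>;

// Each dependency stands in for a provider or engine component.
interface TurnDeps {
  transcribe(audio: Uint8Array): Promise<string>;          // ASR
  classify(text: string): Promise<string[]>;               // LLM-powered intent classifiers
  transformContext(text: string, vars: Vars): Promise<Vars>; // context transformers
  runActions(intents: string[], vars: Vars): Promise<void>;  // actions and their effects
  generate(text: string, vars: Vars): AsyncIterable<string>; // streamed LLM response
  synthesize(chunk: string): Promise<Uint8Array>;          // TTS
}

async function* handleTurn(
  input: UserInput,
  vars: Vars,
  deps: TurnDeps,
): AsyncGenerator<OutEvent> {
  // 1. Transcribe voice input, or accept text as-is.
  const text = input.kind === "audio" ? await deps.transcribe(input.data) : input.text;
  // 2. Classify user intent.
  const intents = await deps.classify(text);
  // 3. Populate stage variables.
  const nextVars = await deps.transformContext(text, vars);
  // 4. Execute matching actions and their effects.
  await deps.runActions(intents, nextVars);
  // 5 and 6. Generate the response and stream text plus synthesized audio per chunk.
  for await (const chunk of deps.generate(text, nextVars)) {
    yield { type: "text", text: chunk };
    yield { type: "audio", data: await deps.synthesize(chunk) };
  }
}
```

A caller would iterate with `for await (const event of handleTurn(input, vars, deps))` and forward each event to the client. Yielding per chunk, rather than collecting the full response first, is what lets output reach the client incrementally.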

Guide Contents

This guide covers:

Section	Description
Installation	Setting up and running the server
Configuration	Environment variables and server settings
Core Concepts	Architecture overview and entity relationships
APIs	REST API, WebSocket API, and schema endpoints
Projects	Top-level container for conversational experiences
Stages	Conversation phases and flow control
Agents	AI personality and voice configuration
Actions & Effects	Behaviors triggered by user input
Classifiers	LLM-powered intent classification
Context Transformers	LLM-powered variable population: data extraction, prompt fragments, flow control
Tools	LLM-powered callable tools
Knowledge Base	FAQ categories and items
Sample Copies	Pre-written scripted responses with variant selection and classifier-driven matching
Global Actions	Reusable cross-stage action definitions
Guardrails	Content safety classifiers and moderation
Providers	LLM, TTS, ASR, and Storage provider integrations
Users	End-user profiles and lifecycle
Environments	Remote instance connections for migration
Conversations	Conversation lifecycle, states, and events
Content Moderation	Input screening and safety policies
WebSocket Channel	Real-time communication protocol reference
WebRTC Channel	Lower-latency WebRTC DataChannel protocol reference
Authentication	Operator auth, API keys, and RBAC
Templating	Handlebars templates in prompts
Scripting	Sandboxed JavaScript execution in effects
Issues	Bug tracking linked to conversations
Audit Logs	Change tracking and compliance

Quick Start

See the Installation guide to get up and running.
