
Introduction

Bonsai Backend is a powerful, extensible backend service for building conversational AI applications. It provides a complete platform for designing, deploying, and managing AI-powered voice and text conversations at scale.

Overview

Bonsai Backend provides:

  • REST API — Manage projects, agents, stages, classifiers, knowledge bases, and more
  • WebSocket API — Real-time bidirectional communication for live conversational AI sessions with streaming audio and text support
  • Authentication — JWT-based authentication with role-based permissions and API keys
  • Multi-Provider Support — Integrate with OpenAI, Anthropic, Google Gemini, Azure, ElevenLabs, Deepgram, Cartesia, and more
  • Conversation Design — Build complex multi-stage flows with classifiers, context transformers, knowledge bases, and custom actions
  • Scripting & Extensibility — Execute custom JavaScript in a sandboxed environment, call external webhooks, and integrate tools

How It Works

At a high level, Bonsai Backend lets you design Projects — self-contained conversational AI experiences. Each project is composed of Stages (conversation phases), Agents (AI personalities with voice settings), Classifiers (intent detectors), Knowledge (FAQ data), and Actions (behaviors triggered by user input).
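
As a rough illustration of how these pieces relate, the composition above might be modeled as follows. This is a minimal sketch, not the actual Bonsai Backend schema: every type name, field name, and the `actionsFor` helper are assumptions made for the example, including the idea that a stage references its actions by id.

```typescript
// Hypothetical shapes for illustration only; field names are assumptions,
// not the real Bonsai Backend schema.
interface Stage { id: string; prompt: string; actionIds: string[]; }
interface Agent { id: string; persona: string; voice: string; }
interface Classifier { id: string; labels: string[]; }
interface KnowledgeItem { question: string; answer: string; }
interface Action { id: string; triggerLabel: string; effects: string[]; }

// A project bundles all of the entities named above.
interface Project {
  name: string;
  stages: Stage[];
  agents: Agent[];
  classifiers: Classifier[];
  knowledge: KnowledgeItem[];
  actions: Action[];
}

const demoProject: Project = {
  name: "Booking assistant",
  stages: [
    { id: "greeting", prompt: "Greet the caller.", actionIds: ["go-booking"] },
    { id: "booking", prompt: "Collect the booking details.", actionIds: [] },
  ],
  agents: [{ id: "host", persona: "Friendly receptionist", voice: "default" }],
  classifiers: [{ id: "intent", labels: ["wants_booking", "small_talk"] }],
  knowledge: [{ question: "Opening hours?", answer: "9am to 5pm." }],
  actions: [
    { id: "go-booking", triggerLabel: "wants_booking", effects: ["navigate:booking"] },
  ],
};

// Return the actions attached to a stage whose trigger matches a classified label.
function actionsFor(project: Project, stageId: string, label: string): Action[] {
  const stage = project.stages.find((s) => s.id === stageId);
  if (!stage) return [];
  return project.actions.filter(
    (a) => stage.actionIds.includes(a.id) && a.triggerLabel === label,
  );
}
```

In this sketch, `actionsFor(demoProject, "greeting", "wants_booking")` returns the single `go-booking` action, while an unknown stage id or a non-matching label yields an empty array.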

When an end user connects via WebSocket and starts a conversation, the system:

  1. Transcribes the user's voice input (ASR) or accepts text
  2. Classifies user intent using LLM-powered classifiers
  3. Populates stage variables via context transformers (data extraction, prompt fragments, flow-control flags)
  4. Executes matching actions and their effects (scripts, webhooks, tools, stage navigation)
  5. Generates an AI response using the stage prompt and conversation history
  6. Synthesizes the response as audio (TTS) and streams it back to the client

All of this happens in real time, with text and audio streamed incrementally to the client.
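
The six steps listed above could be sketched as a single per-turn function. This is a simplified illustration under stated assumptions, not the server's implementation: the `Providers` interface, `handleTurn`, the event shapes, and the stub providers are all hypothetical names, and the reply text is emitted in one piece here rather than token by token.

```typescript
// Hypothetical sketch of one conversation turn; none of these names come
// from the Bonsai Backend codebase.
type TurnInput =
  | { kind: "audio"; data: Uint8Array }
  | { kind: "text"; text: string };

type TurnEvent =
  | { type: "text"; text: string }
  | { type: "audio"; chunk: Uint8Array };

type Vars = Record<string, string>;

interface Providers {
  transcribe(audio: Uint8Array): Promise<string>;          // ASR
  classify(text: string): Promise<string[]>;               // intent labels
  transform(text: string): Promise<Vars>;                  // stage variables
  runActions(intents: string[], vars: Vars): Promise<void>;
  generate(text: string, vars: Vars, history: string[]): Promise<string>;
  synthesize(text: string): AsyncIterable<Uint8Array>;     // TTS chunks
}

async function* handleTurn(
  input: TurnInput,
  history: string[],
  p: Providers,
): AsyncGenerator<TurnEvent> {
  // 1. Transcribe voice input, or accept text as-is.
  const text = input.kind === "audio" ? await p.transcribe(input.data) : input.text;
  // 2. Classify user intent.
  const intents = await p.classify(text);
  // 3. Populate stage variables via context transformers.
  const vars = await p.transform(text);
  // 4. Execute matching actions and their effects.
  await p.runActions(intents, vars);
  // 5. Generate the response from the prompt inputs and history.
  const reply = await p.generate(text, vars, history);
  history.push(text, reply);
  yield { type: "text", text: reply };
  // 6. Synthesize audio and stream it chunk by chunk.
  for await (const chunk of p.synthesize(reply)) {
    yield { type: "audio", chunk };
  }
}

// Stub providers so the sketch runs without any external service.
const stubProviders: Providers = {
  transcribe: async () => "hello",
  classify: async (t) => (t.includes("hello") ? ["greeting"] : []),
  transform: async () => ({ userName: "guest" }),
  runActions: async () => {},
  generate: async (t, vars) => `Hi ${vars.userName}, you said: ${t}`,
  synthesize: async function* (t) {
    yield new Uint8Array(t.length); // placeholder "audio" buffer
  },
};
```

Modeling the turn as an async generator keeps the streaming aspect visible: a caller can forward each event to the client as soon as it is produced instead of waiting for the whole turn to finish.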

Guide Contents

This guide covers:

Section               Description
Installation          Setting up and running the server
Configuration         Environment variables and server settings
Core Concepts         Architecture overview and entity relationships
APIs                  REST API, WebSocket API, and schema endpoints
Projects              Top-level container for conversational experiences
Stages                Conversation phases and flow control
Agents                AI personality and voice configuration
Actions & Effects     Behaviors triggered by user input
Classifiers           LLM-powered intent classification
Context Transformers  LLM-powered variable population: data extraction, prompt fragments, flow control
Tools                 LLM-powered callable tools
Knowledge Base        FAQ categories and items
Sample Copies         Pre-written scripted responses with variant selection and classifier-driven matching
Global Actions        Reusable cross-stage action definitions
Guardrails            Content safety classifiers and moderation
Providers             LLM, TTS, ASR, and Storage provider integrations
Users                 End-user profiles and lifecycle
Environments          Remote instance connections for migration
Conversations         Conversation lifecycle, states, and events
Content Moderation    Input screening and safety policies
WebSocket Channel     Real-time communication protocol reference
WebRTC Channel        Lower-latency WebRTC DataChannel protocol reference
Authentication        Operator auth, API keys, and RBAC
Templating            Handlebars templates in prompts
Scripting             Sandboxed JavaScript execution in effects
Issues                Bug tracking linked to conversations
Audit Logs            Change tracking and compliance

Quick Start

See the Installation guide to get up and running.

Released under the Apache-2.0 License.