Install the skill
.agents/skills/, with symlinks for Claude Code and Cursor). Run npx skills update later to refresh them.
fish-audio-sdk
Python (fish-audio-sdk) and JavaScript (fish-audio): exact method signatures, sync + async, model selection, and the real exception types.
fish-audio-api
Raw REST + WebSocket for any language: auth, endpoints, MessagePack/JSON/multipart rules, and the streaming protocol.
Set your API key
Create a key and export it — the code your agent writes reads it from the environment:
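A minimal Python sketch of the reading side, for orientation only. It assumes the key was exported under the name FISH_API_KEY (the placeholder used in the Authorization header further down this page); `auth_headers` is a hypothetical helper, not part of any SDK.

```python
import os


def auth_headers(env=os.environ):
    """Build the Authorization header from the API key in the environment."""
    # FISH_API_KEY is an assumed variable name; use whatever name you exported.
    api_key = env.get("FISH_API_KEY")
    if not api_key:
        raise RuntimeError("FISH_API_KEY is not set; export it before running.")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing fast on a missing key keeps the error close to its cause instead of surfacing later as an authentication failure from the API.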
Ask your agent
Prompt in plain language — it uses the correct client, methods, and error types:
TTS in a cloned voice
“Generate speech with Fish Audio in a cloned voice and save it to a file.”
Transcribe with timestamps
“Transcribe speech.wav with Fish Audio and print the segments.”
Stream from an LLM
“Stream an LLM’s tokens to Fish Audio TTS over the WebSocket.”
Raw API, any language
“Call the Fish Audio TTS REST API from Go, no SDK.”
Install options
AI Coding Agents — full guide
Targeting specific agents, the live-docs MCP server, skill-vs-MCP, and reading the skills before you install.
Next steps
Get your API key
Create a key and make your first request.
Quick Start
Generate your first audio by hand, in any language.
Text to Speech
Voices, formats, streaming, and the direct API.
API reference
Endpoints, parameters, and the OpenAPI schema.
For autonomous agents & RAG pipelines
Not a coding agent installing a skill, but an autonomous agent, RAG pipeline, or crawler? Start from these low-noise, machine-readable entry points:
- llms.txt: curated documentation index (read this first)
- llms-full.txt: broader context across the whole site
- OpenAPI: REST schemas, parameters, and examples
- AsyncAPI: the WebSocket streaming protocol
Canonical API facts
- Base API URL: https://api.fish.audio
- Authentication: Authorization: Bearer <FISH_API_KEY>
- TTS model selection: send a required model header. Recommended default: s2-pro
- Main REST endpoints:
  - POST /v1/tts
  - POST /v1/asr
  - GET /model
  - POST /model
  - GET /model/{id}
  - PATCH /model/{id}
  - DELETE /model/{id}
- Real-time streaming endpoint: wss://api.fish.audio/v1/tts/live
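As a rough illustration of how these facts combine, the Python sketch below assembles (but does not send) a POST /v1/tts request using only the standard library. The base URL, the Authorization header, the model header, and the s2-pro default come from the list above. The JSON body with a single "text" field is an assumption made for illustration, and `build_tts_request` is a hypothetical helper; check the OpenAPI schema for the actual request body.

```python
import json
import urllib.request

BASE_URL = "https://api.fish.audio"


def build_tts_request(api_key, text, model="s2-pro"):
    """Assemble a POST /v1/tts request without sending it."""
    # Assumption: a JSON body with a "text" field. Verify against the OpenAPI schema.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tts",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "model": model,  # required model-selection header
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_tts_request("YOUR_KEY", "Hello from Fish Audio")
    # urllib stores header names capitalized ("Model"); HTTP header names are case-insensitive.
    print(req.get_method(), req.full_url, req.get_header("Model"))
```

Passing the returned object to urllib.request.urlopen would perform the call. Building the request separately makes the headers easy to inspect before any network traffic happens.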
Retrieval order
- Read llms.txt for the curated documentation index.
- Read llms-full.txt when broad site context is needed.
- Read OpenAPI for REST schemas, parameters, and examples.
- Read AsyncAPI for the WebSocket streaming protocol.
- Fetch individual .md pages only after narrowing to a specific task.
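The order above could be encoded as a small fetch plan, sketched here in Python. The file names llms.txt, llms-full.txt, openapi.json, and asyncapi.yml appear on this page; serving them from the root of the docs site is an assumption, and `retrieval_plan` and the example root URL are hypothetical.

```python
from urllib.parse import urljoin

# File names come from this page; their location at the site root is an assumption.
RETRIEVAL_ORDER = ["llms.txt", "llms-full.txt", "openapi.json", "asyncapi.yml"]


def retrieval_plan(docs_root):
    """Return the entry-point URLs in the order an agent should read them."""
    root = docs_root.rstrip("/") + "/"
    return [urljoin(root, name) for name in RETRIEVAL_ORDER]


if __name__ == "__main__":
    # docs.example.com is a placeholder, not the real documentation host.
    for url in retrieval_plan("https://docs.example.com"):
        print(url)
```

An agent would walk this list from the top and stop as soon as it has enough context, fetching individual .md pages only afterwards.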
High-value URLs by task
Core product tasks
- Text to Speech Guide
- Speech to Text Guide
- Creating Voice Models
- Emotion Control
- Fine-grained Control
- WebSocket TTS Streaming
- Real-time Streaming Best Practices
- Realtime Streaming (SDK)
- LiveKit Integration
- Pipecat Integration
Task routing
- Generate speech → Quick Start, the Text to Speech guide, and POST /v1/tts.
- Transcribe audio → the Speech to Text guide and POST /v1/asr.
- Clone or manage voices → Creating Voice Models and the /model endpoints.
- Stream audio in real time → AsyncAPI, WebSocket TTS Streaming, and the realtime guides.
- Pick a model or estimate cost → Models Overview and Pricing & Rate Limits.
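The endpoint half of this routing can be sketched as a lookup table in Python. The paths and hosts are the ones listed under the canonical API facts; the task keys (generate_speech and so on) and the `endpoint_for` helper are hypothetical labels chosen for this example. Picking a model or estimating cost maps to documentation pages rather than an endpoint, so it is left out.

```python
REST_BASE = "https://api.fish.audio"
WS_BASE = "wss://api.fish.audio"

# Task keys are illustrative labels, not identifiers defined by the API.
TASK_ENDPOINTS = {
    "generate_speech": REST_BASE + "/v1/tts",
    "transcribe_audio": REST_BASE + "/v1/asr",
    "manage_voices": REST_BASE + "/model",
    "stream_realtime": WS_BASE + "/v1/tts/live",
}


def endpoint_for(task):
    """Return the endpoint URL for a task label, or raise ValueError if unknown."""
    try:
        return TASK_ENDPOINTS[task]
    except KeyError:
        known = ", ".join(sorted(TASK_ENDPOINTS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None
```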
Notes for agents
- Prefer openapi.json and asyncapi.yml for machine-readable schemas.
- Append .md to any page URL to fetch the human-authored page as plain Markdown.
- Some richer pages use interactive MDX widgets. If a fetched page contains UI or component noise, fall back to llms.txt, llms-full.txt, or the API spec files.
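The .md convention can be applied mechanically. The Python sketch below is one possible way to do it, assuming the suffix belongs on the URL path (before any query string or fragment) and that a trailing slash should be dropped first; `markdown_url` is a hypothetical helper and the example host is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url):
    """Return the plain-Markdown variant of a docs page URL by appending .md to its path."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")  # assumption: /guide/ and /guide name the same page
    if not path.endswith(".md"):
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    # docs.example.com is a placeholder, not the real documentation host.
    print(markdown_url("https://docs.example.com/guide/tts"))
```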

