Agent Quickstart - Fish Audio

Install the Fish Audio agent skill, and your coding agent (Claude Code, Cursor, Codex, and others) writes correct, current Fish Audio code with the right method names, units, and error types instead of guessing. Here’s the fastest path.

Install the skill

npx skills add https://docs.fish.audio

This installs both Fish Audio skills (a canonical copy in .agents/skills/, with symlinks for Claude Code and Cursor). Run npx skills update later to refresh them.

fish-audio-sdk

Python (fish-audio-sdk) and JavaScript (fish-audio) — exact method signatures, sync + async, model selection, and the real exception types.

fish-audio-api

Raw REST + WebSocket for any language — auth, endpoints, MessagePack/JSON/multipart rules, and the streaming protocol.

Set your API key

Create a key and export it — the code your agent writes reads it from the environment:

export FISH_API_KEY="your_api_key_here"
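
As a rough illustration of what "reads it from the environment" can look like, here is a minimal Python sketch using only the standard library. The helper name auth_headers is hypothetical and is not part of either SDK; the Bearer format follows the authentication line under Canonical API facts.

```python
import os


def auth_headers(env=os.environ):
    """Build the Authorization header from the FISH_API_KEY variable.

    `auth_headers` is a hypothetical helper for this sketch, not an SDK call.
    """
    api_key = env.get("FISH_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError("FISH_API_KEY is not set; export it before running")
    return {"Authorization": f"Bearer {api_key}"}
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.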

Ask your agent

Prompt in plain language; the agent then uses the correct client, methods, and error types:

TTS in a cloned voice

“Generate speech with Fish Audio in a cloned voice and save it to a file.”

Transcribe with timestamps

“Transcribe speech.wav with Fish Audio and print the segments.”

Stream from an LLM

“Stream an LLM’s tokens to Fish Audio TTS over the WebSocket.”

Raw API, any language

“Call the Fish Audio TTS REST API from Go, no SDK.”

Install options

AI Coding Agents — full guide

Targeting specific agents, the live-docs MCP server, skill-vs-MCP, and reading the skills before you install.

Next steps

Get your API key

Create a key and make your first request.

Quick Start

Generate your first audio by hand, in any language.

Text to Speech

Voices, formats, streaming, and the direct API.

API reference

Endpoints, parameters, and the OpenAPI schema.

For autonomous agents & RAG pipelines

If you are an autonomous agent, RAG pipeline, or crawler rather than a coding agent installing a skill, start from these low-noise, machine-readable entry points:

llms.txt — curated documentation index (read this first)
llms-full.txt — broader context across the whole site
OpenAPI — REST schemas, parameters, and examples
AsyncAPI — the WebSocket streaming protocol

Canonical API facts

Base API URL: https://api.fish.audio
Authentication: Authorization: Bearer <FISH_API_KEY>
TTS model selection: the model header is required on TTS requests. Recommended default: s2-pro
Main REST endpoints:
- POST /v1/tts
- POST /v1/asr
- GET /model
- POST /model
- GET /model/{id}
- PATCH /model/{id}
- DELETE /model/{id}
Real-time streaming endpoint: wss://api.fish.audio/v1/tts/live
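
To show how these facts combine, the following Python sketch assembles (but does not send) a POST /v1/tts request with the standard library. The base URL, Bearer authentication, model header, and s2-pro default come from the list above. The JSON body with a "text" field is an assumption made for illustration; check the OpenAPI schema for the actual request format. The function name build_tts_request is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.fish.audio"


def build_tts_request(text, api_key, model="s2-pro"):
    """Assemble a TTS request object without sending it.

    Assumption: a JSON body with a "text" field. The real schema is
    defined in the OpenAPI spec and may differ.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/tts",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Required header that selects the TTS model.
            # urllib stores header names capitalized ("Model");
            # HTTP header names are case-insensitive.
            "model": model,
        },
    )
```

Sending the request would be a call such as urllib.request.urlopen(req); that step is left out here so the sketch makes no network calls.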

Retrieval order

Read llms.txt for the curated documentation index.
Read llms-full.txt when broad site context is needed.
Read OpenAPI for REST schemas, parameters, and examples.
Read AsyncAPI for the WebSocket streaming protocol.
Fetch individual .md pages only after narrowing to a specific task.

High-value URLs by task

API specs

Auth & SDK setup

Core product tasks

Real-time & integrations

Models, pricing & lifecycle

Task routing

Generate speech → Quick Start, the Text to Speech guide, and POST /v1/tts.
Transcribe audio → the Speech to Text guide and POST /v1/asr.
Clone or manage voices → Creating Voice Models and the /model endpoints.
Stream audio in real time → AsyncAPI, WebSocket TTS Streaming, and the realtime guides.
Pick a model or estimate cost → Models Overview and Pricing & Rate Limits.
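
An agent could hold the routing above as a small lookup table. The sketch below is one way to do that in Python. The task keys are hypothetical labels invented for this example, while the endpoints are the ones listed under Canonical API facts. The "pick a model or estimate cost" route is omitted because it points to documentation pages rather than an endpoint.

```python
# Task keys are hypothetical labels; endpoints are taken from this page.
TASK_ROUTES = {
    "generate_speech": ["POST /v1/tts"],
    "transcribe_audio": ["POST /v1/asr"],
    "manage_voices": [
        "GET /model",
        "POST /model",
        "GET /model/{id}",
        "PATCH /model/{id}",
        "DELETE /model/{id}",
    ],
    "stream_realtime": ["wss://api.fish.audio/v1/tts/live"],
}


def endpoints_for(task):
    """Return the endpoints for a task label, or raise ValueError if unknown."""
    try:
        return TASK_ROUTES[task]
    except KeyError:
        known = ", ".join(sorted(TASK_ROUTES))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None
```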

Notes for agents

Prefer openapi.json and asyncapi.yml for machine-readable schemas.
Append .md to any page URL to fetch the human-authored page as plain Markdown.
Some richer pages use interactive MDX widgets. If a fetched page contains UI or component noise, fall back to llms.txt, llms-full.txt, or the API spec files.
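
The ".md" rule above can be sketched as a small URL helper. This Python example is an illustration only: the function name markdown_url is hypothetical, and dropping the fragment and trailing slash is a design choice of this sketch rather than documented site behavior.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url):
    """Return the plain-Markdown variant of a docs page URL by appending .md."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path:
        # A bare site root has no page path to append ".md" to.
        raise ValueError("URL has no page path")
    if not path.endswith(".md"):
        path += ".md"
    # The fragment is dropped: it addresses a spot in the rendered page.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```

For example, a hypothetical page at https://docs.fish.audio/some/page would map to https://docs.fish.audio/some/page.md.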