# Text to Speech

> Turn text into lifelike speech and use it however you build.

Generate natural speech from text with the `s2-pro` and `s1` models. Pick a voice, choose a format, and go: call the API directly, or use the Python or JavaScript library.

* No code: type, pick a voice, generate.
* Every parameter for `POST /v1/tts`.
* Ready-made recipes: streaming, telephony, and more.

## When to use it

* Audiobooks, explainers, ads, and video narration.
* Speak an assistant's replies; pair with [streaming](/features/realtime-streaming) for low latency.
* Read content aloud, phone menus, notifications.
* Speak in a [cloned voice](/features/voice-cloning) you own.

## Quick start

Send text, get back audio. Choose your implementation:

```python Python theme={null}
from fishaudio import FishAudio
from fishaudio.utils import save

client = FishAudio()  # reads FISH_API_KEY from the environment

audio = client.tts.convert(text="Hello from Fish Audio!")
save(audio, "out.mp3")
```

```bash API (curl) theme={null}
curl --request POST https://api.fish.audio/v1/tts \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --header "Content-Type: application/json" \
  --header "model: s2-pro" \
  --data '{
    "text": "Hello from Fish Audio!",
    "format": "mp3"
  }' \
  --output out.mp3
```

```javascript JavaScript theme={null}
import { FishAudioClient } from "fish-audio";
import { writeFile } from "fs/promises";

const client = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

// `s2-pro` is passed explicitly (the SDK default is `s1`).
const stream = await client.textToSpeech.convert(
  {
    text: "Hello from Fish Audio!"
  },
  "s2-pro",
);

const chunks = [];
for await (const chunk of stream) chunks.push(Buffer.from(chunk));
await writeFile("hello.mp3", Buffer.concat(chunks));
```

## Use a specific voice

Pass a **voice model id** (`reference_id`). Find ids in the [Voice Library](/overview/platform) or create your own via [Voice Cloning](/features/voice-cloning).

```python Python theme={null}
audio = client.tts.convert(
    text="This uses a specific voice.",
    reference_id="802e3bc2b27e49c2995d23ef70e6ac89",
)
```

```bash API (curl) theme={null}
curl --request POST https://api.fish.audio/v1/tts \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --header "Content-Type: application/json" \
  --header "model: s2-pro" \
  --data '{
    "text": "This uses a specific voice.",
    "reference_id": "802e3bc2b27e49c2995d23ef70e6ac89",
    "format": "mp3"
  }' \
  --output out.mp3
```

## Implementation details

### Models

* **`s2-pro`** (default): highest quality, multi-speaker, natural-language expression control.
* **`s1`**: previous generation, with emotion tags written in `(parentheses)`.

In the API, select a model with the `model` request header. In Python, pass `model="s2-pro"`. See [Choosing a Model](/developer-guide/models-pricing/choosing-a-model).

### Output formats

`mp3` (default), `wav`, `pcm`, `opus`. Set `format` (and optionally `mp3_bitrate` or `sample_rate`).

```python Python theme={null}
from fishaudio.types import TTSConfig

audio = client.tts.convert(
    text="High quality",
    config=TTSConfig(format="wav", sample_rate=44100),
)
```

```bash API (curl) theme={null}
curl --request POST https://api.fish.audio/v1/tts \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --header "Content-Type: application/json" \
  --header "model: s2-pro" \
  --data '{
    "text": "High quality",
    "format": "wav",
    "sample_rate": 44100
  }' \
  --output out.wav
```

### Speed & prosody

Adjust speaking speed (`0.5`–`2.0`, where `1.0` is normal pace) and volume. In the API these settings live under `prosody`; the Python `convert()` call takes `speed` directly.
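If you assemble the JSON body yourself, a small guard can reject speeds outside the documented 0.5–2.0 range before the request goes out. This is a minimal standard-library sketch: `build_prosody` is a hypothetical helper, not part of the SDK, and `volume` is passed through unchecked because its range isn't documented on this page.

```python
import json
from typing import Optional

SPEED_MIN, SPEED_MAX = 0.5, 2.0  # documented speed range


def build_prosody(speed: float = 1.0, volume: Optional[float] = None) -> dict:
    """Return a `prosody` object for the JSON body of POST /v1/tts (hypothetical helper)."""
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}, got {speed}")
    prosody = {"speed": speed}
    if volume is not None:
        # The volume range is not documented here, so the value is passed through as-is.
        prosody["volume"] = volume
    return prosody


body = {"text": "Speaking faster.", "prosody": build_prosody(1.5)}
print(json.dumps(body))  # {"text": "Speaking faster.", "prosody": {"speed": 1.5}}
```

Failing early on the client side gives a clearer error than a rejected request would.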
```python Python theme={null}
audio = client.tts.convert(text="Speaking faster.", speed=1.5)
```

```bash API (curl) theme={null}
curl --request POST https://api.fish.audio/v1/tts \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --header "Content-Type: application/json" \
  --header "model: s2-pro" \
  --data '{
    "text": "Speaking faster.",
    "prosody": { "speed": 1.5 }
  }' \
  --output out.mp3
```

### Generation methods (Python)

The Python SDK exposes three ways to generate, depending on whether you have the full text upfront and how you want to consume the audio:

| Method                   | Returns                                              | Use it for                                                                    |
| ------------------------ | ---------------------------------------------------- | ----------------------------------------------------------------------------- |
| `tts.convert()`          | complete audio `bytes`                               | most cases: you have the text and want the whole file                         |
| `tts.stream()`           | `AudioStream` (iterate chunks, or call `.collect()`) | memory-efficient transfer of large audio; write chunks to disk as they arrive |
| `tts.stream_websocket()` | iterator of audio `bytes`                            | text arriving in real time (LLM tokens, live captions)                        |

```python theme={null}
# Memory-efficient: write each chunk as it arrives instead of buffering the whole file
audio_stream = client.tts.stream(text="A very long passage...")

with open("out.mp3", "wb") as f:
    for chunk in audio_stream:
        f.write(chunk)
```

For real-time text streaming with `stream_websocket()`, see [Realtime Streaming](/features/realtime-streaming).

### Instant voice cloning (reference audio)

Instead of a saved `reference_id`, pass raw audio plus its transcript to clone a voice on the fly, with no training step. This works best with a clean 10–30 s sample.
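Because sample length matters, you may want to check it before sending. This is a minimal sketch using Python's standard `wave` module; the helper names are hypothetical, and it only handles uncompressed PCM WAV files (`wave` cannot read MP3 or Opus).

```python
import wave

MIN_SECONDS, MAX_SECONDS = 10.0, 30.0  # recommended reference length from this page


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_length(path: str) -> bool:
    """Return True if the sample falls inside the recommended 10–30 s window."""
    duration = wav_duration_seconds(path)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        print(f"warning: {path} is {duration:.1f}s; 10–30 s usually works best")
        return False
    return True
```

For example, call `check_reference_length("sample.wav")` before reading the file into `ReferenceAudio`. Length is only one factor; a clean recording and an accurate transcript matter too.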
```python theme={null}
from fishaudio.types import ReferenceAudio

# Read the sample once; the file can be closed before the request is made
with open("sample.wav", "rb") as f:
    sample = f.read()

audio = client.tts.convert(
    text="Spoken in the reference voice.",
    references=[ReferenceAudio(audio=sample, text="Transcript of the sample.")],
)
```

To reuse a voice across many requests, [clone it once](/features/voice-cloning) and pass the resulting `reference_id` instead.

### Format & bitrate

Pick a format for your delivery channel, and tune bitrate to trade file size against quality:

| Format          | Notes                                                                         |
| --------------- | ----------------------------------------------------------------------------- |
| `mp3` (default) | good size/quality balance; set `mp3_bitrate` to `64`, `128`, or `192`         |
| `wav`           | uncompressed, highest quality; set `sample_rate` (e.g. `44100`)               |
| `pcm`           | raw samples, no container; for low-latency playback and telephony pipelines   |
| `opus`          | efficient for streaming; bitrate is automatic (`opus_bitrate=-1000`)          |

```python theme={null}
from fishaudio.types import TTSConfig

audio = client.tts.convert(
    text="Smaller file, lower bitrate.",
    config=TTSConfig(format="mp3", mp3_bitrate=64),
)
```

### Latency & chunk length

`latency` trades stability for speed; `chunk_length` controls how much text the engine batches before it starts generating.

* `latency="balanced"` (default): lower time-to-first-audio (about 300 ms). Good for interactive use.
* `latency="normal"`: most stable output, at slightly higher latency.
* `chunk_length` (`100`–`300`, default `200`): smaller chunks start audio sooner; larger chunks are more efficient for long text.
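Since `balanced` is described as lowering time-to-first-audio, it can be worth measuring that on your own text before choosing a setting. This is a minimal timing sketch that works on any iterator of audio chunks, such as the one `tts.stream()` returns; `time_to_first_chunk` is a hypothetical helper, not part of the SDK.

```python
import time
from typing import Iterable, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Consume an audio chunk iterator; return (seconds until first chunk, all audio)."""
    start = time.perf_counter()
    first_at = None
    buffer = bytearray()
    for chunk in chunks:
        if first_at is None:
            # Time from starting iteration until the first chunk arrived
            first_at = time.perf_counter() - start
        buffer.extend(chunk)
    if first_at is None:
        raise ValueError("stream produced no audio")
    return first_at, bytes(buffer)
```

Run it once per setting and compare the first returned value. Passing `config=TTSConfig(latency=...)` to `tts.stream()` is an assumption here (this page only shows `config` on `convert()`), and network jitter adds noise, so average several runs.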
```python Python theme={null}
from fishaudio.types import TTSConfig

audio = client.tts.convert(
    text="Quick, responsive output.",
    config=TTSConfig(latency="balanced", chunk_length=150),
)
```

```bash API (curl) theme={null}
curl --request POST https://api.fish.audio/v1/tts \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --header "Content-Type: application/json" \
  --header "model: s2-pro" \
  --data '{
    "text": "Quick, responsive output.",
    "latency": "balanced",
    "chunk_length": 150
  }' \
  --output out.mp3
```

### Direct API (MessagePack)

`POST /v1/tts` also accepts a MessagePack body (`Content-Type: application/msgpack`), which is the format the [API reference](/api-reference/endpoint/openapi-v1/text-to-speech) is built around. Use it to send binary reference audio without base64 overhead, or when you don't want to depend on the SDK.

```python theme={null}
import os

import httpx
import ormsgpack

payload = {
    "text": "Hello from the direct API.",
    "reference_id": "YOUR_VOICE_ID",
    "format": "mp3",
}

resp = httpx.post(
    "https://api.fish.audio/v1/tts",
    content=ormsgpack.packb(payload),
    headers={
        "Authorization": f"Bearer {os.environ['FISH_API_KEY']}",
        "Content-Type": "application/msgpack",
        "model": "s2-pro",
    },
    timeout=60.0,  # httpx defaults to 5 s, which longer generations can exceed
)
resp.raise_for_status()  # fail loudly instead of saving an error body as audio

with open("out.mp3", "wb") as f:
    f.write(resp.content)
```

The `model` header is required on every request. JSON and MessagePack bodies accept the same fields.

### Advanced generation tuning

For finer control, `TTSConfig` exposes the model's sampling parameters. The defaults are well tuned; reach for these only when you need more deterministic output or want to curb artifacts.
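To build intuition for what `temperature` and `top_p` do, here is a generic, textbook illustration of temperature scaling and nucleus (top-p) filtering over a made-up three-token distribution. It is not Fish Audio's sampler; the tokens and scores are invented for the example.

```python
import math


def softmax_with_temperature(logits: dict, temperature: float) -> dict:
    """Divide logits by the temperature, then normalize them into probabilities."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the smallest set of most-likely tokens whose mass reaches top_p, renormalized."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


logits = {"a": 2.0, "b": 1.0, "c": 0.0}  # toy scores, not real model output
for t in (1.0, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
print(top_p_filter(softmax_with_temperature(logits, 0.7), 0.7))
```

Lowering the temperature from 1.0 to 0.7 raises the top token's share from about 0.67 to about 0.77, and a `top_p` of 0.7 then leaves only that token. Both effects narrow the choices at each step, which is why lower values give more repeatable output.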
```python theme={null}
from fishaudio.types import TTSConfig, Prosody

config = TTSConfig(
    prosody=Prosody(speed=1.1, volume=0),
    temperature=0.7,         # lower = more deterministic
    top_p=0.7,               # nucleus sampling cutoff
    repetition_penalty=1.2,  # >1.0 curbs repeated sounds
    max_new_tokens=1024,     # cap audio length per chunk
    normalize=True,          # expand numbers/dates for natural reading
)

audio = client.tts.convert(text="Carefully tuned output.", config=config)
```

A `TTSConfig` is reusable: define it once and pass it to many `convert()` calls. See the [full field list](/api-reference/sdk/python/types#ttsconfig-objects) for every parameter and default.

## Going further

* Lowest latency for conversational and live apps.
* Direct delivery with tags and prosody.
* Every field, type, and default.
* `tts.convert` / `stream` / `stream_websocket`.