# Speech to Text

> Transcribe audio to text with per-segment timestamps

Turn spoken audio into accurate text, complete with timed segments, using Fish Audio's ASR model. Send an audio file and get back the transcript, its duration, and timestamped segments. The workflow is the same whether you call the API directly, use the Python library, or use JavaScript.

- No code: upload audio, get a transcript.
- Every parameter for `POST /v1/asr`.
- Captions, batch transcription, and more.

## When to use it

- Timed segments map straight to SRT/VTT cues.
- Transcribe recordings for summaries and search.
- Turn short utterances into text your app can act on.
- Make audio and video content readable.

## Quick start

Read an audio file, send the bytes, and get the transcript. Choose your implementation:

```python Python
from fishaudio import FishAudio

client = FishAudio()  # reads FISH_API_KEY

with open("speech.wav", "rb") as f:
    result = client.asr.transcribe(audio=f.read(), language="en")

print(result.text)
```

```bash API (curl)
curl --request POST https://api.fish.audio/v1/asr \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --form audio=@speech.wav \
  --form language=en
```

```javascript JavaScript
import { FishAudioClient } from "fish-audio";
import { readFile } from "fs/promises";

const client = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

// `File` is a global in Node.js 20 and later
const result = await client.speechToText.convert({
  audio: new File([await readFile("speech.wav")], "speech.wav"),
  language: "en",
});

console.log(result.text);
```

The response gives you the full `text`, the audio `duration` in seconds, and timed `segments`.

## Read the timestamps

Each segment carries `start` and `end` times in seconds, which makes segments ideal for captions. With the API, request them explicitly with `ignore_timestamps=false`.
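Because segment times are plain seconds, turning segments into SRT captions is mostly a matter of formatting timestamps. The following is a minimal, stdlib-only sketch rather than an SDK feature: the helper names `srt_timestamp` and `to_srt` are hypothetical, and it assumes each segment is a dict with `text`, `start`, and `end` keys, matching the JSON segments the API returns.

```python
# Hypothetical helpers (not part of the Fish Audio SDK): render ASR segments as SRT.
# Assumes each segment is a dict with "text", "start", and "end" (seconds).


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Number the segments from 1 and emit one SRT cue per segment."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)


segments = [
    {"text": "One", "start": 0.0, "end": 0.24},
    {"text": "two three", "start": 0.24, "end": 1.5},
]
print(to_srt(segments))
```

With the Python SDK, segments are objects with `start`, `end`, and `text` attributes, so you would first map them to dicts, for example `[{"text": s.text, "start": s.start, "end": s.end} for s in result.segments]`. WebVTT cues differ mainly in using `.` instead of `,` before the milliseconds and in starting the file with a `WEBVTT` line.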
```python Python
from fishaudio import FishAudio

client = FishAudio()  # reads FISH_API_KEY
with open("speech.wav", "rb") as f:
    audio_bytes = f.read()

result = client.asr.transcribe(audio=audio_bytes, language="en", include_timestamps=True)

print(f"{result.duration:.1f}s total")
for seg in result.segments:
    print(f"[{seg.start:6.2f} - {seg.end:6.2f}] {seg.text}")
```

```bash API (curl)
curl --request POST https://api.fish.audio/v1/asr \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --form audio=@speech.wav \
  --form language=en \
  --form ignore_timestamps=false | jq '.segments'

# Each segment: { "text": "One", "start": 0.0, "end": 0.24 }
```

In the Python SDK, segment timestamps are **on by default**; pass `include_timestamps=False` to skip them. Note that this is the *inverse* of the API/JavaScript flag `ignore_timestamps`.

## Implementation details

### Language

`language` is optional: Fish Audio auto-detects it when you omit it. Pass an ISO code (`en`, `zh`, `ja`, …) to pin the language and improve accuracy on short or noisy clips.

```python Python
# Reuses `client` and `audio_bytes` from the timestamps example

# Auto-detect
result = client.asr.transcribe(audio=audio_bytes)

# Pin the language
result = client.asr.transcribe(audio=audio_bytes, language="zh")
```

```bash API (curl)
# Omit the form field to auto-detect, or set it explicitly:
curl --request POST https://api.fish.audio/v1/asr \
  --header "Authorization: Bearer $FISH_API_KEY" \
  --form audio=@speech.wav \
  --form language=zh
```

### Input audio

Common formats work directly, including `wav`, `mp3`, and `opus`. Send the raw file bytes; no pre-processing is required. The endpoint accepts `multipart/form-data` (shown above) or `application/msgpack`.

### File limits

One request transcribes one audio file. The endpoint accepts files up to **20 MB** and **60 minutes** long, with a minimum of **1 second** of audio.

For recordings beyond these limits, split the audio into chunks, transcribe each chunk, and stitch the segment timestamps back together: offset each chunk's `start`/`end` by the time at which that chunk began in the full recording.
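The split-and-stitch approach can be sketched in two steps: plan chunk windows that respect the per-request duration limits, then shift each chunk's segments onto the full recording's timeline. This is a minimal, stdlib-only sketch, not an SDK feature. `plan_chunks` and `stitch_segments` are hypothetical names; the sketch assumes segments are dicts with `text`, `start`, and `end` in seconds, and it only handles the 60-minute maximum and 1-second minimum. Cutting the audio itself and keeping each chunk under 20 MB are left to your audio tooling.

```python
# Hypothetical helpers (not part of the Fish Audio SDK) for recordings that
# exceed the per-request limits. Segments are assumed to be dicts with
# "text", "start", and "end" in seconds.

MAX_CHUNK_SECONDS = 60 * 60  # documented per-request maximum (60 minutes)
MIN_CHUNK_SECONDS = 1.0      # documented per-request minimum (1 second)


def plan_chunks(total_seconds, chunk_seconds=MAX_CHUNK_SECONDS):
    """Split [0, total_seconds] into (start, end) windows of at most chunk_seconds."""
    if total_seconds < MIN_CHUNK_SECONDS:
        raise ValueError("recording is shorter than the 1-second minimum")
    if not 2 * MIN_CHUNK_SECONDS <= chunk_seconds <= MAX_CHUNK_SECONDS:
        raise ValueError("chunk_seconds must be between 2 seconds and 60 minutes")

    boundaries = [0.0]
    while boundaries[-1] + chunk_seconds < total_seconds:
        boundaries.append(boundaries[-1] + chunk_seconds)
    boundaries.append(float(total_seconds))

    # A trailing window under the minimum is stretched back to exactly
    # MIN_CHUNK_SECONDS by moving its start earlier (the previous window shrinks).
    if len(boundaries) > 2 and boundaries[-1] - boundaries[-2] < MIN_CHUNK_SECONDS:
        boundaries[-2] = boundaries[-1] - MIN_CHUNK_SECONDS

    return list(zip(boundaries, boundaries[1:]))


def stitch_segments(chunk_results):
    """Merge (chunk_start, segments) pairs onto the full recording's timeline."""
    merged = []
    for chunk_start, segments in chunk_results:
        for seg in segments:
            merged.append({
                "text": seg["text"],
                "start": seg["start"] + chunk_start,  # chunk-relative to absolute
                "end": seg["end"] + chunk_start,
            })
    return merged


print(plan_chunks(7200.5))
# [(0.0, 3600.0), (3600.0, 7199.5), (7199.5, 7200.5)]

merged = stitch_segments([
    (0.0, [{"text": "Hello", "start": 0.0, "end": 0.5}]),
    (3600.0, [{"text": "again", "start": 0.25, "end": 0.75}]),
])
print(merged[1])
# {'text': 'again', 'start': 3600.25, 'end': 3600.75}
```

Each window's start time is the offset to pass to `stitch_segments` once that chunk's transcript comes back. Cutting at fixed times can split a word across two chunks; splitting at silences avoids that, at the cost of variable chunk lengths.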
### Async transcription

The Python SDK ships an async client with the same surface. It is useful when you're transcribing many files concurrently or your code already runs inside an event loop. Use `AsyncFishAudio` and `await` the call:

```python
import asyncio

from fishaudio import AsyncFishAudio


async def main():
    client = AsyncFishAudio()  # reads FISH_API_KEY
    with open("speech.wav", "rb") as f:
        result = await client.asr.transcribe(audio=f.read(), language="en")
    print(result.text)


asyncio.run(main())
```

To transcribe several files in parallel, gather the coroutines:

```python
import asyncio
from pathlib import Path

from fishaudio import AsyncFishAudio


async def transcribe_all(paths):
    client = AsyncFishAudio()
    # Path.read_bytes() closes each file after reading it
    clips = [Path(p).read_bytes() for p in paths]
    return await asyncio.gather(*[
        client.asr.transcribe(audio=clip, language="en") for clip in clips
    ])


for result in asyncio.run(transcribe_all(["speech.wav"])):
    print(result.text)
```

### Direct API (MessagePack)

`POST /v1/asr` also accepts a [MessagePack](https://msgpack.org) body instead of multipart form data; this is the option the API reference points to for low-overhead, server-side calls. Pack the audio bytes and options into one payload and set `Content-Type: application/msgpack`:

```python
import os

import httpx
import ormsgpack

with open("speech.wav", "rb") as f:
    audio = f.read()

payload = {"audio": audio, "language": "en", "ignore_timestamps": False}

resp = httpx.post(
    "https://api.fish.audio/v1/asr",
    content=ormsgpack.packb(payload),
    headers={
        "Authorization": f"Bearer {os.environ['FISH_API_KEY']}",
        "Content-Type": "application/msgpack",
    },
    timeout=300.0,  # httpx defaults to 5 seconds; transcription can take longer
)
resp.raise_for_status()  # surface HTTP errors instead of parsing an error body

result = resp.json()
print(result["text"])
```

The response shape is identical to the multipart path: `text`, `duration` (seconds), and `segments`.

## Going further

- The reverse direction: text to lifelike audio.
- Every field and the raw response schema.
- `asr.transcribe` options and the `ASRResponse` type.