AI Audio

Speech to Text

Transcribe audio to text (Whisper) — returns text, segments and language

Transcribes audio to text asynchronously. Submitting a request returns a task_id; poll the status endpoint with that task_id to retrieve the full text, the per-segment timeline, and the detected language.

Base URL: https://api.aiclonevoicefree.com | Auth: Authorization: Bearer sk_...

POST /api/v2/voice/transcribe

Field | Type | Required | Notes
audio_url | string | | Audio URL (alias url also accepted)
duration_seconds | number | | Audio length (s), used for per-second billing
language | string | | Language code; omit to auto-detect

Billing

Transcription costs 1 credit per second of audio: cost = ceil(duration_seconds). Credits are deducted up front at submit time after a balance check (HTTP 402 if the balance is insufficient) and are refunded automatically if the task fails.
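The rounding rule can be checked locally before submitting. The sketch below only illustrates the arithmetic; the helper names transcription_cost and can_afford are hypothetical, and rejecting non-positive durations is an assumption rather than documented behavior.

```python
import math


def transcription_cost(duration_seconds: float) -> int:
    """Credits for one job: 1 credit per second, rounded up (ceil)."""
    # Assumption: a non-positive duration is treated as invalid input here.
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return math.ceil(duration_seconds)


def can_afford(balance: int, duration_seconds: float) -> bool:
    """Local pre-check mirroring the balance check done at submit time."""
    return balance >= transcription_cost(duration_seconds)


print(transcription_cost(42))    # 42
print(transcription_cost(42.3))  # 43, a partial second is billed as a full one
```

A clip of 42.3 seconds therefore needs a balance of at least 43 credits to avoid a 402 at submit.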

curl -X POST https://api.aiclonevoicefree.com/api/v2/voice/transcribe \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{ "audio_url": "https://your-cdn.com/speech.mp3", "duration_seconds": 42 }'

Response 202:
{ "task_id": "...", "status": "pending", "capability": "voice", "action": "transcribe" }
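The same submission can be sketched in Python using only the standard library. The function names build_transcribe_request and submit_transcription are hypothetical. The endpoint, headers, and body fields are taken from the curl example above; error handling is left to urllib, which raises HTTPError on a 402.

```python
import json
import urllib.request

BASE_URL = "https://api.aiclonevoicefree.com"


def build_transcribe_request(api_key, audio_url, duration_seconds, language=None):
    """Build the POST request without sending it."""
    payload = {"audio_url": audio_url, "duration_seconds": duration_seconds}
    if language is not None:
        # Omitting "language" lets the service auto-detect it.
        payload["language"] = language
    return urllib.request.Request(
        BASE_URL + "/api/v2/voice/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_transcription(api_key, audio_url, duration_seconds, language=None):
    """Send the request and return the task_id from the 202 response."""
    req = build_transcribe_request(api_key, audio_url, duration_seconds, language)
    # urlopen raises urllib.error.HTTPError on 4xx, e.g. 402 for low balance.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task_id"]
```

Splitting request construction from sending keeps the payload easy to inspect or log before any credits are deducted.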

Getting the result

Poll GET /api/v2/voice/transcribe/{task_id} (note: transcription has its own status endpoint, not the unified /tasks):

{
  "task_id": "...",
  "status": "completed",
  "capability": "voice",
  "action": "transcribe",
  "_type": "voice.transcribe",
  "text": "full transcript...",
  "language": "en",
  "segments": [{ "start": 0.0, "end": 3.2, "text": "..." }]
}
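A polling loop for this endpoint might look like the sketch below. Only the "pending" and "completed" status values appear on this page; the "failed" value, the 2-second interval, and the 5-minute timeout are assumptions, and the names fetch_status, wait_for_transcript, and format_segments are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.aiclonevoicefree.com"
# Assumption: the name of the failure status is not documented on this page.
FAILED_STATUSES = {"failed"}


def fetch_status(api_key, task_id):
    """GET the transcription-specific status endpoint for one task."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v2/voice/transcribe/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_transcript(fetch, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() until the task completes, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch()
        status = task.get("status")
        if status == "completed":
            return task
        if status in FAILED_STATUSES:
            raise RuntimeError(f"transcription failed: {task}")
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription did not finish in time")
        sleep(interval)  # any other status (e.g. "pending") keeps polling


def format_segments(task):
    """Render each segment as '[start-end] text' using the documented fields."""
    return [
        f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text']}"
        for seg in task["segments"]
    ]
```

In use, something like wait_for_transcript(lambda: fetch_status(api_key, task_id)) returns the completed task object, from which text, language, and segments can be read directly.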
