AI Audio
Speech to Text
Transcribe audio to text (Whisper) — returns text, segments and language
Asynchronously transcribe audio to text, returning the full text, per-segment timeline, and detected
language. Poll the returned task_id after submitting.
Base URL: https://api.aiclonevoicefree.com | Auth: Authorization: Bearer sk_...
POST /api/v2/voice/transcribe
| Field | Type | Required | Notes |
|---|---|---|---|
| audio_url | string | ✅ | Audio URL (alias url also accepted) |
| duration_seconds | number | ✅ | Audio length in seconds, used for per-second billing |
| language | string | ⬜ | Language code; omit to auto-detect |
Billing
1 credit per second of audio (cost = ceil(duration_seconds)). Credits are pre-deducted with a balance check at submit (402 if insufficient) and auto-refunded on failure.
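The billing rule above can be sketched as a small local estimate in Python. The helper names `estimate_credits` and `can_afford` are hypothetical, and rejecting non-positive durations is an assumption rather than documented behavior; the server remains the authority on the actual charge.

```python
import math


def estimate_credits(duration_seconds: float) -> int:
    """Estimate the charge: 1 credit per second of audio, rounded up."""
    # Assumption: a non-positive duration is treated as a caller error here.
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return math.ceil(duration_seconds)


def can_afford(balance: int, duration_seconds: float) -> bool:
    """Hypothetical local pre-check mirroring the balance check done at submit."""
    return balance >= estimate_credits(duration_seconds)
```

Running a check like this before submitting can avoid a round trip that would end in a 402, although the balance may change between the check and the request.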
curl -X POST https://api.aiclonevoicefree.com/api/v2/voice/transcribe \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{ "audio_url": "https://your-cdn.com/speech.mp3", "duration_seconds": 42 }'
Response 202 → { "task_id": "...", "status": "pending", "capability": "voice", "action": "transcribe" }
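The same submit call can be sketched in Python using only the standard library. The function names `build_transcribe_request` and `submit_transcription` are hypothetical, and the 30-second timeout is an arbitrary choice; the endpoint, headers, and fields are taken from the curl example and the field table.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.aiclonevoicefree.com"


def build_transcribe_request(api_key, audio_url, duration_seconds, language=None):
    """Build the POST request; language is omitted so the API auto-detects it."""
    payload = {"audio_url": audio_url, "duration_seconds": duration_seconds}
    if language is not None:
        payload["language"] = language
    return urllib.request.Request(
        BASE_URL + "/api/v2/voice/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_transcription(request):
    """Send the request and return the parsed JSON body, which carries task_id."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 402 is documented as the insufficient-balance response at submit.
        if err.code == 402:
            raise RuntimeError("insufficient credits (HTTP 402)") from err
        raise
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are deducted.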
Getting the result
Poll GET /api/v2/voice/transcribe/{task_id} (note: transcription has its own status endpoint, not the unified /tasks):
{
  "task_id": "...",
  "status": "completed",
  "capability": "voice",
  "action": "transcribe",
  "_type": "voice.transcribe",
  "text": "full transcript...",
  "language": "en",
  "segments": [{ "start": 0.0, "end": 3.2, "text": "..." }]
}
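A polling loop for the status endpoint above might look like the following sketch. Only the "pending" and "completed" status values appear in this page; the "failed" value, the 2-second interval, the 300-second timeout, the use of the same Bearer header on the GET request, and the names `make_fetcher` and `poll_transcription` are all assumptions made for illustration.

```python
import json
import time
import urllib.request


def make_fetcher(api_key, base_url="https://api.aiclonevoicefree.com"):
    """Return a function that fetches the status JSON for a task_id."""
    def fetch_status(task_id):
        request = urllib.request.Request(
            f"{base_url}/api/v2/voice/transcribe/{task_id}",
            # Assumption: the status endpoint takes the same Bearer token.
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    return fetch_status


def poll_transcription(fetch_status, task_id, interval=2.0, timeout=300.0,
                       sleep=time.sleep):
    """Call fetch_status until the task completes, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(task_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":  # assumed status value, not confirmed by this page
            raise RuntimeError(f"transcription {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"transcription {task_id} still {status!r}")
        sleep(interval)
```

A call such as `poll_transcription(make_fetcher("sk_your_api_key"), task_id)` would return the completed payload, from which `text`, `language`, and `segments` can be read. Passing `fetch_status` and `sleep` as parameters keeps the loop testable without network access.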