Voice Convert

Voice conversion (timbre transfer) takes a source audio clip, whose spoken content is kept, and a reference audio clip, which supplies the target timbre, and returns the original content spoken in the target voice. The task is asynchronous: submit it, then poll for its status.

`POST /api/v2/voice/convert`

Auth: `Authorization: Bearer sk_...`

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `source_audio_url` | string | ✅ | URL of the original speech audio |
| `reference_audio_url` | string | ✅ | URL of the target timbre reference audio |
| `diffusion_steps` | int | ⬜ | Inference steps; more steps give better results but run slower |
| `length_adjust` | number | ⬜ | Duration scaling |
| `inference_cfg_rate` | number | ⬜ | CFG strength |
| `auto_f0_adjust` | bool | ⬜ | Automatic pitch adjustment |
| `pitch_shift` | int | ⬜ | Pitch shift in semitones |
| `return_format` | string | ⬜ | `wav` / `mp3` |
| `enable_separation` | bool | ⬜ | Separate the vocals first |
| `vocals_gain` | number | ⬜ | Vocals gain |
| `accompaniment_gain` | number | ⬜ | Accompaniment gain |
| `metadata` | object | ⬜ | Passthrough fields |
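
As a rough illustration of how the fields above fit together, the following Python sketch assembles a request body from the two required URLs plus any optional fields. The helper name `build_convert_payload` is hypothetical and not part of the API; the field names come from the table, and no value ranges are checked because the page does not state any.

```python
from typing import Any

# Optional fields listed in the parameter table above.
OPTIONAL_FIELDS = {
    "diffusion_steps", "length_adjust", "inference_cfg_rate",
    "auto_f0_adjust", "pitch_shift", "return_format",
    "enable_separation", "vocals_gain", "accompaniment_gain", "metadata",
}


def build_convert_payload(source_audio_url: str,
                          reference_audio_url: str,
                          **optional: Any) -> dict:
    """Hypothetical helper: build the JSON body for POST /api/v2/voice/convert."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {
        "source_audio_url": source_audio_url,
        "reference_audio_url": reference_audio_url,
    }
    # Leave out optional fields that were not set, so the server applies
    # whatever defaults it has (the defaults are not documented here).
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Rejecting unknown keys locally is a design choice of this sketch: it catches typos such as `pitch_shfit` before a request is sent.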

curl -X POST https://api.aiclonevoicefree.com/api/v2/voice/convert \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_audio_url": "https://oss.aiclonevoicefree.com/noise_removal/1751766508261_mandarin_speech_16kHz.wav",
    "reference_audio_url": "https://oss.aiclonevoicefree.com/trump.wav",
    "return_format": "wav"
  }'

Response 202 Accepted

{
  "task_id": "99ae1ea9-01e3-4b15-9f89-d35c280f44b1",
  "status": "pending",
  "capability": "voice",
  "action": "vocal-conversion",
  "model": "voice-convert",
  "created_at": 1780704377
}
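
The same request can be sketched in Python with only the standard library. The function names `build_convert_request` and `submit_conversion`, the 30-second timeout, and the `API_BASE` constant are assumptions made for this example; the URL, headers, and the `task_id` response field are taken from the curl call and response above.

```python
import json
import urllib.request

API_BASE = "https://api.aiclonevoicefree.com"


def build_convert_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare the POST request without sending it."""
    return urllib.request.Request(
        url=f"{API_BASE}/api/v2/voice/convert",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_conversion(api_key: str, payload: dict) -> str:
    """Send the request and return the task_id from the 202 response."""
    request = build_convert_request(api_key, payload)
    # The 30 s timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["task_id"]
```

Keeping request construction separate from sending makes the headers and body easy to inspect without touching the network.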

Getting the result

This is an asynchronous task: using the `task_id` returned above, poll `GET /api/v2/tasks/{task_id}`. Once `status` becomes `completed`, the result holds the converted audio:

curl https://api.aiclonevoicefree.com/api/v2/tasks/99ae1ea9-01e3-4b15-9f89-d35c280f44b1 \
  -H "Authorization: Bearer sk_your_api_key"

{
  "status": "completed",
  "capability": "voice",
  "action": "vocal-conversion",
  "audioUrl": "https://oss.aiclonevoicefree.com/...converted.wav",
  "format": "wav",
  "degraded": false,
  "providerUsed": "voice-convert-1",
  "completed_at": 1780704999
}

See Tasks for the polling cadence, status values and the unified response format.
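
A polling loop might look like the minimal sketch below. The 2-second interval, the 300-second timeout, and the `failed` status value are assumptions, since this page only shows `pending` and `completed`; the Tasks page is the authority on cadence and status values. The function `wait_for_task` and its `fetch_status` callback are hypothetical names; the callback is expected to perform the `GET /api/v2/tasks/{task_id}` call and return the parsed JSON.

```python
import time
from typing import Callable


def wait_for_task(fetch_status: Callable[[], dict],
                  interval_s: float = 2.0,
                  timeout_s: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task completes, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status()
        status = task.get("status")
        if status == "completed":
            return task
        if status == "failed":  # assumed status name, not confirmed by this page
            raise RuntimeError(f"task failed: {task}")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not complete before the timeout")
        sleep(interval_s)
```

Passing the fetch function in, rather than hard-coding the HTTP call, keeps the loop independent of any particular HTTP client.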

Billing

Charged by output audio duration: 2 credits per second (duration rounded up to the next whole second, minimum 1 s).

Examples:

~30s output = 30 × 2 = 60 credits
~12.4s output = rounded up to 13s = 13 × 2 = 26 credits

Credits are settled when the task completes; failed tasks are not charged (see Conventions). Billing draws on voice credits.
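
The billing rule can be written as a short calculation. This is a sketch of the arithmetic stated above, not the service's actual billing code; the function name `conversion_credits` is hypothetical.

```python
import math

CREDITS_PER_SECOND = 2  # rate stated in the Billing section


def conversion_credits(output_seconds: float) -> int:
    """Round the output duration up to whole seconds (minimum 1), then apply the rate."""
    billable_seconds = max(1, math.ceil(output_seconds))
    return billable_seconds * CREDITS_PER_SECOND


print(conversion_credits(30))    # 60
print(conversion_credits(12.4))  # 26
```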
