Create Voice Cloning Task
Start a voice cloning task. You can upload audio files directly or provide URLs to existing audio files.
Request Information
- Method: POST
- Endpoint: /api/instant/create-task
- Content Type: multipart/form-data or application/json
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio | File (binary) | Yes* | Audio file for voice cloning. Supported formats include WAV, MP3, and M4A. Only supported with multipart/form-data. You must provide either audio or audio_url. |
| audio_url | string | Yes* | Publicly accessible audio file URL (WAV, MP3, M4A). Supported with both multipart/form-data and application/json. You must provide either audio or audio_url. |
| text | string | Yes | The text you want to synthesize with the cloned voice. |
| api_key | string | Yes | Your unique API key for authentication and access. This key is used to verify your request and link it to your user account. |
| model | string | No | Voice cloning model version. Options: v1, v2, v-mul. Default is v2. Different models vary in audio quality, processing speed, and multilingual support. |
| speed_ratio | float | No | Speed ratio, range 0.5-2.0, default is 1.0. |
| pitch_ratio | float | No | Pitch offset, range -10 to 10 semitones, default is 0. |
| volume_ratio | float | No | Volume ratio, range 0.1-2.0, default is 1.0. |
| emotion_control | object | No | Emotion control parameters (v2 model only). Supports multiple emotion control modes; see below for details. |
*Note:
- Either the audio or the audio_url parameter is required; at least one must be provided
- When using application/json format, only the audio_url parameter is supported
- When using multipart/form-data format, both parameters are supported
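The rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name check_request and the shape of its return value are assumptions made for illustration, while the parameter names and ranges are taken from the table above.

```python
JSON = "application/json"
MULTIPART = "multipart/form-data"

# Allowed ranges, copied from the request parameter table.
RANGES = {
    "speed_ratio": (0.5, 2.0),
    "pitch_ratio": (-10.0, 10.0),
    "volume_ratio": (0.1, 2.0),
}


def check_request(content_type: str, fields: dict) -> list:
    """Return a list of problems found in `fields`; an empty list means OK.

    Hypothetical helper: it only mirrors the documented rules locally.
    """
    problems = []
    for name in ("text", "api_key"):
        if not fields.get(name):
            problems.append(f"missing required parameter: {name}")
    has_audio = "audio" in fields
    has_url = "audio_url" in fields
    if not (has_audio or has_url):
        problems.append("provide either audio or audio_url")
    if content_type == JSON and has_audio:
        # File uploads are only documented for multipart/form-data.
        problems.append("audio file upload requires multipart/form-data")
    for name, (low, high) in RANGES.items():
        if name in fields and not (low <= fields[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

Calling check_request("application/json", {...}) with a payload that carries audio instead of audio_url returns a one-item list naming the problem, which can be shown to the user instead of waiting for a 400 response.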
Response
Success Response
{
"task_id": "1406bf34-735c-4b21-98ac-a135b2afb1c8",
"status": "pending"
}
Error Response
- 400 Bad Request: Missing required parameters (e.g., api_key, or neither audio nor audio_url provided)
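For callers working from Python rather than curl, the success and error cases above might be handled as in the sketch below. It uses only the standard library. The function names are hypothetical, and because the format of the error body is not documented here, the sketch passes it through as raw text and treats any status of 400 or above as a failure (an assumption).

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "https://aivoiceclonefree.com/api/instant/create-task"


def parse_create_response(status_code: int, body: str) -> str:
    """Return the task_id from a success response, or raise ValueError."""
    if status_code >= 400:
        # Error body format is undocumented, so it is reported unparsed.
        raise ValueError(f"HTTP {status_code}: {body}")
    data = json.loads(body)
    if "task_id" not in data:
        raise ValueError(f"unexpected response: {body}")
    return data["task_id"]


def create_task(payload: dict) -> str:
    """POST a JSON payload (audio_url only) and return the new task_id."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return parse_create_response(
                response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; route the body through the same parser.
        return parse_create_response(
            error.code, error.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes the error handling easy to exercise without contacting the service.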
Example Requests
Using Audio File
curl -X POST https://aivoiceclonefree.com/api/instant/create-task \
-F "audio=@sample.mp3" \
-F "text=This is a long text suitable for async interface processing..." \
-F "api_key=your_api_key_here"
Using Audio URL
curl -X POST https://aivoiceclonefree.com/api/instant/create-task \
-F "audio_url=https://example.com/sample.mp3" \
-F "text=This is a long text suitable for async interface processing..." \
-F "api_key=your_api_key_here"
Using Specified Model
curl -X POST https://aivoiceclonefree.com/api/instant/create-task \
-F "audio=@sample.mp3" \
-F "text=This is a long text suitable for async interface processing..." \
-F "api_key=your_api_key_here" \
-F "model=v2"
Using Audio Processing Parameters
curl -X POST https://aivoiceclonefree.com/api/instant/create-task \
-F "audio=@sample.mp3" \
-F "text=This is a long text suitable for async interface processing..." \
-F "api_key=your_api_key_here" \
-F "model=v2" \
-F "speed_ratio=1.2" \
-F "pitch_ratio=2" \
-F "volume_ratio=1.5"
Using JSON Format (with audio_url)
curl -X POST https://aivoiceclonefree.com/api/instant/create-task \
-H "Content-Type: application/json" \
-d '{
"audio_url": "https://example.com/sample.mp3",
"text": "This is a long text suitable for async interface processing...",
"api_key": "your_api_key_here",
"model": "v2",
"speed_ratio": 1.2,
"pitch_ratio": 2,
"volume_ratio": 1.5
}'
Model Selection Guide
The API supports three voice cloning models that you can choose from based on your needs:
| Model | Description | Use Cases |
|---|---|---|
| v2 | Second generation model (default), balances audio quality and processing speed | Recommended for most scenarios, provides high-quality voice cloning results |
| v1 | First generation model, faster processing speed | Suitable for scenarios with high processing speed requirements |
| v-mul | Multilingual model, supports cross-language voice cloning | Suitable for applications requiring multilingual support |
Note: If the model parameter is not specified, the system will use the v2 model by default.
Emotion Control Parameters (emotion_control)
Only the v2 model supports emotion control, which is configured through the emotion_control object. The object can be passed directly in JSON format requests.
Emotion Control Modes
| Mode | mode Value | Description | Additional Parameters |
|---|---|---|---|
| Same as Reference | same_as_reference | Use the emotion from reference audio (default) | None |
| Reference Audio | reference_audio | Use specified reference audio to control emotion | reference_audio_url |
| Vector Control | vector | Use 8-dimensional emotion vector to control emotion | vector object |
| Text Description | text | Use text description to control emotion | text string |
| Random Emotion | random | Randomly generate emotion | None |
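The mode table maps naturally onto a small builder that attaches the right additional parameter for each mode. This is a sketch under stated assumptions: build_emotion_control is a hypothetical helper, not part of the API, and it uses the eight emotion names and the 0.0 to 1.0 range given in the vector section of this page. It does not assume that all eight emotions must be present, since the page's own vector example supplies only some of them.

```python
EMOTIONS = ("joy", "anger", "sorrow", "fear",
            "excitement", "depression", "surprise", "calm")

# Additional key each mode expects, per the table (None = no extra key).
MODE_EXTRA_KEY = {
    "same_as_reference": None,
    "reference_audio": "reference_audio_url",
    "vector": "vector",
    "text": "text",
    "random": None,
}


def build_emotion_control(mode: str, value=None) -> dict:
    """Return an emotion_control dict for `mode`, validating its extra value."""
    if mode not in MODE_EXTRA_KEY:
        raise ValueError(f"unknown mode: {mode}")
    key = MODE_EXTRA_KEY[mode]
    if key is None:
        return {"mode": mode}
    if value is None:
        raise ValueError(f"mode {mode!r} requires {key}")
    if mode == "vector":
        # Check every supplied emotion name and intensity.
        for name, intensity in value.items():
            if name not in EMOTIONS:
                raise ValueError(f"unknown emotion: {name}")
            if not 0.0 <= intensity <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")
    return {"mode": mode, key: value}
```

The returned dict can be placed under the emotion_control key of a JSON request body together with "model": "v2".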
8-Dimensional Emotion Vector (vector mode)
In vector mode, you can set the intensity of each of the 8 emotions individually. Each value ranges from 0.0 to 1.0:
{
"joy": 0.5, // Joy (0.0-1.0)
"anger": 0.0, // Anger (0.0-1.0)
"sorrow": 0.0, // Sorrow (0.0-1.0)
"fear": 0.0, // Fear (0.0-1.0)
"excitement": 0.3, // Excitement (0.0-1.0)
"depression": 0.0, // Depression (0.0-1.0)
"surprise": 0.2, // Surprise (0.0-1.0)
"calm": 0.0 // Calm (0.0-1.0)
}
Emotion Control Examples
Using Vector Control (Happy + Excited)
curl -X POST https://aivoiceclonefree.com/api/instant/create-task \
-H "Content-Type: application/json" \
-d '{
"audio_url": "https://example.com/sample.mp3",
"text": "What a beautiful day!",
"api_key": "your_api_key_here",
"model": "v2",
"emotion_control": {
"mode": "vector",
"vector": {
"joy": 0.8,
"excitement": 0.6,
"surprise": 0.3,
"calm": 0.2
}
}
}'
Using Text Description Control
curl -X POST https://aivoiceclonefree.com/api/instant/create-task \
-H "Content-Type: application/json" \
-d '{
"audio_url": "https://example.com/sample.mp3",
"text": "This is absolutely amazing!",
"api_key": "your_api_key_here",
"model": "v2",
"emotion_control": {
"mode": "text",
"text": "excited and happy"
}
}'
Using Reference Audio Control
curl -X POST https://aivoiceclonefree.com/api/instant/create-task \
-H "Content-Type: application/json" \
-d '{
"audio_url": "https://example.com/sample.mp3",
"text": "I feel very happy",
"api_key": "your_api_key_here",
"model": "v2",
"emotion_control": {
"mode": "reference_audio",
"reference_audio_url": "https://example.com/emotion-ref.wav"
}
}'
Task Status Description
After creating a task, you will receive a task_id and an initial status of pending. Possible status values:
| Status | Description |
|---|---|
| pending | Task submitted, waiting for processing |
| processing | Task is being processed |
| completed | Task completed |
| failed | Task processing failed |
Next Steps
After successful task creation:
- Save the returned task_id
- Use Task Status Query to monitor progress
- Download audio via Get Task Result after completion
Best Practices
- Text Length: Keep the text for a single task under 10,000 characters
- Audio Quality: Use high-quality audio samples for better cloning results
- Request Format:
  - Using multipart/form-data: You can upload an audio file directly (audio parameter) or provide an audio URL (audio_url parameter)
  - Using application/json: You can only provide an audio URL (audio_url parameter)
- API Limits: Be aware of API call frequency limits and avoid overly frequent requests
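One way to follow the text length recommendation is to split long input into several tasks. The sketch below is an illustration, not a documented feature: split_text is a hypothetical helper that breaks text at sentence ends where it can and falls back to a hard cut for any single sentence longer than the limit. Submitting each chunk as its own task, and joining the resulting audio, is left to the caller.

```python
import re

MAX_CHARS = 10_000  # recommended per-task limit from the list above


def split_text(text: str, limit: int = MAX_CHARS) -> list:
    """Split text into chunks of at most `limit` characters."""
    # Break after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, split_text("One. Two. Three.", limit=10) yields two chunks, "One. Two." and "Three.".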