User Guide — Voice Cloning, TTS & Studio

Getting started

System requirements

ClonyVoice runs on Windows 10/11 (64-bit) and requires an NVIDIA graphics card with CUDA — voice generation runs on your own GPU, which is what makes it unlimited. As a rule of thumb: if your PC can run a recent 3D game, it can run ClonyVoice. macOS support is on our roadmap.

Installation

The installer downloads the AI models on first run, so the initial setup takes a few minutes.

Sign in to your account on clonyvoice.com and download the Windows installer
Run the .exe file and follow the setup wizard
Wait while the AI models are downloaded (a few minutes, one time only)
Launch ClonyVoice from your desktop or Start menu

Create your account

ClonyVoice uses a single account for everything: activation, your plan and your AI budget. The Free plan requires no payment method.

Create a free account on clonyvoice.com
Pick a plan — the Free plan is enough to clone a voice and try everything
Keep your account email at hand: it is what activates the app

Activate the app

Activation links this computer to your account. Depending on your plan you can use ClonyVoice on 1 (Free, Creator), 2 (Pro) or 3 (Studio) machines at the same time. Once activated, the app keeps working offline for up to 48 hours between license checks.

Launch ClonyVoice and open the License tab
Enter the email address of your ClonyVoice account
Click Activate — this computer is now linked to your account
To move to another computer later, click Disconnect in the app or release the machine from your online account

A quick tour of the interface

The app is organized in tabs: Text to Audio (speech generation and the editing timeline), Create a voice (cloning and voice design), Studio (turn a website into a video), VoiceStore (community voices), Projects, API and License. The gauge in the top bar shows your remaining AI budget at all times.

Your first cloned voice

Clone your voice in under 5 minutes

Quick mode needs about 30 seconds of clean speech. You can record it directly in the app with a guided reading text, or import an existing audio clip. Cloning runs entirely on your computer — the recording never leaves your machine.

Open the "Create a voice" tab and choose Quick mode
Click Record and read the on-screen text aloud (about 30 seconds), or import an audio clip
Pick the tone that matches the recording (neutral, happy, calm...)
Name your voice and click Clone
Your voice appears in the voice selector, ready to speak — test it right away

Precise mode: a cleaner clone

Precise mode also uses the transcript of your clips to align speech with audio. It takes a little more preparation and usually produces a more faithful clone. You can combine several clips.

Choose Precise mode and add one or more audio clips
Provide the transcript: type it, load a .txt/.srt/.vtt file, or let the built-in transcription fill it in
Check that the text matches what is actually spoken
Name the voice and launch the clone

Voice Design: create a voice from a description

No recording? Describe the voice you want — tone, age, accent, pace, style — and ClonyVoice generates a synthetic voice matching the description. A test panel lets you validate the result immediately.

Tips for a great clone

Record in a quiet room, without music, echo or background noise
Speak at your natural pace — do not force a style you will not use later
Aim for 25 to 30 seconds of continuous speech; one single speaker only
For imported clips, prefer the original audio over compressed re-recordings
Not satisfied? Re-record and clone again — voice generation on your own machine is unlimited

Generate speech

Generate your first audio

Speech generation runs locally on your GPU: it is unlimited on every plan, including Free, and works with any of your voices.

Open the "Text to Audio" tab
Pick a voice: one of your clones, a built-in voice or a VoiceStore voice
Type or paste your text and select the language
Click Generate — long texts are split into sentences you can regenerate one by one
Listen, then export the audio or continue in the timeline

Dialogues with several voices

Assign a different voice to each line to produce dialogues and multi-character narrations in one pass. Each sentence can be regenerated individually without redoing the whole text.

Ten languages, one voice

The same cloned voice can speak English, French, German, Spanish, Italian, Portuguese, Russian, Japanese, Korean and Chinese. Select the language before generating; for full multilingual versions of a project, see Dubbing & translation.

Studio: from URL to video

What the Studio does

The Studio turns a web page into a marketing video: it reads the site, keeps its logo, images and colors, writes a script, records the voice-over with your cloned voice, and renders an MP4 — on your own machine. Script writing and scene planning run on our servers and draw on your monthly AI budget; the video itself is rendered locally on your GPU.

Your first video

From URL to finished MP4 in a few minutes.

Open the Studio tab and paste the address of the website you want a video for
Launch the analysis: ClonyVoice collects the site's logo, images and key content — you can add your own photos too
Choose the duration, the language and the voice for the voice-over
Start the generation: the script is written, then spoken with your voice
Every scene is timed to the real voice-over, so the requested duration is respected
The MP4 renders locally on your GPU
Preview the result, download the video or fine-tune it in the editor

Edit the scenes

Every scene is editable: texts, images and their order. The scene editor is available in the Studio and from the timeline, so you can adjust a title or swap a photo without regenerating anything.

Ask for a revision

A revision rewrites the script according to your instructions. Sentences you keep are preserved exactly as they are — only changed text is re-recorded. Each video includes revisions at no extra charge; beyond that, a revision uses less of your AI budget than a full video.

Open the revision panel and describe what you want to change
ClonyVoice revises the script: untouched sentences stay identical, modified ones are re-spoken with your voice
The video re-renders locally with the new timing

Free plan: watermark and end card

Videos created on the Free plan are exported in 720p with a ClonyVoice watermark and a short "Created with ClonyVoice" end card. Paid plans export in 1080p with no marking. This is how the free tier stays genuinely free.

Editor & timeline

The timeline

The timeline assembles your generated audio, imported tracks, videos and images into one export. A Studio video arrives as a single block, alongside your own imports.

Generate or load audio — each sentence becomes a block you can move
Import extra tracks (music, sound effects) or video/images
Arrange and trim the blocks
Preview, then export the result

Subtitles

Word-level subtitles are generated from the actual audio. Numbers, prices and phone numbers are kept whole — they are never split across two lines.
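
This guide does not describe how ClonyVoice wraps subtitle lines. The Python sketch below only illustrates the idea of keeping numbers whole. A regular expression treats a run of digits containing dots, commas, dashes or single spaces (such as a phone number or a price) as one unbreakable token. A greedy wrapper then fills each line up to a character limit. The pattern, the 32-character default and the name wrap_subtitle are assumptions for illustration, not the app's actual rules.

```python
import re

# Hypothetical tokenizer (not ClonyVoice's actual rule): a number may start with a
# currency sign or "+", may contain digits, dots, commas, dashes, or single spaces
# followed by another digit ("06 12 34 56 78"), and keeps any trailing punctuation.
# Every other word is a plain run of non-space characters.
TOKEN = re.compile(r"[$€£+]?\d(?:[\d.,-]| (?=\d))*\S*|\S+")


def wrap_subtitle(text: str, max_chars: int = 32) -> list[str]:
    """Greedily fill subtitle lines without ever splitting a number token."""
    lines: list[str] = []
    current = ""
    for token in TOKEN.findall(text):
        candidate = f"{current} {token}" if current else token
        # Accept the token if it fits, or if the line is still empty: an over-long
        # number then gets a line of its own instead of being cut in two.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = token
    if current:
        lines.append(current)
    return lines


print(wrap_subtitle("Call us at 06 12 34 56 78 today", max_chars=16))
# ['Call us at', '06 12 34 56 78', 'today']
```

Greedy filling keeps the sketch short. A side effect of this simple heuristic is that two separate numbers written with a single space between them (for example "2024 10") are also kept together.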

Export

Exports are rendered on your machine.

Click Export and choose audio (WAV) or video (MP4)
Pick the destination in the save dialog
Free plan videos export in 720p with the ClonyVoice watermark; paid plans in 1080p without it

Dubbing & translation

How dubbing works

ClonyVoice translates your content into up to 10 languages and speaks each translation with your own cloned voice, aligned with the original timings. This is audio synchronization: the picture is not modified and lips are not re-animated.

Dub a project

Automatic translation runs online and uses your AI budget — the exact cost is displayed before you launch it. You can also paste your own translations at no cost.

Generate your project in its source language
Enable Multi-language and select the target languages
Click Auto-translate (cost shown beforehand), or paste your own translations batch by batch
Review each batch in its tab and adjust any line
Generate: each language is spoken locally with your cloned voice, aligned with the original timings
Export one file per language

VoiceStore

Browsing the VoiceStore

The VoiceStore is an online marketplace of voice models shared by the community. Browse, preview and download voices for your projects.

Download a voice

Downloading VoiceStore voices is included in the Creator plan and above.

Open the VoiceStore tab
Browse or search, and listen to the previews
Click Download on the voice you want
The voice appears in your library, ready to use in generation and Studio

Publish your voice

You can share your own voices on the VoiceStore. Only publish voices you have the rights to.

Select one of your voices in your library
Choose Publish to the VoiceStore
Fill in the name, description and preview
Submit — your voice becomes available to the community

Projects

Projects and autosave

Everything you do lives in a project: generated audio, timeline, Studio video, translations. Projects save automatically as you work.

Open the Projects tab and create a project
Work normally — generation, Studio and the timeline all follow the current project
Changes are saved automatically as you go
Reopen any project later exactly where you left it

Back up and move your voices

On the Creator plan and above, voices can be exported as encrypted .clonyvoice packages and imported on another of your activated machines, which is handy for backups or when you move to a new computer.

AI budget & plans

Why an AI budget?

Voice cloning and speech generation run on your GPU: they are unlimited on every plan and cost us nothing to run. Your AI budget only covers what runs on our servers — Studio script and scene planning, and automatic translation. The app shows it as a simple 0-100% gauge, and each plan's monthly budget is roughly: Free ≈ 3 videos, Creator ≈ 30, Pro ≈ 60, Studio ≈ 120.

Read your gauge

The gauge in the top bar shows your remaining AI budget and its equivalent in videos. Click it for the details: monthly budget and renewal date.
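
Here is a rough worked example. It assumes the gauge converts linearly into the approximate per-plan video counts listed under Why an AI budget? The app's own estimate may use a different formula. Under that assumption, remaining videos ≈ gauge percentage × monthly videos ÷ 100. At 40% on Pro, that is 40 × 60 ÷ 100 = 24 videos. The Python sketch below computes the same estimate. The name estimate_videos_left is hypothetical.

```python
# Approximate monthly budgets in videos, as listed in this guide.
MONTHLY_VIDEOS = {"free": 3, "creator": 30, "pro": 60, "studio": 120}


def estimate_videos_left(plan: str, gauge_percent: int) -> int:
    """Hypothetical estimate of whole videos left, assuming a linear gauge."""
    if not 0 <= gauge_percent <= 100:
        raise ValueError("gauge_percent must be between 0 and 100")
    # Integer arithmetic avoids float rounding; // keeps only whole videos.
    return gauge_percent * MONTHLY_VIDEOS[plan.lower()] // 100


print(estimate_videos_left("Pro", 40))   # 24
print(estimate_videos_left("Free", 50))  # 1 (1.5 rounded down)
```

Rounding down to whole videos errs on the cautious side.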

When your budget runs out

On the Free plan, video creation pauses until the monthly renewal — local voice generation keeps working without limit. On paid plans, your creations are never blocked: at peak times they may start with a short delay in a priority queue. To get more before the monthly renewal, move up to a plan with a bigger AI budget.

Machines per plan

Free and Creator: 1 machine. Pro: 2. Studio: 3. Disconnect a machine from the app or from your online account to free a seat at any time.

Troubleshooting

GPU not detected

ClonyVoice requires an NVIDIA graphics card with CUDA. If the app reports no GPU:

Check that your machine has an NVIDIA GPU (Windows Task Manager, Performance tab)
Update the NVIDIA driver from nvidia.com, then reboot
Relaunch ClonyVoice and check the hardware status in the top bar
Still stuck? Contact support from the app with your system details

Generation is slow

The first generation after launch loads the AI models — the next ones are much faster
Close other GPU-hungry applications (games, video editors)
Long texts are generated sentence by sentence — the queue indicator shows progress

The clone does not sound right

Re-record your sample in a quieter environment, or switch to Precise mode with a transcript. The quality of the source recording is the single biggest factor.

Working offline

Voice generation works offline for up to 48 hours between license checks; reconnect to the Internet to revalidate. The Studio and automatic translation always require a connection, because they run on our servers.

"Already activated on another machine"

Your plan's machine seats are all in use. Disconnect ClonyVoice on the other machine, or release it from your online account, then activate again.

Local API

Enable the local API

Pro and Studio plans include a local REST API served by the app itself on your machine (127.0.0.1:8765) — automate generation from your scripts and tools, with data staying on your computer.

Open the API tab in the app
Create an API key and select its scopes
Copy the key — it is shown only once
Call the API with the X-API-Key header, e.g. POST /api/generate/clone
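
The following minimal Python sketch is a starting point for scripting and uses only the standard library. The address, the X-API-Key header and the POST /api/generate/clone endpoint come from the steps above. Three things are assumptions for illustration only: that the body is JSON, the field names (voice, text, language), and that the response body is worth saving as-is. Check the API documentation page for the real request and response formats.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8765"  # local API address given in this guide


def build_clone_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST to /api/generate/clone authenticated with X-API-Key."""
    return urllib.request.Request(
        url=f"{API_BASE}/api/generate/clone",
        data=json.dumps(payload).encode("utf-8"),  # JSON body is an assumption
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0) -> bytes:
    """Send the request to the running app and return the raw response body."""
    # Generous timeout: the first generation after launch loads the AI models.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (requires ClonyVoice running with the local API enabled).
# The payload fields below are hypothetical placeholders:
# req = build_clone_request("YOUR_API_KEY", {"voice": "My voice", "text": "Hello!", "language": "en"})
# body = send(req)
```

Building the request separately from sending it lets you inspect the headers and body before the app is even running.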

Full reference

All endpoints, scopes and code samples are documented on the API documentation page.
