Voice Clone Hub

New voice profile

Upload 6–30 seconds of clear audio (single speaker, no music). The clip is stored in MySQL.

Select a dataset profile, upload multiple clips (3–30 seconds each), then train the speaker embedding.
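The duration limits above can be checked before a clip is accepted. The following is a minimal sketch, not the hub's actual upload code: it assumes uploads arrive as uncompressed WAV bytes, and the names `check_clip`, `QUICK_CLONE_RANGE`, `DATASET_CLIP_RANGE` and `make_silent_wav` are hypothetical. It checks length only and cannot tell whether a clip has a single speaker or contains music.

```python
import io
import wave

# Assumed limits, copied from the upload instructions (in seconds).
QUICK_CLONE_RANGE = (6.0, 30.0)   # single reference clip for a new voice profile
DATASET_CLIP_RANGE = (3.0, 30.0)  # each clip uploaded to a dataset profile


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an uncompressed WAV file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_clip(data: bytes, allowed: tuple[float, float]) -> tuple[bool, str]:
    """Return (accepted, message) for a clip against a (min, max) duration range."""
    low, high = allowed
    try:
        seconds = wav_duration_seconds(data)
    except (wave.Error, EOFError):
        return False, "not a readable WAV file"
    if seconds < low:
        return False, f"too short: {seconds:.1f}s (minimum {low:g}s)"
    if seconds > high:
        return False, f"too long: {seconds:.1f}s (maximum {high:g}s)"
    return True, f"ok: {seconds:.1f}s"


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (helper for trying the check)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

For example, a 4-second clip passes `check_clip(clip, DATASET_CLIP_RANGE)` but is rejected as too short by `check_clip(clip, QUICK_CLONE_RANGE)`.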

Dataset profile

No samples yet

No profiles yet

No synthesis yet

Quick clone (supported): Zero-shot cloning from one short sample, with no training step.
Multi-sample training (supported): Upload several clips; XTTS builds a speaker embedding from all of them (CPU-friendly, not a full model fine-tune).
GPU fine-tune (not supported): Full GPT fine-tuning needs a GPU and hours of compute.
Storage: Users, voice reference audio, synthesis output and audit logs are stored in MySQL.
Speed: ~1–3 minutes per short phrase on CPU.
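
To illustrate the multi-sample item in the list above: one common way to turn several clips into a single speaker embedding is to compute one embedding vector per clip and average them element-wise. The sketch below shows only that averaging step, on plain Python lists. It is an assumption for illustration, since this page does not say how XTTS combines clips internally, and `combine_embeddings` is a hypothetical name, not part of any library.

```python
from typing import List, Sequence


def combine_embeddings(per_clip: Sequence[Sequence[float]]) -> List[float]:
    """Average per-clip embedding vectors into one speaker embedding.

    Hypothetical helper: each inner sequence stands for the embedding of
    one uploaded clip, and all of them must have the same length.
    """
    if not per_clip:
        raise ValueError("at least one clip embedding is required")
    dim = len(per_clip[0])
    if any(len(vec) != dim for vec in per_clip):
        raise ValueError("all embeddings must have the same dimension")
    # zip(*per_clip) walks the vectors column by column.
    return [sum(column) / len(per_clip) for column in zip(*per_clip)]
```

Averaging is cheap enough to run on a CPU, which is consistent with the "CPU-friendly" note above; a full fine-tune would instead update the model's weights.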