New voice profile
Upload 6–30 seconds of clear audio from a single speaker, with no background music. The clip is stored in MySQL.
Train from uploaded samples
Select a dataset profile, upload several clips (3–30 seconds each), then train the speaker embedding.
No samples yet
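The clip-length rules above (6–30 seconds for a quick-clone reference, 3–30 seconds per training clip) could be enforced on upload roughly as follows. This is a minimal sketch that assumes uncompressed PCM WAV uploads and uses only Python's standard `wave` module. The constants and function names are hypothetical. The app's actual validation, and how it handles compressed formats, is not described here.

```python
from __future__ import annotations

import wave

# Hypothetical limits in seconds, mirroring the UI text.
QUICK_CLONE_RANGE = (6.0, 30.0)    # single reference sample
TRAINING_CLIP_RANGE = (3.0, 30.0)  # each clip in a training dataset


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_duration(seconds: float, limits: tuple[float, float]) -> None:
    """Raise ValueError if the duration is outside the inclusive range."""
    low, high = limits
    if not low <= seconds <= high:
        raise ValueError(f"clip is {seconds:.1f}s; expected {low:g}-{high:g}s")


def validate_training_clips(paths: list[str]) -> list[float]:
    """Check every training clip and return their durations in seconds."""
    durations = []
    for path in paths:
        seconds = wav_duration_seconds(path)
        check_duration(seconds, TRAINING_CLIP_RANGE)
        durations.append(seconds)
    return durations
```

A 4-second clip passes the training check but would be rejected as a quick-clone reference, because it falls below the 6-second minimum. Duration is the only rule checked here. The "single speaker, no music" requirement would need separate audio analysis.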
Synthesize speech
Voice profiles
No profiles yet
Synthesis history
No synthesis yet
Server capabilities
- Quick clone (yes): Zero-shot cloning from one short sample, with no training step.
- Multi-sample training (yes): Upload several clips, and XTTS builds a speaker embedding from all of them. This is CPU-friendly and is not a full model fine-tune.
- GPU fine-tune (no): Full GPT fine-tuning needs a GPU and hours of compute.
- Storage: Users, voice reference audio, synthesis output and audit logs are stored in MySQL.
- Speed: ~1–3 minutes per short phrase on CPU.