tacotron 2

We host tacotron 2 so that it can be run in our online workstations, either through Wine or directly.


Quick description of tacotron 2:

Tacotron-2 is a TensorFlow implementation of Google's Tacotron-2 end-to-end text-to-speech architecture, which predicts mel spectrograms from raw text and then feeds them to a neural vocoder such as WaveNet. It reproduces the original paper's hyperparameters exactly via paper_hparams.py, and also offers a tuned hparams.py with additional changes that often yield better audio quality in practice. The repository is structured as a full training pipeline: dataset preparation, preprocessing into spectrograms, Tacotron training, WaveNet vocoder training (or Griffin-Lim inversion, which requires no training), and final waveform synthesis. It includes directory layouts and logging directories for several datasets, such as LJSpeech and M-AILABS en_US/en_UK, which makes it easier to adapt to new English corpora. Separate log trees track mel spectrograms, attention plots, evaluation audio, and vocoder outputs, so you can inspect how alignment and audio quality evolve during training.
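Preprocessing turns each audio clip into a mel spectrogram: every frame of the short-time power spectrum is projected onto a bank of triangular filters spaced evenly on the mel scale, and the result is log-compressed. The sketch below shows that projection in plain Python. It is an illustration under stated assumptions, not the repository's code: it uses the HTK mel formula and unnormalized triangular filters, whereas the project's own audio utilities may use a different mel formula, filter normalization and dB scaling. All function names here are hypothetical.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (assumption; other mel formulas exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels, fmin=0.0, fmax=None):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    if fmax is None:
        fmax = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # Frequency (Hz) of each linear-frequency FFT bin
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    # n_mels + 2 points equally spaced on the mel scale give the filter edges
    mel_lo, mel_hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (mel_hi - mel_lo) / (n_mels + 1)
    edges = [mel_to_hz(mel_lo + i * step) for i in range(n_mels + 2)]
    filters = []
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))  # rising slope
            elif center < f < hi:
                row.append((hi - f) / (hi - center))  # falling slope
            else:
                row.append(0.0)  # outside this filter's band
        filters.append(row)
    return filters

def log_mel_frame(filters, power_frame, floor=1e-5):
    """Project one power-spectrum frame onto the filters and take the log."""
    return [
        math.log(max(sum(w * p for w, p in zip(row, power_frame)), floor))
        for row in filters
    ]
```

For example, `mel_filterbank(22050, 1024, 80)` builds 80 filters (the channel count used in the Tacotron 2 paper) for LJSpeech's 22,050 Hz audio, so each 513-bin power frame collapses to 80 log-mel values. The FFT size here is illustrative; the repository's actual STFT settings are defined in its hyperparameter files.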

Features:

Full TensorFlow implementation of Tacotron-2 with paper-accurate and enhanced hyperparameter sets
End-to-end pipeline from raw audio datasets (e.g., LJSpeech, M-AILABS) through preprocessing, Tacotron training, and vocoder training
Support for both WaveNet vocoder and Griffin-Lim inversion for mel-to-waveform synthesis
Detailed repository structure with logs for mel-spectrograms, attention plots, evaluation audio, and vocoder outputs
Modular scripts (preprocess.py, train.py, synthesize.py, wavenet_preprocess.py) for flexible experimentation
Example configurations that follow the original paper's settings, plus variants aimed at improved stability and quality
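The split between a paper-faithful hyperparameter file and a tuned one can be pictured as a baseline dictionary plus a small set of overrides. The sketch below is hypothetical: the parameter names and values are illustrative rather than copied from paper_hparams.py or hparams.py, and the comma-separated `name=value` override syntax is an assumption modeled on common TensorFlow-era HParams usage, not a documented option of this repository.

```python
# Illustrative baseline values only; not copied from paper_hparams.py or hparams.py
BASELINE_HPARAMS = {
    "num_mels": 80,
    "outputs_per_step": 1,
    "batch_size": 64,
    "learning_rate": 1e-3,
    "use_griffin_lim": False,
}

def apply_overrides(base, spec):
    """Return a copy of base with 'name=value,...' overrides applied.

    Each value is cast to the type of the existing entry, and unknown
    names raise KeyError so misspelled parameters fail loudly.
    """
    result = dict(base)  # never mutate the baseline set
    if not spec.strip():
        return result
    for item in spec.split(","):
        name, sep, raw = item.partition("=")
        name, raw = name.strip(), raw.strip()
        if not sep:
            raise ValueError(f"expected name=value, got {item!r}")
        if name not in result:
            raise KeyError(f"unknown hyperparameter: {name}")
        current = result[name]
        # Check bool before int: bool is a subclass of int in Python
        if isinstance(current, bool):
            result[name] = raw.lower() in ("true", "1", "yes")
        else:
            result[name] = type(current)(raw)
    return result

# A "tuned" variant derived from the baseline
tuned = apply_overrides(BASELINE_HPARAMS, "outputs_per_step=2,batch_size=32")
```

Casting each value to the type of its baseline entry means a typo such as `batch_size=3.2` raises an error instead of silently turning an integer field into a float, and keeping the baseline untouched lets both configurations coexist in one experiment.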

Programming Language: Python.
Categories:

Text to Speech


By OD Group OU – Registry code: 1609791 -VAT number: EE102345621.