HiFi-GAN

We host the HiFi-GAN application so that it can be run on our online workstations, either directly or through Wine.


Quick description of HiFi-GAN:

HiFi-GAN is a GAN-based neural vocoder that generates high-fidelity speech waveforms from mel spectrograms at high speed. Its generator is designed to model the periodic structure of speech, and it is trained against a set of discriminators that examine the waveform at different scales and periods to better capture naturalness. The model aims to balance sample quality and generation speed: it outperforms many earlier GAN vocoders while running far faster than typical autoregressive models. In experiments on LJSpeech, HiFi-GAN achieved mean opinion scores (MOS) close to those of human recordings while synthesizing 22.05 kHz audio up to about 168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a little quality for even higher speed and runs more than 13× faster than real time on CPU, which makes it suitable for deployments without powerful GPUs.
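To make the speed figures concrete: "k× faster than real time" means one second of audio is produced in 1/k seconds of wall-clock time. The small sketch below works through that arithmetic at the 22.05 kHz sampling rate quoted above. The helper names are hypothetical and are not part of the HiFi-GAN code.

```python
SAMPLE_RATE_HZ = 22_050  # LJSpeech sampling rate quoted in the description


def real_time_factor(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Speedup over real time: duration of audio produced / time spent producing it."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return audio_seconds / wall_clock_seconds


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` at a given speedup."""
    return audio_seconds / speedup


def samples_per_second(speedup: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Output waveform samples generated per wall-clock second at a given speedup."""
    return speedup * sample_rate_hz


if __name__ == "__main__":
    # ~168x on GPU: 10 s of speech in roughly 0.06 s, about 3.7 million samples/s
    print(synthesis_time(10, 168), samples_per_second(168))
    # ~13x on CPU (small configuration): 10 s of speech in roughly 0.77 s
    print(synthesis_time(10, 13), samples_per_second(13))
```

At 168×, this works out to 168 × 22,050 = 3,704,400 samples per second. At 13×, it is 286,650 samples per second, which is still well above the 22,050 samples per second needed for real-time playback.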

Features:

High-fidelity neural vocoder that converts mel spectrograms to waveforms using a GAN architecture
Multi-period and multi-scale discriminators to better capture periodicity and overall speech realism
Very fast inference: far faster than real time on modern GPUs, and faster than real time even on CPU with the smaller configuration
Multiple generator configurations (v1, v2, v3) to balance quality, speed, and model size
Compatible with many TTS front ends such as Tacotron2 and Glow-TTS for end-to-end systems
Open-source implementation with pretrained models and scripts for training, evaluation, and inference
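The key step of the multi-period discriminator, as described in the HiFi-GAN paper, is to reshape a 1-D waveform of length T into a 2-D array of height T/p and width p for each period p (the paper uses p = 2, 3, 5, 7, 11). Each column then holds samples spaced p apart, so 2-D convolutions with a kernel width of 1 process each periodic sub-sequence independently. The sketch below shows that reshape without any dependencies. The function names are hypothetical. When T is not divisible by p, it right-pads with reflect padding. This is assumed to match the official PyTorch code and should be checked against the repository.

```python
from typing import List, Sequence

PERIODS = (2, 3, 5, 7, 11)  # MPD periods reported in the HiFi-GAN paper


def reflect_pad_right(x: Sequence[float], n_pad: int) -> List[float]:
    """Extend x on the right by mirroring it, excluding the edge sample ('reflect' mode)."""
    x = list(x)
    if n_pad >= len(x):
        raise ValueError("reflect padding requires n_pad < len(x)")
    # e.g. [a, b, c, d] with n_pad=2 -> [a, b, c, d, c, b]
    return x + x[-2:-2 - n_pad:-1]


def fold_by_period(x: Sequence[float], period: int) -> List[List[float]]:
    """Reshape a 1-D waveform into rows of width `period` (height ceil(T / period)).

    Column j of the result holds samples j, j + period, j + 2 * period, ...
    """
    x = list(x)
    remainder = len(x) % period
    if remainder:
        x = reflect_pad_right(x, period - remainder)
    return [x[i:i + period] for i in range(0, len(x), period)]


if __name__ == "__main__":
    rows = fold_by_period(list(range(7)), 3)
    print(rows)                   # [[0, 1, 2], [3, 4, 5], [6, 5, 4]]
    print([r[0] for r in rows])   # column 0: samples 0, 3, 6 (spaced 3 apart)
```

In a full discriminator, each period would get its own stack of 2-D convolutions over this array. Using prime periods keeps the sub-sequences seen by the different sub-discriminators from overlapping too much, which is the motivation the paper gives for that choice.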

Programming Language: Python.
Categories:

Text to Speech


By OD Group OU – Registry code: 1609791, VAT number: EE102345621.