We host the application HiFi-GAN so that it can be run on our online workstations, either with Wine or directly.


Quick description of HiFi-GAN:

HiFi-GAN is a GAN-based neural vocoder that efficiently generates high-fidelity speech waveforms from mel spectrograms. It introduces a generator architecture tailored to model the periodic structure of speech and a set of discriminators that focus on different scales and periods of the waveform to better capture naturalness. The model balances sample quality against generation speed, outperforming many previous GAN vocoders while being far faster than typical autoregressive models. In experiments on LJSpeech, HiFi-GAN achieved mean opinion scores close to those of human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
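
To make the speed figures above concrete, here is a small arithmetic sketch of what "N× faster than real time" means in wall-clock terms. The function name synthesis_time and the 10-second clip are illustrative assumptions, not part of HiFi-GAN; only the 22.05 kHz sample rate and the ~168× and 13× factors come from the description.

```python
SAMPLE_RATE = 22050  # Hz, the sample rate quoted in the description


def synthesis_time(num_samples, speedup, sample_rate=SAMPLE_RATE):
    """Seconds of compute needed to generate num_samples of audio.

    `speedup` is the faster-than-real-time factor: seconds of audio
    produced per second of compute. (Hypothetical helper for illustration.)
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / speedup


# A 10-second clip at 22.05 kHz (illustrative length).
clip_samples = 10 * SAMPLE_RATE

gpu_time = synthesis_time(clip_samples, 168)  # ~168x figure (V100 GPU)
cpu_time = synthesis_time(clip_samples, 13)   # >13x figure (small model, CPU)

print(f"GPU at ~168x: {gpu_time:.3f} s")
print(f"CPU at 13x:   {cpu_time:.3f} s")
```

Under these assumptions the 10-second clip takes roughly 0.06 s at 168× and under 0.8 s at 13×. Actual timings depend on hardware, batch size, and model configuration.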

Features:
  • High-fidelity neural vocoder that converts mel spectrograms to waveforms using a GAN architecture
  • Multi-period and multi-scale discriminators to better capture periodicity and overall speech realism
  • Very fast inference, achieving far faster-than-real-time generation on modern GPUs and even on optimized CPU setups
  • Multiple generator configurations (v1, v2, v3) to balance quality, speed, and model size
  • Compatible with many TTS front ends such as Tacotron2 and Glow-TTS for end-to-end systems
  • Open-source implementation with pretrained models and scripts for training, evaluation, and inference
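
The multi-period discriminator idea in the feature list can be sketched as follows: a 1D waveform is folded into rows of length p, so that each column holds samples spaced p apart and periodic patterns line up vertically. This is a minimal pure-Python illustration, not the project's code. The helper name fold_by_period, the period set [2, 3, 5, 7, 11], and the choice of reflection padding are assumptions made for this sketch.

```python
PERIODS = [2, 3, 5, 7, 11]  # assumed set of periods for illustration


def fold_by_period(samples, period):
    """Fold a 1D sequence into rows of `period` samples each.

    If the length is not a multiple of `period`, the tail is padded by
    reflection (mirroring around the last sample without repeating it).
    Hypothetical helper; a real model would do this on tensors.
    """
    samples = list(samples)
    remainder = len(samples) % period
    if remainder:
        n_pad = period - remainder
        if n_pad >= len(samples):
            raise ValueError("waveform too short to reflect-pad")
        # The comprehension reads the unpadded list before it is extended.
        samples += [samples[-2 - i] for i in range(n_pad)]
    return [samples[i:i + period] for i in range(0, len(samples), period)]


wave = list(range(10))  # stand-in for audio samples
for p in PERIODS[:2]:
    grid = fold_by_period(wave, p)
    print(f"period {p}: {len(grid)} rows x {p} columns")
```

In a full discriminator, each folded grid would then be processed by 2D convolutions. The multi-scale discriminators instead operate on the raw and downsampled 1D waveform.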


Programming Language: Python.
Categories: Text to Speech.

©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 - VAT number: EE102345621.