Poster in Workshop: 5th Workshop on practical ML for limited/low resource settings (PML4LRS) @ ICLR 2024
Precision-Driven Low-Resource Speech Synthesis For Bangla Text-To-Speech System
Tabassum Shahjahan · Md. Ismail Hossain · Kazi Rafat · Mohammad Ruhul Amin · Fuad Rahman · Nabeel Mohammed
Recent developments in deep learning and artificial intelligence have facilitated widespread commercial adoption of text-to-speech models that can produce intelligible and natural-sounding speech. Although numerous speech synthesis models are widely available for languages such as English and Chinese, extremely low-resourced languages like Bangla continue to pose a formidable challenge for speech synthesis. In this paper, we adopt a single-stage and a two-stage training approach, followed by quantization techniques, to generate high-quality speech from a Bangla dataset. Our experimental results show that the proposed models achieve both intelligibility and naturalness with reduced inference time, even under extremely low-resource settings. We are the first to provide a robust Bangla text-to-speech system usable for both academic and commercial applications.