Abstract: There are three approaches to multilingual and crosslingual automatic speech recognition (MCL-ASR): supervised pretraining with phonetic or graphemic transcription, and self-supervised ...
Abstract: Speech Emotion Recognition (SER) in noisy environments is challenging due to the overlap between emotional and noise-related signals. We propose a novel emotion-diffusion approach to enhance ...
In today’s voice-first world, it’s not enough for systems to simply hear what users say. They need to understand it with precision. In high-stakes environments like healthcare, finance, or enterprise ...
This is the official repository 👑 for the WenetSpeech-Yue dataset and the source code for the WenetSpeech-Pipe speech data preprocessing pipeline. To address the unique linguistic characteristics of ...
[2025.06.26] - This paper has been accepted to ICCV 2025 🎉! [2025.02.13] - The benchmark and evaluation code are available! [2024.12.05] - The training dataset and generative dataset (v1: 0.43m and v2: ...