2024-12-20
DPSNN: Spiking neural network for low-latency streaming speech enhancement
Publication
Neuromorphic Computing and Engineering, Volume 4, Issue 4, pp. 044008:1–044008:14
Speech enhancement improves communication in noisy environments and benefits applications such as automatic speech recognition (ASR), hearing aids, and telecommunications. Because these domains are typically power-constrained and event-based, and often require low latency, neuromorphic algorithms, particularly spiking neural networks (SNNs), hold significant potential. However, current effective SNN solutions require a long temporal window to calculate Short Time Fourier Transforms (STFTs) and thus impose substantial latency, typically around 32 ms, which is too long for applications such as hearing aids. Inspired by the Dual-Path Recurrent Neural Network (DPRNN) in deep neural networks (DNNs), we develop a two-phase time-domain streaming SNN framework for speech enhancement, named the Dual-Path Spiking Neural Network (DPSNN). DPSNNs achieve low latency by replacing the STFT and inverse STFT (iSTFT) of traditional frequency-domain models with a learned convolutional encoder and decoder. In the DPSNN, the first phase uses Spiking Convolutional Neural Networks (SCNNs) to capture temporal contextual information, while the second phase uses Spiking Recurrent Neural Networks (SRNNs) to focus on frequency-related features. In addition, threshold-based activation suppression, together with an L1 regularization loss, is applied to specific non-spiking layers in DPSNNs to further improve their energy efficiency. Evaluated on the Voice Cloning Toolkit (VCTK) Corpus and the Intel N-DNS Challenge dataset, our approach demonstrates excellent performance on objective speech metrics, with the very low latency (approximately 5 ms) required for applications like hearing aids.
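The latency argument and the activation-suppression idea in the abstract can be illustrated with a minimal Python sketch. It assumes a 16 kHz sampling rate, a 512-sample STFT window (which reproduces the 32 ms figure) and an 80-sample encoder kernel (which gives about 5 ms); these numbers and all function names (`frame_latency_ms`, `conv_encode`, `suppress`, `l1_penalty`) are hypothetical illustrations, not the DPSNN implementation. The encoder here is a single-channel strided convolution with no spiking neurons; it only shows the accounting.

```python
# Hedged sketch: frame latency, a toy learned-encoder front end, and
# threshold-based activation suppression with an L1 penalty.
# All sizes and names are illustrative assumptions, not the paper's values.

SAMPLE_RATE_HZ = 16_000  # assumed sampling rate


def frame_latency_ms(window_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Simplest accounting of algorithmic latency for a causal frame-based
    front end: a frame cannot be processed until all its samples arrive."""
    return 1000.0 * window_samples / sample_rate_hz


def conv_encode(signal, kernel, stride):
    """Hypothetical learned encoder: one-channel strided 1-D convolution
    (valid padding) applied directly to the waveform instead of an STFT."""
    k = len(kernel)
    return [
        sum(w * signal[start + i] for i, w in enumerate(kernel))
        for start in range(0, len(signal) - k + 1, stride)
    ]


def suppress(activations, threshold):
    """Threshold-based suppression: zero small-magnitude activations so the
    downstream operations that would consume them can be skipped."""
    return [a if abs(a) >= threshold else 0.0 for a in activations]


def l1_penalty(activations, weight):
    """L1 term added to the training loss; it pushes activations toward zero,
    so more of them fall below the suppression threshold."""
    return weight * sum(abs(a) for a in activations)


if __name__ == "__main__":
    print(frame_latency_ms(512))  # 32.0 (assumed 512-sample STFT window)
    print(frame_latency_ms(80))   # 5.0 (assumed 80-sample encoder kernel)
    print(conv_encode([1, 2, 3, 4, 5, 6], kernel=[1, 1], stride=2))  # [3, 7, 11]
    print(suppress([0.05, -0.5, 0.2, -0.01], threshold=0.1))  # [0.0, -0.5, 0.2, 0.0]
```

The design point is that latency is set by how many samples the model must wait for before producing output, so a short learned kernel lowers it regardless of what follows; the L1 weight trades a little accuracy for sparser non-spiking activations.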
Additional Metadata | |
---|---|
DOI | doi.org/10.1088/2634-4386/ad93f9 |
Journal | Neuromorphic Computing and Engineering |
Project | Human Brain Project - SGA3 |
Organisation | Machine Learning |
Citation | Sun, T., & Bohte, S. (2024). DPSNN: Spiking neural network for low-latency streaming speech enhancement. Neuromorphic Computing and Engineering, 4(4), 044008:1–044008:14. doi:10.1088/2634-4386/ad93f9 |