DPSNN: Spiking neural network for low-latency streaming speech enhancement

Sun, Tao; Bohte, Sander

doi:10.1088/2634-4386/ad93f9

2024-12-20

Neuromorphic Computing and Engineering, Volume 4, Issue 4, p. 044008:1-044008:14

Speech enhancement improves communication in noisy environments, affecting areas such as automatic speech recognition (ASR), hearing aids, and telecommunications. With these domains typically being power-constrained and event-based, and often requiring low latency, neuromorphic algorithms, particularly spiking neural networks (SNNs), hold significant potential. However, current effective SNN solutions require a long temporal window to calculate Short-Time Fourier Transforms (STFTs) and thus impose substantial latency, typically around 32 ms, which is too long for applications such as hearing aids. Inspired by the Dual-Path Recurrent Neural Network (DPRNN) in deep neural networks (DNNs), we develop a two-phase time-domain streaming SNN framework for speech enhancement, named Dual-Path Spiking Neural Network (DPSNN). DPSNNs achieve low latency by replacing the STFT and inverse STFT (iSTFT) in traditional frequency-domain models with a learned convolutional encoder and decoder. In the DPSNN, the first phase uses Spiking Convolutional Neural Networks (SCNNs) to capture temporal contextual information, while the second phase uses Spiking Recurrent Neural Networks (SRNNs) to focus on frequency-related features. In addition, threshold-based activation suppression, along with L1 regularization loss, is applied to specific non-spiking layers in DPSNNs to further improve their energy efficiency. Evaluated on the Voice Cloning Toolkit (VCTK) Corpus and the Intel N-DNS Challenge dataset, our approach demonstrates excellent performance in objective speech metrics, along with the very low latency (approximately 5 ms) required for applications like hearing aids.
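The abstract mentions threshold-based activation suppression combined with an L1 regularization loss on non-spiking layers. The sketch below illustrates one plausible reading of that idea in plain Python; it is not taken from the paper or from the tao-sun/dpsnn code. The function names (`suppress`, `l1_penalty`, `density`), the threshold of 0.05, and the penalty weight of 0.01 are all hypothetical values chosen for illustration.

```python
# Illustrative sketch only: names and constants are assumptions,
# not the authors' implementation.

def suppress(activations, threshold):
    """Zero every activation whose magnitude is below the threshold."""
    return [a if abs(a) >= threshold else 0.0 for a in activations]


def l1_penalty(activations, weight):
    """L1 term that could be added to a training loss to favor small activations."""
    return weight * sum(abs(a) for a in activations)


def density(activations):
    """Fraction of activations that are non-zero (lower means sparser)."""
    return sum(1 for a in activations if a != 0.0) / len(activations)


# Hypothetical output of a non-spiking layer.
acts = [0.02, -0.5, 0.0, 0.8, -0.03, 0.1]

sparse = suppress(acts, threshold=0.05)
print(sparse)
print(density(acts), density(sparse))
print(l1_penalty(sparse, weight=0.01))
```

Under this reading, the L1 term pushes activations toward zero during training, and the threshold then removes the small values that remain, so fewer non-zero values are passed to the next layer. Energy cost on event-driven hardware generally scales with the number of non-zero values processed, which is why sparser activations are expected to help.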

Additional Metadata
Keywords	Spiking neural networks, Speech enhancement, Noise suppression, Low latency, Streaming
Persistent URL	doi.org/10.1088/2634-4386/ad93f9
Journal	Neuromorphic Computing and Engineering
Project	Human Brain Project - SGA3
Grant	This work was funded by the European Commission 7th Framework Programme; grant id h2020/945539 - Human Brain Project - SGA3 (HBP-SGA3)
Organisation	Machine Learning
Citation	Sun, T., & Bohte, S. (2024). DPSNN: Spiking neural network for low-latency streaming speech enhancement. Neuromorphic Computing and Engineering, 4(4), 044008:1–044008:14. https://doi.org/10.1088/2634-4386/ad93f9

See Also
software|data	tao-sun/dpsnn, T. Sun (Tao)
