CSLNSpeech: Solving the extended speech separation problem with the help of Chinese sign language

Authors
  • Wu, Jiasong
  • Li, Xuan
  • Li, Taotao
  • Meng, Fanman
  • Kong, Youyong
  • Yang, Guanyu
  • Senhadji, Lotfi
  • Shu, Huazhong
Publication Date
Nov 01, 2024
Identifiers
DOI: 10.1016/j.specom.2024.103131
OAI: oai:HAL:hal-04719302v1
Source
HAL-Rennes 1
Language
English
License
Unknown

Abstract

Previous audio-visual speech separation methods exploit the synchronization between a speaker's facial movements and speech in video to self-supervise speech separation. In this paper, we propose a model that solves the speech separation problem with the assistance of both face and sign language, which we call the extended speech separation problem. We design a general deep learning network that learns to combine three modalities, audio, face, and sign language information, to better solve the speech separation problem. To train the model, we introduce a large-scale dataset named the Chinese Sign Language News Speech (CSLNSpeech) dataset, in which the three modalities coexist: audio, face, and sign language. Experimental results show that the proposed model performs better and is more robust than the usual audio-visual system. In addition, the sign language modality can also be used alone to supervise speech separation tasks, and introducing sign language helps hearing-impaired people learn and communicate. Finally, our model is a general speech separation framework that achieves very competitive separation performance on two open-source audio-visual datasets. The code is available at https://github.com/iveveive/SLNSpeech.
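The abstract does not detail the network architecture on this page. As a purely illustrative sketch, the PyTorch snippet below shows one plausible shape for a three-stream (audio, face, sign language) mask-based separator of the kind the abstract describes. All class names, dimensions, and the masking design are assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn

class TriModalSeparator(nn.Module):
    """Illustrative three-stream separator (assumed design, not the paper's):
    encode each modality, fuse them per time step, and predict a mask over
    the magnitude spectrogram of the audio mixture."""

    def __init__(self, n_freq=257, face_dim=512, sign_dim=512, hidden=256):
        super().__init__()
        self.audio_enc = nn.LSTM(n_freq, hidden, batch_first=True, bidirectional=True)
        self.face_proj = nn.Linear(face_dim, hidden)   # per-frame face embeddings
        self.sign_proj = nn.Linear(sign_dim, hidden)   # per-frame sign embeddings
        self.fusion = nn.LSTM(4 * hidden, hidden, batch_first=True, bidirectional=True)
        self.mask_head = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, face_emb, sign_emb):
        # mix_spec: (B, T, n_freq); face_emb/sign_emb: (B, T, dim), assumed
        # already resampled to the audio frame rate.
        a, _ = self.audio_enc(mix_spec)                  # (B, T, 2*hidden)
        v = self.face_proj(face_emb)                     # (B, T, hidden)
        s = self.sign_proj(sign_emb)                     # (B, T, hidden)
        fused, _ = self.fusion(torch.cat([a, v, s], dim=-1))
        mask = self.mask_head(fused)                     # values in [0, 1]
        return mask * mix_spec                           # masked spectrogram estimate

if __name__ == "__main__":
    model = TriModalSeparator()
    out = model(torch.rand(2, 100, 257), torch.rand(2, 100, 512), torch.rand(2, 100, 512))
    print(out.shape)  # torch.Size([2, 100, 257])
```

A mask-predicting fusion network is only one common design choice for this task; the paper's framework may differ in encoders, fusion strategy, and output representation.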
