Lai, Wen-Hsing; Wang, Siou-Lin
Published in EURASIP Journal on Audio, Speech, and Music Processing
In this study, we propose a methodology for separating a singing voice from musical accompaniment in a monaural musical mixture. The proposed method uses robust principal component analysis (RPCA), followed by postprocessing, including median filtering, morphological processing, and high-pass filtering, to decompose the mixture. Subsequently, a deep recurrent neural n...
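The RPCA step this abstract describes splits a magnitude spectrogram into a low-rank part (the repetitive accompaniment) and a sparse part (the voice). A minimal sketch of that decomposition via the inexact augmented Lagrange multiplier method, assuming a plain NumPy matrix in place of a real spectrogram; the defaults for `lam`, `tol`, and the penalty schedule are standard choices, not the paper's settings:

```python
import numpy as np

def rpca(M, lam=None, tol=1e-7, max_iter=500):
    """Decompose M into low-rank L + sparse S via inexact ALM.

    For a mixture spectrogram, L would model the accompaniment
    and S the singing voice.
    """
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm2 = np.linalg.norm(M, 2)                 # largest singular value
    Y = M / max(norm2, np.max(np.abs(M)) / lam)  # dual variable init
    mu, rho = 1.25 / norm2, 1.5
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # Sparse update: entrywise soft thresholding
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)
        # Dual ascent on the constraint M = L + S
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(Z, 'fro') <= tol * np.linalg.norm(M, 'fro'):
            break
    return L, S
```

In the paper's pipeline, `S` would then be refined by the median, morphological, and high-pass postprocessing before driving the neural stage.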
Zieliński, Sławomir K.; Antoniuk, Paweł; Lee, Hyunkook; Johnson, Dale
Published in EURASIP Journal on Audio, Speech, and Music Processing
One of the greatest challenges in the development of binaural machine audition systems is the disambiguation between front and back audio sources, particularly in complex spatial audio scenes. The goal of this work was to develop a method for discriminating between front and back located ensembles in binaural recordings of music. To this end, 22, 4...
Qin, Siqing; Wang, Longbiao; Li, Sheng; Dang, Jianwu; Pan, Lixin
Published in EURASIP Journal on Audio, Speech, and Music Processing
Conventional automatic speech recognition (ASR) and emerging end-to-end (E2E) speech recognition have achieved promising results when provided with sufficient resources. However, for low-resource languages, current ASR remains challenging. The Lhasa dialect is the most widespread Tibetan dialect and has a wealth of speakers and transcrip...
Janský, Jakub; Koldovský, Zbyněk; Málek, Jiří; Kounovský, Tomáš; Čmejla, Jaroslav
Published in EURASIP Journal on Audio, Speech, and Music Processing
In this paper, we propose a novel algorithm for blind source extraction (BSE) of a moving acoustic source recorded by multiple microphones. The algorithm is based on independent vector extraction (IVE) where the contrast function is optimized using the auxiliary function-based technique and where the recently proposed constant separating vector (CS...
Yao, Jiacheng; Zhang, Jing; Li, Jiafeng; Zhuo, Li
Published in EURASIP Journal on Audio, Speech, and Music Processing
With the rapid growth of online live streaming platforms, some anchors seek profit and accumulate popularity by mixing inappropriate content into live programs. After being blacklisted, these anchors even forge their identities and switch platforms to continue streaming, causing great harm to the network environment. Therefore, we propose an anchor...
Takashima, Yuki; Takashima, Ryoichi; Tsunoda, Ryota; Aihara, Ryo; Takiguchi, Tetsuya; Ariki, Yasuo; Motoyama, Nobuaki
Published in EURASIP Journal on Audio, Speech, and Music Processing
We present an unsupervised domain adaptation (UDA) method for a lip-reading model, i.e., an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data contain unknown classes, such as out-of-vocabulary words. In this paper, we propose a cross-modal knowledge distillation (KD)-based domain...
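Knowledge distillation of the kind this abstract builds on trains a student to match a teacher's temperature-softened output distribution. A minimal sketch of the standard KD loss in NumPy (a generic formulation, not the authors' cross-modal variant; an audio teacher's logits would supervise the lip-reading student):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)   # soft targets, e.g. from an audio model
    q = softmax(student_logits, T)   # lip-reading student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * T * T)
```

Because the loss matches distributions rather than hard labels, it can transfer supervision even when adaptation samples fall outside the labeled vocabulary.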
Schwartz, Ofer; Gannot, Sharon
Published in EURASIP Journal on Audio, Speech, and Music Processing
The problem of blind and online speaker localization and separation using multiple microphones is addressed based on the recursive expectation-maximization (REM) procedure. A two-stage REM-based algorithm is proposed: (1) multi-speaker direction of arrival (DOA) estimation and (2) multi-speaker relative transfer function (RTF) estimation. The DOA e...
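The recursive EM procedure referenced here updates mixture statistics sample by sample rather than over a full batch. A toy illustration on a 1-D Gaussian mixture, with the component means standing in for candidate DOAs; the step size `gamma` and the fixed variance are hypothetical choices, and this is a generic REM update, not the authors' two-stage DOA/RTF estimator:

```python
import numpy as np

def rem_step(x, means, weights, var, gamma=0.05):
    """One recursive EM update for a fixed-variance 1-D Gaussian mixture.

    E-step: posterior responsibilities of each component for sample x.
    M-step: exponentially weighted recursive updates of the statistics.
    """
    lik = weights * np.exp(-0.5 * (x - means) ** 2 / var)
    r = lik / (lik.sum() + 1e-300)            # responsibilities
    weights = (1 - gamma) * weights + gamma * r
    means = means + gamma * r * (x - means) / np.maximum(weights, 1e-8)
    return means, weights
```

Feeding a stream of observations from two sources drives the two means toward the source locations online, which is the appeal of REM for moving-speaker scenarios.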
Byambadorj, Zolzaya; Nishimura, Ryota; Ayush, Altangerel; Ohta, Kengo; Kitaoka, Norihide
Published in EURASIP Journal on Audio, Speech, and Music Processing
Deep learning techniques are currently being applied in automated text-to-speech (TTS) systems, resulting in significant improvements in performance. However, these methods require large amounts of text-speech paired data for model training, and collecting this data is costly. Therefore, in this paper, we propose a single-speaker TTS system contain...
Luo, Yuancheng
Published in EURASIP Journal on Audio, Speech, and Music Processing
Microphone and speaker array designs have increasingly diverged from simple topologies due to diversity of physical host geometries and use cases. Effective beamformer design must now account for variation in the array’s acoustic radiation pattern, spatial distribution of target and noise sources, and intended beampattern directivity. Relevant task...
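A common concrete instance of the beamformer design problem described here is the MVDR beamformer: minimize output power subject to unit gain in the look direction. A minimal NumPy sketch for a uniform linear array, assuming far-field steering and a given spatial covariance `R`; the function names and parameters are illustrative, not from the paper:

```python
import numpy as np

def steering_vector(n_mics, spacing, angle_deg, freq, c=343.0):
    """Far-field steering vector for a uniform linear array."""
    k = 2.0 * np.pi * freq / c
    pos = np.arange(n_mics) * spacing
    return np.exp(-1j * k * pos * np.sin(np.radians(angle_deg)))

def mvdr_weights(R, d):
    """MVDR weights: minimize w^H R w subject to w^H d = 1."""
    Ri = np.linalg.inv(R + 1e-6 * np.eye(R.shape[0]))  # diagonal loading
    return Ri @ d / (d.conj() @ Ri @ d)
```

Irregular host geometries change the steering model and the covariance, but the distortionless constraint `w^H d = 1` is what ties the design to the intended beampattern.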
Liu, Fangkun; Wang, Hui; Peng, Renhua; Zheng, Chengshi; Li, Xiaodong
Published in EURASIP Journal on Audio, Speech, and Music Processing
Voice conversion transforms a source speaker's voice into that of a target speaker while keeping the linguistic content unchanged. Recently, one-shot voice conversion has gradually become a hot topic for its potentially wide range of applications, as it can convert the voice of any source speaker to any other target speaker even when both the...