Deep Sign: Enabling Robust Statistical Continuous Sign Language Recognition via Hybrid CNN-HMMs

Authors
  • Koller, Oscar1
  • Zargaran, Sepehr1
  • Ney, Hermann1
  • Bowden, Richard2
  • 1 RWTH Aachen University, Human Language Technology and Pattern Recognition, Aachen, Germany
  • 2 University of Surrey, Centre for Vision Speech and Signal Processing, Guildford, United Kingdom
Type
Published Article
Journal
International Journal of Computer Vision
Publisher
Springer-Verlag
Publication Date
Oct 05, 2018
Volume
126
Issue
12
Pages
1311–1325
Identifiers
DOI: 10.1007/s11263-018-1121-3
Source
Springer Nature
License
Green

Abstract

This manuscript introduces the end-to-end embedding of a CNN into an HMM, while interpreting the outputs of the CNN in a Bayesian framework. The hybrid CNN-HMM combines the strong discriminative abilities of CNNs with the sequence modelling capabilities of HMMs. Most current approaches in the field of gesture and sign language recognition disregard the necessity of dealing with sequence data both for training and evaluation. With our presented end-to-end embedding we are able to improve over the state of the art on three challenging benchmark continuous sign language recognition tasks, reducing word error rate by 15% to 38% relative and by up to 20% absolute. We analyse the effect of the CNN structure, network pretraining and the number of hidden states. We compare the hybrid modelling to a tandem approach and evaluate the gain of model combination.
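To illustrate the Bayesian interpretation of CNN outputs mentioned in the abstract, the following is a minimal sketch (not the authors' implementation): frame-wise softmax posteriors p(state | frame) are divided by state priors to obtain scaled emission likelihoods, which an HMM then combines with transition probabilities during decoding. All function names, the toy left-to-right topology and the prior-scaling parameter below are illustrative assumptions.

```python
import numpy as np

def posteriors_to_scaled_log_likelihoods(posteriors, state_priors, prior_scale=1.0):
    """Convert CNN softmax posteriors into scaled log likelihoods:
    log p(frame | state) ~ log p(state | frame) - prior_scale * log p(state)."""
    eps = 1e-12
    return np.log(posteriors + eps) - prior_scale * np.log(state_priors + eps)

def viterbi(log_emissions, log_transitions, log_initial):
    """Standard Viterbi decoding over the scaled emission scores."""
    T, S = log_emissions.shape
    delta = log_initial + log_emissions[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_transitions   # (prev state, next state)
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emissions[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy usage: 3 HMM states, 5 frames of hypothetical CNN softmax outputs.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(3), size=5)             # each row sums to 1
priors = post.mean(axis=0)                           # crude prior estimate
log_emis = posteriors_to_scaled_log_likelihoods(post, priors)
log_trans = np.log(np.array([[0.6, 0.4, 0.0],        # left-to-right topology
                             [0.0, 0.6, 0.4],
                             [0.0, 0.0, 1.0]]) + 1e-12)
log_init = np.log(np.array([1.0, 0.0, 0.0]) + 1e-12)
print(viterbi(log_emis, log_trans, log_init))
```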
