Asymmetrically Weighted CCA And Hierarchical Kernel Sentence Embedding For Multimodal Retrieval

Authors
  • Mroueh, Youssef
  • Marcheret, Etienne
  • Goel, Vaibhava
Type
Preprint
Publication Date
Feb 09, 2016
Submission Date
Nov 19, 2015
Identifiers
arXiv ID: 1511.06267
Source
arXiv
License
Yellow
Abstract

Joint modeling of language and vision has been drawing increasing interest. A key component is a multimodal data representation that allows bidirectional retrieval: images by sentences and sentences by images. In this paper we present three contributions to canonical correlation analysis (CCA) based multimodal retrieval. First, we show that an asymmetric weighting of the canonical weights, which achieves a cross-view mapping from the search space to the query space, improves retrieval performance. Second, we devise a computationally efficient model selection procedure, crucial to generalization and stability, in the framework of the Björck-Golub algorithm for regularized CCA via spectral filtering. Finally, we introduce a Hierarchical Kernel Sentence Embedding (HKSE) that approximates Kernel CCA for a special similarity kernel between word distributions. State-of-the-art results are obtained on the MSCOCO and Flickr benchmarks when these three techniques are used in conjunction.
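
To make the first contribution concrete, the following is a minimal NumPy sketch of regularized CCA computed by whitening each view and taking an SVD of the whitened cross-covariance (the Björck-Golub construction), followed by an asymmetric embedding for retrieval. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fit_cca`, `embed_query`, `embed_search`, `retrieve`), the ridge parameter `reg`, the dimension `k`, and the specific choice of scaling only the search-side projections by the canonical correlations are all hypothetical. That scaling is one plausible reading of "asymmetric weighting", motivated by the fact that, in canonical coordinates, the least-squares prediction of the query-side variates from the search-side variates is the search-side variates multiplied by the canonical correlations. The spectral-filtering model selection and HKSE are not shown.

```python
import numpy as np

def _inv_sqrt(C):
    # Inverse square root of a symmetric positive definite matrix.
    w, E = np.linalg.eigh(C)
    return (E * w ** -0.5) @ E.T

def fit_cca(X, Y, reg=1e-3):
    """Ridge-regularized CCA via whitening + SVD.

    X: (n, dx) query-view features, Y: (n, dy) search-view features,
    rows are paired samples. `reg` is an assumed ridge parameter.
    Returns (mean_x, mean_y, U, V, s) with canonical weights U, V and
    canonical correlations s in descending order.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Wx, Wy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    P, s, Qt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    return mx, my, Wx @ P, Wy @ Qt.T, s

def embed_query(X, model, k):
    # Query side: plain projection onto the top-k canonical directions.
    mx, _, U, _, _ = model
    return (X - mx) @ U[:, :k]

def embed_search(Y, model, k):
    # Search side (assumption): projections scaled by the canonical
    # correlations, which maps search-side variates toward the query space.
    _, my, _, V, s = model
    return ((Y - my) @ V[:, :k]) * s[:k]

def retrieve(Q, D):
    # Index of the most cosine-similar search item for each query.
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    return np.argmax(Qn @ Dn.T, axis=1)
```

For sentence-to-image retrieval one would treat sentence features as `X` and image features as `Y`; for the reverse direction the roles swap, so the weighting is applied to whichever view is being searched.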
