Rogez, Grégory; Schmid, Cordelia
Published in
International Journal of Computer Vision
This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3...
Duong, Chi Nhan; Luu, Khoa; Quach, Kha Gia; Bui, Tien D.
Published in
International Journal of Computer Vision
The “interpretation through synthesis” approach to analyzing face images, particularly the Active Appearance Models (AAMs) method, has become one of the most successful face modeling approaches over the last two decades. AAMs are able to represent face images through synthesis using a controllable parameterized Principal Component Analysis (PCA...
Zhang, Dingwen; Han, Junwei; Zhao, Long; Meng, Deyu
Published in
International Journal of Computer Vision
Weakly supervised object detection is an interesting yet challenging research topic in the computer vision community. It aims to learn object models that localize and detect the objects of interest using only image-level annotations as supervision. To address this problem, this paper establishes a novel weakly supervised learni...
Cherian, Anoop; Gould, Stephen
Published in
International Journal of Computer Vision
Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated into video-level representations by computing statistics on these features. Typically, zeroth-order (max) or first-order (average) statistics are used. In this paper, we explore the bene...
Ruder, Manuel; Dosovitskiy, Alexey; Brox, Thomas
Published in
International Journal of Computer Vision
Manually re-drawing an image in a certain artistic style takes a professional artist a long time. Doing this for a video sequence single-handedly is beyond imagination. We present two computational approaches that transfer the style from one image (for example, a painting) to a whole video sequence. In our first approach, we adapt to videos the ori...
Wang, Hanxiao; Zhu, Xiatian; Gong, Shaogang; Xiang, Tao
Published in
International Journal of Computer Vision
Most existing person re-identification (re-id) methods are unsuitable for real-world deployment for two reasons: unscalability to large population sizes and inadaptability over time. In this work, we present a unified solution to address both problems. Specifically, we propose to construct an identity regression space (IRS) based on embedding di...
Yang, Yingzhen; Feng, Jiashi; Jojic, Nebojsa; Yang, Jianchao; Huang, Thomas S.
Published in
International Journal of Computer Vision
Subspace clustering methods partition the data that lie in or close to a union of subspaces in accordance with the subspace structure. Such methods with a sparsity prior, such as sparse subspace clustering (SSC) (Elhamifar and Vidal in IEEE Trans Pattern Anal Mach Intell 35(11):2765–2781, 2013) with the sparsity induced by the ℓ1...
Owens, Andrew; Wu, Jiajun; McDermott, Josh H.; Freeman, William T.; Torralba, Antonio
Published in
International Journal of Computer Vision
The sound of crashing waves, the roar of fast-moving cars—sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound asso...
Gaidon, Adrien; Lopez, Antonio; Perronnin, Florent
Published in
International Journal of Computer Vision
Munda, Gottfried; Reinbacher, Christian; Pock, Thomas
Published in
International Journal of Computer Vision
Event cameras, also called neuromorphic cameras, mimic the human perception system in that they measure the per-pixel intensity change rather than the actual intensity level. In contrast to traditional cameras, such cameras capture new information about the scene at MHz frequency in the form of sparse events. The high temporal resolution comes at the cost of losin...