Affordable Access

Pruning and compression of multi-view content for immersive video coding

Authors
  • Milovanovic, Marta
Publication Date
Jul 06, 2023
Source
HAL
Keywords
Language
English
License
Unknown
External links

Abstract

This thesis addresses the problem of efficient compression of immersive video content, represented with Multiview Video plus Depth (MVD) format. The Moving Picture Experts Group (MPEG) standard for the transmission of MVD data is called MPEG Immersive Video (MIV), which utilizes 2D video codecs to compress the source texture and depth information. Compared to traditional video coding, immersive video coding is more complex and constrained not only by trade-off between bitrate and quality, but also by the pixel rate. Because of that, MIV uses pruning to reduce the pixel rate and inter-view correlations and creates a mosaic of image pieces (patches). Decoder-side depth estimation (DSDE) has emerged as an alternative approach to improve the immersive video system by avoiding the transmission of depth maps and moving the depth estimation process to the decoder side. DSDE has been studied for the case of numerous fully transmitted views (without pruning). In this thesis, we demonstrate possible advances in immersive video coding, emphasized on pruning the input content. We go beyond DSDE and examine the distinct effect of patch-level depth restoration at the decoder side. We propose two approaches to incorporate decoder-side depth estimation (DSDE) on content pruned with MIV. The first approach excludes a subset of depth maps from the transmission, and the second approach uses the quality of depth patches estimated at the encoder side to distinguish between those that need to be transmitted and those that can be recovered at the decoder side. Our experiments show 4.63 BD-rate gain for Y-PSNR on average. Furthermore, we also explore the use of neural image-based rendering (IBR) techniques to enhance the quality of novel view synthesis and show that neural synthesis itself provides the information needed to prune the content. Our results show a good trade-off between pixel rate and synthesis quality, achieving the view synthesis improvements of 3.6 dB on average.

Report this publication

Statistics

Seen <100 times