Leveraging 3D Information for Controllable and Interpretable Image Synthesis

  • Raj, Amit
Publication Date
Dec 13, 2022
Scholarly Materials And Research @ Georgia Tech


Neural image synthesis has seen enormous advances in recent years, led by innovations in GANs that generate high-resolution, photo-realistic images. However, a major limitation of these methods is that they tend to capture the texture statistics of an image with no explicit understanding of geometry. Additionally, GAN-only pipelines are notoriously hard to train. In contrast, recent trends in neural and volumetric rendering have demonstrated compelling results by incorporating 3D information into the synthesis pipeline using classical rendering techniques. We leverage ideas from both classical graphics rendering and neural image synthesis to design 3D-guided image generation pipelines that are photo-realistic, controllable, and easy to train. In this thesis, we discuss three sets of models that incorporate geometric information for controllable image synthesis.

1. Static geometries: We leverage class-specific shape priors to present generative models that allow for 3D-consistent novel view synthesis. To that end, we propose the first framework that allows implicit representations to generalize to novel identities in the context of facial avatars.

2. Articulated geometries: In the second section, we extend controllable synthesis to articulated geometries. We present two frameworks, one with an explicit and one with an implicit geometric representation, for synthesizing full-body digital avatars with controllable pose and viewpoint.

3. Scenes: In the final section, we present a framework for generating driving scenes with both static and dynamic elements. In particular, the proposed model allows fine-grained control over local elements of the scene without resynthesizing the entire scene, which we posit should reduce both the memory footprint of the model and inference times.

Ph.D.
