<?xml version="1.0" encoding="UTF-8"?>
<xml>
<records>
<record>
  <ref-type name="Thesis">32</ref-type>
  <contributors>
    <authors>
      <author>Elsner, Tim</author>
      <author>Kobbelt, Leif</author>
      <author>Shamir, Ariel</author>
    </authors>
    <subsidiary-authors>
      <author>122310</author>
    </subsidiary-authors>
  </contributors>
  <titles>
    <title>Adapting neural representations of visual data for editing and generation</title>
  </titles>
  <periodical/>
  <publisher>RWTH Aachen University</publisher>
  <pub-location>Aachen</pub-location>
  <language>English</language>
  <pages>1 online resource : illustrations</pages>
  <number/>
  <abstract>Learning and representing the structure of the underlying data is a central aspect of deep learning, spanning tasks from image classification to the analysis of laser-scanned point clouds. This thesis is therefore centred around two questions: first, does imposing the right structure on a visual data representation enable task-specific edits of an input, and second, can deliberately shaping the structure of the latent space aid in learning the underlying data manifold as a whole? For content editing, we show how to learn a latent representation for geometry in which the latent information is tied directly to points on the surface of an object, allowing intuitive editing. We further demonstrate, for images and 3D scenes, how to edit the aspect ratio of an input by representing the applied deformation as a neural field. As these editing approaches show, the right representation is crucial for achieving a given task. For content generation, we demonstrate how a latent space shaped in the spirit of the Gaussian pyramid can be beneficial for generating new images, although it further increases an already high computational load. This motivates more adaptive representations that require less computational power. We provide a strategy that extends Byte Pair Encoding to compress discretised visual data grids for generation, adaptively merging regions of low information density into fewer tokens. As our final contribution, we propose a method that natively learns a globally adaptive representation. All of the proposed ideas centre on controlling and shaping an underlying representation of the information, enabling editing and improving generation by imposing the right structure on the representation.</abstract>
  <notes>
    <note>Published on the publication server of RWTH Aachen University</note>
    <note>Dissertation, RWTH Aachen University, 2025</note>
  </notes>
  <label>2 ; PUB:(DE-HGF)11</label>
  <keywords/>
  <work-type>Dissertation / PhD Thesis</work-type>
  <volume>Dissertation</volume>
  <dates>
    <pub-dates>
      <year>2025</year>
    </pub-dates>
    <year>2025</year>
  </dates>
  <accession-num>RWTH-2025-09736</accession-num>
  <urls>
    <related-urls>
      <url>https://publications.rwth-aachen.de/record/1021903</url>
    </related-urls>
  </urls>
</record>

</records>
</xml>