NeRF Papers for Tue May 21 2024

Priors and Generative

Priors can either aid reconstruction or be used generatively. In reconstruction, for example, priors either increase the quality of neural view synthesis or enable reconstruction from sparse image collections.

HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos

Qifeng Chen, Rengan Xie, Kai Huang, Qi Wang, Wenting Zheng, Rong Li, Yuchi Huo
Figure 1: Given a monocular video of a performer, our method reconstructs a digital avatar equipped with high-quality triangular mesh and high-resolution corresponding PBR material textures. The result is compatible with standard graphics engines and can be edited.
The framework enables the generation of high-resolution human avatars with detailed materials and geometry suitable for traditional graphics engines. It combines monocular video information and virtual multi-view image synthesis to reconstruct deformable neural implicit surfaces and extract triangle meshes, resulting in high-fidelity avatars compatible with common renderers.

Fundamentals

These papers address more fundamental problems of view-synthesis with NeRF methods.

Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

Xingyu Miao, Haoran Duan, Varun Ojha, Jun Song, Tejal Shah, Yang Long, Rajiv Ranjan
Figure 2: ISM example. We notice that with the same initial value x0 but different noise {1, 2, 3, 4}, the generated results still show certain inconsistencies. This is due to the error accumulation inherent in the DDIM inversion process. These inconsistencies can lead to errors in some areas during the optimization of the 3D model.
The Trajectory Score Matching (TSM) method targets the pseudo-ground-truth inconsistency that arises in Interval Score Matching (ISM) from errors accumulated during Denoising Diffusion Implicit Models (DDIM) inversion. TSM uses DDIM inversion to generate two paths from the same starting point and uses both in the calculation, which reduces accumulated error and improves stability and consistency during distillation. Experimental results show that TSM outperforms ISM. The paper also proposes a pixel-by-pixel gradient clipping method to address unstable gradients in 3D Gaussian splatting when using Stable Diffusion XL.
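As a rough illustration of the gradient clipping idea, the sketch below clips an image-space gradient pixel by pixel in NumPy. The summary does not spell out the exact rule, so this assumes one plausible reading: each pixel's gradient vector (taken across channels) is rescaled so that its norm does not exceed a threshold. The function name `clip_pixel_gradients` and the parameters `max_norm` and `eps` are hypothetical and not taken from the paper.

```python
import numpy as np

def clip_pixel_gradients(grad, max_norm, eps=1e-8):
    """Rescale each pixel's gradient so its norm is at most max_norm.

    grad is an array of shape (C, H, W): C channels per pixel.
    This is an illustrative assumption, not the paper's exact method.
    """
    grad = np.asarray(grad, dtype=np.float64)
    # Per-pixel norm over the channel axis, kept as shape (1, H, W).
    norms = np.linalg.norm(grad, axis=0, keepdims=True)
    # Pixels under the threshold get scale 1 and are left unchanged.
    scale = np.minimum(1.0, max_norm / (norms + eps))
    return grad * scale
```

Clipping per pixel, rather than by the norm of the whole gradient, keeps a few extreme pixels from shrinking the update everywhere else.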

Decomposition

In this section, the radiance of NeRF is split into geometry, BRDF, and illumination. This enables consistent relighting under any illumination.
Fig. 1. 3D Gaussian Splatting workflow, image from (Kerbl et al., 2023). SfM is used to create a sparse point cloud to initialize the 3D Gaussian Splatting model. From these 3D Gaussians, new images are generated via the rasterizer and compared to ground truth images during optimization. Gaussians are densified as required.
In this study, a 3D Gaussian Splatting model of the Waterloo region was constructed using Google Earth imagery for view synthesis and scene reconstruction. The results of this model exceeded previous 3D view-synthesis results based on neural radiance fields and were benchmarked against a Multi-View-Stereo dense reconstruction, demonstrating success in reconstructing 3D geometry and photorealistic lighting of the urban scene.
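The comparison between rendered and ground-truth images in the workflow above can be sketched as a photometric loss. The original 3D Gaussian Splatting paper (Kerbl et al., 2023) combines an L1 term with a D-SSIM term weighted by λ = 0.2. The sketch below follows that form but, as a simplifying assumption, computes SSIM once over the whole image instead of over local windows. The function names are hypothetical, and nothing here is specific to the Waterloo study.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM computed from whole-image statistics (simplification:
    the usual definition averages SSIM over local windows)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x = ((x - mu_x) ** 2).mean()
    var_y = ((y - mu_y) ** 2).mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def photometric_loss(rendered, target, lam=0.2):
    """(1 - lam) * L1 + lam * (1 - SSIM) between two images in [0, 1]."""
    rendered = np.asarray(rendered, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    l1 = np.abs(rendered - target).mean()
    d_ssim = 1.0 - global_ssim(rendered, target)
    return (1.0 - lam) * l1 + lam * d_ssim
```

The loss is zero when the rendered image matches the ground truth and grows as the two diverge, which is the signal the optimizer uses to adjust the Gaussians.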

Pose Estimation

Estimating the pose of objects or the camera is a fundamental problem in computer vision. This can also be done to improve the quality of scenes with noisy camera poses.

MotionGS : Compact Gaussian Splatting SLAM by Motion Filter

Xinli Guo, Peng Han, Weidong Zhang, Hongtian Chen
Figure 1: Overview of MotionGS. The input to MotionGS at each timestep is the current RGB-D/RGB image. After the motion filter, the pose of the motion keyframe is optimized directly based on the photometric error between the ground truth (GT) and the rendered result. After the information filter, joint optimization of keyframe poses and 3D scene geometry over sliding windows and random historical frames is carried out in the mapping thread. Finally, the scene is refined.
A novel 3D Gaussian Splatting (3DGS)-based SLAM approach is introduced, focusing on feature extraction and motion filtering for selective tracking. This method incorporates deep visual features, dual keyframe selection, and 3DGS for more efficient pose estimation and scene representation. The results show improved tracking and mapping performance compared to existing methods, with lower memory usage.
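To make the idea of selective tracking concrete, here is a minimal sketch of a motion filter that only passes frames that differ enough from the last accepted keyframe. The summary does not state the criterion MotionGS uses (it mentions deep visual features), so this sketch substitutes a much simpler stand-in: the mean absolute intensity change between frames. The function name `select_motion_keyframes` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def select_motion_keyframes(frames, threshold):
    """Return indices of frames that pass a simple motion filter.

    A frame is kept when its mean absolute difference to the last
    kept frame is at least `threshold`. The first frame is always kept.
    Stand-in criterion for illustration, not the MotionGS filter.
    """
    keyframes = []
    last = None
    for i, frame in enumerate(frames):
        frame = np.asarray(frame, dtype=np.float64)
        if last is None or np.abs(frame - last).mean() >= threshold:
            keyframes.append(i)
            last = frame  # later frames are compared against this one
    return keyframes
```

Only the returned frames would go on to pose optimization, so a nearly static camera produces few keyframes and less work for the tracker.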