Multi-view surface reconstruction, the task of recovering accurate surfaces from multi-view images, has shifted in recent years from classical shape-from-X pipelines that rely on handcrafted feature matching, to fast, data-driven stereo and multi-view stereo correspondence models and emerging 3D foundation models. In parallel, implicit 3D representations such as Neural Radiance Fields and Gaussian Splatting (GS) have revolutionized novel view synthesis, with GS in particular achieving remarkable speed and rendering quality. However, extracting reliable geometry from a GS representation optimized solely for appearance remains challenging. Prior works attempt to inject heuristic and/or data-driven geometric priors during the GS optimization phase, often resulting in a tradeoff between rendering quality and geometric accuracy.
We propose to avoid this tradeoff, and introduce GS2Mesh, a novel method for incorporating data-driven priors into GS in a manner that not only preserves rendering quality, but actually leverages it to improve geometric accuracy. We compose a novel pipeline in which a fully optimized GS representation is used to transform the original monocular multi-view input into a multi-view-consistent stereoscopic input, from which a pre-trained data-driven stereo model extracts accurate geometry. Our pipeline is model-agnostic, and naturally improves as newer GS and stereo models emerge.
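To make the pipeline concrete, the following is a minimal sketch of its core loop under stated assumptions: render a horizontally shifted stereo pair from the optimized GS scene at each input pose, estimate disparity with a pre-trained stereo model, convert disparity to depth, and fuse the depth maps into a mesh. The helper names (`render_gs`, `predict_disparity`, `tsdf_integrate`, `extract_mesh`) and the baseline value are hypothetical placeholders for illustration, not the paper's actual API.

```python
import numpy as np

# Hypothetical helpers -- stand-ins for a GS renderer, a pre-trained
# stereo matcher, and a TSDF fusion backend. Names are illustrative only.
from gs_renderer import render_gs            # render an image from a GS scene at a given pose
from stereo_model import predict_disparity   # stereo matcher: (left, right) -> disparity map
from fusion import tsdf_integrate, extract_mesh

def stereo_pose(cam_to_world: np.ndarray, baseline: float) -> np.ndarray:
    """Shift the camera along its local x-axis to form the right view of a stereo pair."""
    right = cam_to_world.copy()
    right[:3, 3] += baseline * cam_to_world[:3, 0]  # translate along the camera x-axis
    return right

def reconstruct(gs_scene, poses, K, baseline=0.1):
    """For each input pose: render a stereo pair from the optimized GS scene,
    estimate disparity with a pre-trained stereo model, convert it to metric
    depth, and fuse all depth maps into a single TSDF volume."""
    fx = K[0, 0]
    for cam_to_world in poses:
        left = render_gs(gs_scene, cam_to_world, K)
        right = render_gs(gs_scene, stereo_pose(cam_to_world, baseline), K)
        disparity = predict_disparity(left, right)
        depth = fx * baseline / np.maximum(disparity, 1e-6)  # standard stereo depth
        tsdf_integrate(depth, left, cam_to_world, K)
    return extract_mesh()
```

Because both views are rendered from the same GS scene with a pure horizontal offset, the pair is photometrically consistent and rectified by construction, which is what allows an off-the-shelf stereo model to recover accurate depth.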
We show how GS and stereo complement each other, and demonstrate our method's state-of-the-art performance in both speed and accuracy on popular 3D reconstruction benchmarks, as well as on in-the-wild videos taken with a standard smartphone camera.