Semantic Consistent Language Gaussian Splatting
for Point-Level Open-vocabulary Querying

Hairong Yin1, Huangying Zhan2, Yi Xu2, Raymond A. Yeh1
1Purdue University, 2Goertek Alpha Labs

Abstract

Open-vocabulary querying in 3D Gaussian Splatting aims to identify semantically relevant regions within a 3D Gaussian representation based on a given text query. Prior work, such as LangSplat, addressed this task by retrieving these regions in the form of segmentation masks on 2D renderings. More recently, OpenGaussian introduced point-level querying, which directly selects a subset of 3D Gaussians. In this work, we propose a point-level querying method that builds upon LangSplat's framework. Our approach improves the framework in two key ways: (a) we leverage masklets from the Segment Anything Model 2 (SAM2) to establish semantically consistent ground-truth for distilling the language Gaussians; (b) we introduce a novel two-step querying approach that first retrieves the distilled ground-truth and subsequently uses the ground-truth to query the individual Gaussians. Experimental evaluations on three benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches. For instance, our method achieves an mIoU improvement of +20.42 on the 3D-OVS dataset.

Method

Method Figure

We introduce a method for generating semantically and 3D-consistent ground-truth to train language-aware Gaussians. With these improved 3D language Gaussians, we propose an effective two-step querying process that leverages the consistent ground-truth.
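The two-step querying idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `two_step_query`, the use of cosine similarity, the fixed `threshold`, and the toy 3-dimensional embeddings are all assumptions made for illustration. The sketch only assumes that each ground-truth entry and each Gaussian carries a language embedding in the same space as the text query.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def two_step_query(text_emb, gt_embs, gaussian_embs, threshold=0.8):
    """Hypothetical two-step point-level query.

    text_emb:      (D,)   embedding of the text query
    gt_embs:       (K, D) one embedding per ground-truth entry (assumed layout)
    gaussian_embs: (N, D) one language embedding per 3D Gaussian
    threshold:     assumed cosine-similarity cutoff for selecting Gaussians
    """
    text_emb = l2_normalize(text_emb)
    gt_embs = l2_normalize(gt_embs)
    gaussian_embs = l2_normalize(gaussian_embs)

    # Step 1: retrieve the ground-truth embedding that best matches the text.
    best = int(np.argmax(gt_embs @ text_emb))

    # Step 2: use that ground-truth embedding, not the text, to query Gaussians.
    sims = gaussian_embs @ gt_embs[best]
    return best, np.flatnonzero(sims >= threshold)


# Toy data: two ground-truth entries and four Gaussians.
gt_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
text_emb = np.array([0.9, 0.1, 0.0])
gaussian_embs = np.array([
    [1.0, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.95, 0.0, 0.1],
    [0.0, 0.0, 1.0],
])

best, selected = two_step_query(text_emb, gt_embs, gaussian_embs)
print(best, selected.tolist())
```

In this toy setup the text matches the first ground-truth entry, and the Gaussians whose embeddings lie close to that entry are selected. The actual similarity measure and selection rule used by the method may differ.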

Qualitative Results

Language Embeddings "GT" on Scene of Ramen in LERF (RGB, LangSplat, Ours)

Query Text: "napkin" (GaussianGrouping, LangSplat, OpenGaussian, Ours)




Language Embeddings "GT" on Scene of Figurines in LERF (RGB, LangSplat, Ours)

Query Text: "rubics cube" (GaussianGrouping, LangSplat, OpenGaussian, Ours)




Language Embeddings "GT" on Scene of Room in 3D-OVS (RGB, LangSplat, Ours)

Query Text: "baseball" (GaussianGrouping, LangSplat, OpenGaussian, Ours)




Language Embeddings "GT" on Scene of Bed in 3D-OVS (RGB, LangSplat, Ours)

Query Text: "red bag" (GaussianGrouping, LangSplat, OpenGaussian, Ours)

Quantitative Results

LERF Table Figure

#: Part of this work was done while the first three authors were with OPPO US Research Center.