3D Gaussian Splatting (3DGS) is one of the fastest rendering methods in neural graphics, but the original algorithm assigns far too many tiles per Gaussian. This tile over-expansion is computationally expensive and unnecessary, because a large fraction of those tiles never contribute to the final image.
Speedy Splat (Hanson et al., 2024) solves this by replacing the oversized bounding box (AABB) with an analytically tight axis-aligned bounding box (SnugBox). This leads to a significant reduction in tile visits while preserving bit-identical rendering quality.
In the original 3DGS paper, each Gaussian is projected into screen space as an ellipse.
A conservative radius r is used to approximate this ellipse with a circle, and a bounding box is then computed around the circle.
The problem? The circle is much larger than the ellipse, causing many unnecessary tile assignments. This wastes compute during splatting.
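As a rough illustration of how much area the circle wastes, here is a minimal sketch with invented numbers (the standard deviations and the axis-aligned ellipse are chosen purely for the example):

```python
import math

# Illustrative only: an anisotropic screen-space Gaussian with standard
# deviations (sigma_major, sigma_minor) along its principal axes.
sigma_major, sigma_minor = 10.0, 2.0

# Baseline 3DGS: one conservative radius r = 3 * sigma_major for both axes.
r = 3.0 * sigma_major
circle_box_area = (2 * r) ** 2  # square bounding box around the circle

# Tight bounding box of the (axis-aligned, for simplicity) 3-sigma ellipse.
ellipse_box_area = (2 * 3.0 * sigma_major) * (2 * 3.0 * sigma_minor)

waste = 1.0 - ellipse_box_area / circle_box_area
print(f"circle box: {circle_box_area:.0f} px^2, "
      f"ellipse box: {ellipse_box_area:.0f} px^2, "
      f"wasted: {waste:.0%}")
```

For a 5:1 anisotropy like this, roughly 80% of the circle's bounding box never overlaps the tight box at all.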
The standard 3D Gaussian Splatting strategy works, but it assigns far more tiles per Gaussian than necessary. Below is how the AABB is computed in the original 3DGS paper, in PyTorch.
# Tiling: baseline 3DGS tile computation
# evals: [N, 2] eigenvalues of the projected 2D covariance (ascending, so [:, 1] is the major axis)
# u, v:  [N] screen-space means of the projected Gaussians
major_variance = evals[:, 1].clamp_min(1e-12).clamp_max(1e4) # [N]
radius = torch.ceil(3.0 * torch.sqrt(major_variance)).to(torch.int64) # 3-sigma circle
umin = torch.floor(u - radius).to(torch.int64)
umax = torch.floor(u + radius).to(torch.int64)
vmin = torch.floor(v - radius).to(torch.int64)
vmax = torch.floor(v + radius).to(torch.int64)
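The snippet above assumes `evals`, and the SnugBox snippet below assumes `inverse_covariance`. As a hypothetical piece of glue code (the function name and shapes are my own, not from the paper), both can be derived from a batch of projected 2x2 covariances:

```python
import torch

# Hypothetical helper: derive the per-Gaussian quantities the tiling snippets
# use from a batch of projected 2x2 covariances `cov2d` of shape [N, 2, 2].
def screen_space_terms(cov2d: torch.Tensor):
    evals = torch.linalg.eigvalsh(cov2d)          # [N, 2], ascending order
    inverse_covariance = torch.linalg.inv(cov2d)  # [N, 2, 2] conic matrices
    return evals, inverse_covariance

cov2d = torch.tensor([[[9.0, 2.0], [2.0, 4.0]]])  # one anisotropic Gaussian
evals, inv_cov = screen_space_terms(cov2d)
```

`eigvalsh` returns eigenvalues in ascending order, which is why the baseline code reads the major variance from `evals[:, 1]`.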
Speedy Splat computes the exact axis-aligned bounding box (AABB) of the projected ellipse using basic algebra, yielding a box that tightly wraps the ellipse.
# Tiling: SnugBox exact AABB of the projected ellipse
# inverse_covariance: [N, 2, 2] conic matrices; opacity, u, v: [N]
A = inverse_covariance[:, 0, 0]
B = inverse_covariance[:, 0, 1]
C = inverse_covariance[:, 1, 1]
t = 2.0 * torch.log(255.0 * opacity) # Eq. 11: level set where alpha falls to 1/255
B2_minus_AC = B ** 2 - A * C # Eq. 16 (negative for a valid covariance)
xd_arg = torch.sqrt(- B * B * t / (B2_minus_AC * A))
xd_arg[B < 0] = - xd_arg[B < 0] # For Eq. 15 to be == 0, xd should be < 0 when B < 0
yd_arg = torch.sqrt(- B * B * t / (B2_minus_AC * C)) # Symmetry of Eq. 16
yd_arg[B < 0] = - yd_arg[B < 0]
# Substituting xd_arg / yd_arg into Equation 15 and adding the mean (u, v)
vmax = v + (B * xd_arg + torch.sqrt(B2_minus_AC * xd_arg ** 2 + t * C)) / C
vmin = v + (-B * xd_arg - torch.sqrt(B2_minus_AC * xd_arg ** 2 + t * C)) / C
umax = u + (B * yd_arg + torch.sqrt(B2_minus_AC * yd_arg ** 2 + t * A)) / A
umin = u + (-B * yd_arg - torch.sqrt(B2_minus_AC * yd_arg ** 2 + t * A)) / A
umin = torch.floor(umin).to(torch.int64)
umax = torch.floor(umax).to(torch.int64)
vmin = torch.floor(vmin).to(torch.int64)
vmax = torch.floor(vmax).to(torch.int64)
This version reduces the tile count dramatically while producing an identical rendered output.
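To make the savings concrete, here is a small self-contained sketch (plain Python, with an invented Gaussian and an axis-aligned covariance so the tight extents reduce to the closed form `sqrt(t * Sigma_xx)` and `sqrt(t * Sigma_yy)`, which is what the equations above evaluate to in this special case):

```python
import math

def tiles_covered(umin, umax, vmin, vmax, tile=16):
    # Number of 16x16 tiles whose index range the box [umin, umax] x [vmin, vmax] spans.
    return ((math.floor(umax / tile) - math.floor(umin / tile) + 1)
            * (math.floor(vmax / tile) - math.floor(vmin / tile) + 1))

# Illustrative Gaussian (values chosen for the example, not from the paper):
u, v = 100.0, 60.0          # screen-space mean
var_x, var_y = 100.0, 4.0   # diagonal of an axis-aligned 2D covariance
opacity = 0.99

# Baseline: circle of radius 3 * sqrt(major variance).
r = math.ceil(3.0 * math.sqrt(max(var_x, var_y)))
baseline = tiles_covered(u - r, u + r, v - r, v + r)

# SnugBox: tight AABB of the level set where alpha drops to 1/255.
t = 2.0 * math.log(255.0 * opacity)
ex, ey = math.sqrt(t * var_x), math.sqrt(t * var_y)
snug = tiles_covered(u - ex, u + ex, v - ey, v + ey)

print(baseline, snug)  # the tight box touches far fewer tiles
```

Note that along the major axis the SnugBox can be marginally wider than the 3-sigma circle (its threshold works out to about 3.3 sigma at this opacity); the savings come from the minor axis, where the circle grossly overshoots.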
Full implementation: Speedy Splat SnugBox PyTorch code
Note how easy it was to integrate the SnugBox logic into our codebase. Because our initial implementation is written entirely in PyTorch, we can modify the renderer directly without dealing with CUDA kernels, custom memory layouts, or compilation. Although the PyTorch version runs slower than a fully optimized CUDA pipeline, it achieves state-of-the-art rendering quality and—crucially—enables rapid experimentation. This makes it ideal for fast prototyping and testing new ideas with minimal overhead.
Do you want to truly understand 3D Gaussian Splatting—not just run a repo? My 3D Gaussian Splatting Course teaches you the full pipeline from first principles. Everything is broken down into clear modules with code you can actually read and modify.
Explore the Course →
We help teams bridge the gap between research and production. Our work focuses on practical integration of 3D Gaussian Splatting techniques, implementation of recent methods, and custom research or prototyping for advanced splatting pipelines.
For consulting inquiries:
contact@qubitanalytics.be