3D Gaussian Splatting (3DGS) is one of the fastest rendering methods in neural graphics, but the original algorithm assigns far too many tiles per Gaussian. This tile over-expansion is computationally expensive and unnecessary, because a large fraction of those tiles never contribute to the final image.
Speedy Splat (Hanson et al., 2024) solves this by replacing the oversized bounding box (AABB) with an analytically tight axis-aligned bounding box (SnugBox). This leads to a significant reduction in tile visits while preserving bit-identical rendering quality.
In the original 3DGS paper, each Gaussian is projected into screen space as an ellipse.
A conservative radius r is used to approximate this ellipse with a circle, and a bounding box is then computed around the circle.
The problem? The circle is much larger than the ellipse, causing many unnecessary tile assignments. This wastes compute during splatting.
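As a rough illustration of how much area the circle wastes, here is a minimal sketch with invented numbers (the standard deviations and the axis-aligned ellipse are chosen purely for the example):

```python
import math

# Illustrative only: an anisotropic screen-space Gaussian with standard
# deviations (sigma_major, sigma_minor) along its principal axes.
sigma_major, sigma_minor = 10.0, 2.0

# Baseline 3DGS: one conservative radius r = 3 * sigma_major for both axes.
r = 3.0 * sigma_major
circle_box_area = (2 * r) ** 2  # square bounding box around the circle

# Tight bounding box of the (axis-aligned, for simplicity) 3-sigma ellipse.
ellipse_box_area = (2 * 3.0 * sigma_major) * (2 * 3.0 * sigma_minor)

waste = 1.0 - ellipse_box_area / circle_box_area
print(f"circle box: {circle_box_area:.0f} px^2, "
      f"ellipse box: {ellipse_box_area:.0f} px^2, "
      f"wasted: {waste:.0%}")
```

For a 5:1 anisotropy like this, roughly 80% of the circle's bounding box never overlaps the tight box at all.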
The standard 3D Gaussian Splatting strategy works, but it assigns far more tiles per Gaussian than necessary. Below is how the AABB is computed in the original 3DGS paper, in PyTorch.
# Tiling: baseline 3DGS tile computation
# evals: [N, 2] eigenvalues of the projected 2D covariance (ascending, so [:, 1] is the major axis)
# u, v:  [N] screen-space means of the projected Gaussians
major_variance = evals[:, 1].clamp_min(1e-12).clamp_max(1e4) # [N]
radius = torch.ceil(3.0 * torch.sqrt(major_variance)).to(torch.int64) # 3-sigma circle
umin = torch.floor(u - radius).to(torch.int64)
umax = torch.floor(u + radius).to(torch.int64)
vmin = torch.floor(v - radius).to(torch.int64)
vmax = torch.floor(v + radius).to(torch.int64)
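The snippet above assumes `evals`, and the SnugBox snippet below assumes `inverse_covariance`. As a hypothetical piece of glue code (the function name and shapes are my own, not from the paper), both can be derived from a batch of projected 2x2 covariances:

```python
import torch

# Hypothetical helper: derive the per-Gaussian quantities the tiling snippets
# use from a batch of projected 2x2 covariances `cov2d` of shape [N, 2, 2].
def screen_space_terms(cov2d: torch.Tensor):
    evals = torch.linalg.eigvalsh(cov2d)          # [N, 2], ascending order
    inverse_covariance = torch.linalg.inv(cov2d)  # [N, 2, 2] conic matrices
    return evals, inverse_covariance

cov2d = torch.tensor([[[9.0, 2.0], [2.0, 4.0]]])  # one anisotropic Gaussian
evals, inv_cov = screen_space_terms(cov2d)
```

`eigvalsh` returns eigenvalues in ascending order, which is why the baseline code reads the major variance from `evals[:, 1]`.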
Speedy Splat computes the exact axis-aligned bounding box (AABB) of the projected ellipse using basic algebra, yielding a box that tightly wraps the ellipse.
# Tiling: SnugBox exact AABB of the projected ellipse
# inverse_covariance: [N, 2, 2] conic matrices; opacity, u, v: [N]
A = inverse_covariance[:, 0, 0]
B = inverse_covariance[:, 0, 1]
C = inverse_covariance[:, 1, 1]
t = 2.0 * torch.log(255.0 * opacity) # Eq. 11: level set where alpha falls to 1/255
B2_minus_AC = B ** 2 - A * C # Eq. 16 (negative for a valid covariance)
xd_arg = torch.sqrt(- B * B * t / (B2_minus_AC * A))
xd_arg[B < 0] = - xd_arg[B < 0] # For Eq. 15 to be == 0, xd should be < 0 when B < 0
yd_arg = torch.sqrt(- B * B * t / (B2_minus_AC * C)) # Symmetry of Eq. 16
yd_arg[B < 0] = - yd_arg[B < 0]
# Substituting xd_arg / yd_arg into Equation 15 and adding the mean (u, v)
vmax = v + (B * xd_arg + torch.sqrt(B2_minus_AC * xd_arg ** 2 + t * C)) / C
vmin = v + (-B * xd_arg - torch.sqrt(B2_minus_AC * xd_arg ** 2 + t * C)) / C
umax = u + (B * yd_arg + torch.sqrt(B2_minus_AC * yd_arg ** 2 + t * A)) / A
umin = u + (-B * yd_arg - torch.sqrt(B2_minus_AC * yd_arg ** 2 + t * A)) / A
umin = torch.floor(umin).to(torch.int64)
umax = torch.floor(umax).to(torch.int64)
vmin = torch.floor(vmin).to(torch.int64)
vmax = torch.floor(vmax).to(torch.int64)
This version reduces the tile count dramatically while producing an identical rendered output.
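To make the savings concrete, here is a small self-contained sketch (plain Python, with an invented Gaussian and an axis-aligned covariance so the tight extents reduce to the closed form `sqrt(t * Sigma_xx)` and `sqrt(t * Sigma_yy)`, which is what the equations above evaluate to in this special case):

```python
import math

def tiles_covered(umin, umax, vmin, vmax, tile=16):
    # Number of 16x16 tiles whose index range the box [umin, umax] x [vmin, vmax] spans.
    return ((math.floor(umax / tile) - math.floor(umin / tile) + 1)
            * (math.floor(vmax / tile) - math.floor(vmin / tile) + 1))

# Illustrative Gaussian (values chosen for the example, not from the paper):
u, v = 100.0, 60.0          # screen-space mean
var_x, var_y = 100.0, 4.0   # diagonal of an axis-aligned 2D covariance
opacity = 0.99

# Baseline: circle of radius 3 * sqrt(major variance).
r = math.ceil(3.0 * math.sqrt(max(var_x, var_y)))
baseline = tiles_covered(u - r, u + r, v - r, v + r)

# SnugBox: tight AABB of the level set where alpha drops to 1/255.
t = 2.0 * math.log(255.0 * opacity)
ex, ey = math.sqrt(t * var_x), math.sqrt(t * var_y)
snug = tiles_covered(u - ex, u + ex, v - ey, v + ey)

print(baseline, snug)  # the tight box touches far fewer tiles
```

Note that along the major axis the SnugBox can be marginally wider than the 3-sigma circle (its threshold works out to about 3.3 sigma at this opacity); the savings come from the minor axis, where the circle grossly overshoots.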
Full implementation: Speedy Splat SnugBox PyTorch code
Note how easy it was to integrate the SnugBox logic into our codebase. Because our initial implementation is written entirely in PyTorch, we can modify the renderer directly without dealing with CUDA kernels, custom memory layouts, or compilation. Although the PyTorch version runs slower than a fully optimized CUDA pipeline, it achieves state-of-the-art rendering quality and—crucially—enables rapid experimentation. This makes it ideal for fast prototyping and testing new ideas with minimal overhead.
Do you want to truly understand 3D Gaussian Splatting—not just run a repo? My 3D Gaussian Splatting Course teaches you the full pipeline from first principles. Everything is broken down into clear modules with code you can actually read and modify.
Explore the Course →
We help teams bridge the gap between research and production. Our work focuses on practical integration of 3D Gaussian Splatting techniques, implementation of recent methods, and custom research or prototyping for advanced splatting pipelines.
For consulting inquiries:
contact@qubitanalytics.be