Equivariant Volumetric Grasping

Pinhao Song¹, Yutong Hu¹, Pengteng Li², Renaud Detry¹
¹KU Leuven
²HKUST(GZ)

Abstract

We propose a new volumetric grasp model that is equivariant to rotations around the vertical axis, leading to a significant improvement in sample efficiency. Our model employs a tri-plane volumetric feature representation, i.e., the projection of 3D features onto three canonical planes. We introduce a novel tri-plane feature design in which features on the horizontal plane are equivariant to 90° rotations, while the sum of features from the other two planes remains invariant to reflections induced by the same transformations. This design is enabled by a new deformable steerable convolution, which combines the adaptability of deformable convolutions with the rotational equivariance of steerable ones. This allows the receptive field to adapt to local object geometry while preserving equivariance properties. We further develop equivariant adaptations of two state-of-the-art volumetric grasp planners, GIGA and IGD. Specifically, we derive a new equivariant formulation of IGD's deformable attention mechanism and propose an equivariant generative model of grasp orientations based on flow matching. We provide a detailed analytical justification of the proposed equivariance properties and validate our approach through extensive simulated and real-world experiments. Our results demonstrate that the proposed projection-based design significantly reduces both computational and memory costs. Moreover, the equivariant grasp models built on top of our tri-plane features consistently outperform their non-equivariant counterparts, achieving higher performance with only a modest computational overhead.

demo for equivariance

Key Insight

Key Insight Diagram

This figure considers a workspace that contains a yellow cone and a blue box. As the workspace rotates in increments of 90°, the XY plane rotates with it, but the XZ and YZ planes transform differently: every time the workspace rotates by 90°, the YZ plane becomes the previous XZ plane, and the XZ plane becomes a flipped copy of the previous YZ plane. This observation forms a key intuition of our paper. Consider a feature queried at the point marked by the star in the figure. Looking at the XZ and YZ planes, we see that, for every rotation of the scene, the query point falls on empty space in one plane and on the yellow cone in the other. Thus, the sum of the matching features in the XZ and YZ planes is invariant to C4 transformations. Since the figure above lists the C4 transformations exhaustively, this observation is in fact a general rule, which defines how tri-plane features transform under C4 group actions.
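The plane-swapping rule described above can be checked numerically. The following is a minimal sketch, not the model's actual feature extractor: it assumes a cubic voxel grid indexed as [x, y, z] and uses a plain sum along each axis as a stand-in for the learned projection. The helper names (project_triplane, rotate_z, rotate_index) are hypothetical and chosen for illustration only.

```python
import numpy as np


def project_triplane(volume):
    """Sum-project a volume indexed [x, y, z] onto the XY, XZ and YZ planes."""
    return volume.sum(axis=2), volume.sum(axis=1), volume.sum(axis=0)


def rotate_z(volume):
    """Rotate the volume by 90 degrees about the vertical (z) axis."""
    return np.rot90(volume, k=1, axes=(0, 1))


def rotate_index(x, y, n):
    """Voxel index reached by (x, y) after rotate_z on an n x n x n grid."""
    return n - 1 - y, x


rng = np.random.default_rng(0)
n = 8
# Integer occupancy counts keep every comparison below exact.
volume = rng.integers(0, 10, size=(n, n, n))

xy, xz, yz = project_triplane(volume)
xy_r, xz_r, yz_r = project_triplane(rotate_z(volume))

# Rule from the text: XY rotates with the scene, the new YZ plane is the
# previous XZ plane, and the new XZ plane is a flipped previous YZ plane.
plane_rule_holds = (
    np.array_equal(xy_r, np.rot90(xy))
    and np.array_equal(yz_r, xz)
    and np.array_equal(xz_r, yz[::-1, :])
)

# Follow one query point through all four C4 rotations and record the
# summed vertical-plane feature XZ[x, z] + YZ[y, z] at each step.
x, y, z = 2, 5, 3
current = volume
vertical_sums = []
for _ in range(4):
    _, xz_k, yz_k = project_triplane(current)
    vertical_sums.append(int(xz_k[x, z] + yz_k[y, z]))
    current = rotate_z(current)
    x, y = rotate_index(x, y, n)

sum_is_invariant = len(set(vertical_sums)) == 1
print(plane_rule_holds, sum_is_invariant)
```

Under these assumptions, both flags come out true: the two vertical planes trade places (with one flip) at every 90° step, so their sum at the rotated query point does not change, while the XY plane simply rotates with the scene.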

Equivariant Triplane UNet

Model Pipeline Diagram

EquiGIGA and EquiIGD

Model Pipeline Diagram

Real World Experiment Setup

Setup Diagram

Packed and Pile Scenes

Adversarial Scenes