Poster in Workshop: Geometry-grounded Representation Learning and Generative Modeling

Tensor-SAE: Structured Sparse Autoencoders for Interpretable and Efficient Image Representations

Tanush Shastry ⋅ Soham Batra ⋅ Laksh Patel ⋅ Aarav Lala ⋅ Andrew Bae ⋅ Siddharth Karuturi ⋅ Mithil Shah ⋅ Neel Shanbhag

Project Page [OpenReview]

Abstract

We introduce Tensor-SAE, a structured sparse autoencoder that decodes through a learned bank of rank-1 tensor atoms (color × height × width). By factorizing the decoder into separable color and spatial factors and applying a light sparsity prior on latent activations, Tensor-SAE induces compact, interpretable representations that enable linear, spatially localized, and semantically meaningful interventions in image reconstructions. Unlike unconstrained dense or convolutional decoders that distribute information diffusely, Tensor-SAE enforces a strong inductive bias that trades some raw pixel-level fidelity for computational efficiency, interpretability, and controllability. We evaluate Tensor-SAE on CIFAR-10 against two baselines (a parameter-matched Dense-SAE and a ConvAE scaled to match parameter budgets). Our empirical suite (six figures) demonstrates that Tensor-SAE: (1) learns low-entropy spatial atoms and clean color factors; (2) yields linearly predictable intervention effects (R² ≈ 0.93) enabling controllable color edits; (3) achieves superior reconstruction efficiency per FLOP and per parameter; (4) produces consistently sparse latents; and (5) stabilizes intervention strength during training. We discuss trade-offs, limitations, and the application of Tensor-SAE as a building block for interpretable, compute-efficient generative systems.
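The decoder structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`decode`, `atom`), the atom count `K`, and the random factors are all hypothetical, and the encoder, sparsity prior, and training loop are omitted. It only shows how an image could be formed as a sparse sum of rank-1 (color × height × width) atoms, and why editing one latent shifts the decoded image by a scaled copy of a single atom. The reported R² ≈ 0.93 is an empirical measurement on the trained model and is not reproduced here.

```python
import numpy as np

def decode(z, color, height, width):
    """Decode latents as a weighted sum of rank-1 atoms (illustrative only).

    z:      (K,)   latent activations, assumed mostly zero
    color:  (K, C) color factor of each atom
    height: (K, H) vertical spatial factor of each atom
    width:  (K, W) horizontal spatial factor of each atom
    returns an image of shape (C, H, W)
    """
    return np.einsum("k,kc,kh,kw->chw", z, color, height, width)

def atom(k, color, height, width):
    """Materialize atom k as the outer product of its three factors."""
    return np.einsum("c,h,w->chw", color[k], height[k], width[k])

# Hypothetical sizes: 8 atoms, CIFAR-10-shaped images (3 x 32 x 32).
rng = np.random.default_rng(0)
K, C, H, W = 8, 3, 32, 32
color = rng.normal(size=(K, C))
height = rng.normal(size=(K, H))
width = rng.normal(size=(K, W))

# A sparse latent code: only two atoms are active.
z = np.zeros(K)
z[[1, 5]] = [0.7, -1.2]
x_hat = decode(z, color, height, width)

# Intervention: increase the activation of atom 5 by delta.
delta = 0.5
z_edit = z.copy()
z_edit[5] += delta
x_edit = decode(z_edit, color, height, width)

# Because this decoder is linear in z, the change in the output
# equals delta times atom 5.
effect = x_edit - x_hat
is_linear = np.allclose(effect, delta * atom(5, color, height, width))

# Decoder weight counts: unconstrained dense atoms vs. factorized atoms.
dense_params = K * C * H * W
tensor_params = K * (C + H + W)
```

In this toy setting each factorized atom stores C + H + W numbers instead of C·H·W, which is one plausible reading of the per-parameter efficiency the abstract refers to. The trade-off is that a rank-1 atom can only represent separable spatial patterns.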
