Tensor-SAE: Structured Sparse Autoencoders for Interpretable and Efficient Image Representations
Abstract
We introduce Tensor-SAE, a structured sparse autoencoder that decodes through a learned bank of rank-1 tensor atoms (color × height × width). By factorizing the decoder into separable color and spatial factors and applying a light sparsity prior to the latent activations, Tensor-SAE induces compact, interpretable representations that support linear, spatially localized, and semantically meaningful interventions on image reconstructions. Unlike unconstrained dense or convolutional decoders, which distribute information diffusely, Tensor-SAE imposes a strong inductive bias that trades some raw pixel-level fidelity for computational efficiency, interpretability, and controllability. We evaluate Tensor-SAE on CIFAR-10 against two baselines: a parameter-matched Dense-SAE and a ConvAE scaled to a matching parameter budget. Our experiments, summarized in six figures, show that Tensor-SAE (1) learns low-entropy spatial atoms and clean color factors; (2) yields linearly predictable intervention effects (R² ≈ 0.93), enabling controllable color edits; (3) achieves superior reconstruction efficiency per FLOP and per parameter; (4) produces consistently sparse latents; and (5) stabilizes intervention strength during training. We discuss trade-offs, limitations, and the use of Tensor-SAE as a building block for interpretable, compute-efficient generative systems.
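To make the decoder structure concrete, the following is a minimal NumPy sketch of decoding through rank-1 atoms, not the authors' implementation. It assumes the reconstruction is a weighted sum of outer products of one color, one height, and one width vector per atom. The function name `decode_rank1`, the atom count `K = 8`, the random factors, and the absence of any encoder, bias, or output nonlinearity are all illustrative assumptions.

```python
import numpy as np

def decode_rank1(z, color, height, width):
    """Decode latents as a weighted sum of rank-1 atoms (hypothetical helper).

    z:      (K,)   latent activations, assumed sparse
    color:  (K, C) color factor of each atom
    height: (K, H) vertical spatial factor of each atom
    width:  (K, W) horizontal spatial factor of each atom
    returns (C, H, W) reconstruction
    """
    # x_hat[c, h, w] = sum_k z[k] * color[k, c] * height[k, h] * width[k, w]
    return np.einsum("k,kc,kh,kw->chw", z, color, height, width)

rng = np.random.default_rng(0)
K, C, H, W = 8, 3, 32, 32  # CIFAR-10 image size; K is an arbitrary choice
color = rng.normal(size=(K, C))
height = rng.normal(size=(K, H))
width = rng.normal(size=(K, W))

# A sparse latent code with only two active atoms.
z = np.zeros(K)
z[[1, 5]] = [0.7, -1.2]
x_hat = decode_rank1(z, color, height, width)

# Intervention: increase the activation of atom 5 by delta.
delta = 0.5
z_edit = z.copy()
z_edit[5] += delta
x_edit = decode_rank1(z_edit, color, height, width)

# With a purely linear decoder, the change equals delta times atom 5.
atom5 = np.einsum("c,h,w->chw", color[5], height[5], width[5])
change = x_edit - x_hat
```

In this simplified setting the effect of an intervention is exactly linear in `delta`. Any nonlinearity or re-encoding step in the full model is omitted here, so the sketch does not reproduce the reported R² value. Under these assumptions each atom stores C + H + W = 67 numbers, whereas an unconstrained atom of the same shape would store C · H · W = 3072.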