Abstract: We introduce the 2-simplicial Transformer, an extension of the Transformer that includes a form of higher-dimensional attention generalising dot-product attention, and uses this attention to update entity representations with tensor products of value vectors. We show that this architecture provides a useful inductive bias for logical reasoning in the context of deep reinforcement learning.
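
To make the mechanism concrete, here is a minimal sketch of 2-simplicial attention in PyTorch. It is an illustration under simplifying assumptions, not the paper's implementation: a plain trilinear form stands in for the paper's scalar triple product, only a single head is shown, and the names (`two_simplicial_attention`, `W`) and sizes are hypothetical.

```python
import torch

def two_simplicial_attention(q, k1, k2, v1, v2, W):
    """Sketch of 2-simplicial attention over n entities of dimension d.

    q: (n, d) queries; k1, k2: (n, d) key vectors; v1, v2: (n, d) values.
    W: (d, d, d) tensor projecting value tensor products back to d dims.
    """
    n, d = q.shape
    # Attention logits over ordered pairs (j, l) of entities; a plain
    # trilinear form is used here in place of the scalar triple product.
    logits = torch.einsum('id,jd,ld->ijl', q, k1, k2)
    attn = torch.softmax(logits.reshape(n, -1), dim=-1).reshape(n, n, n)
    # Combine each pair of value vectors via a tensor product, projected by W.
    pair_vals = torch.einsum('je,lf,def->jld', v1, v2, W)
    # Each entity i is updated by its attention-weighted sum over pairs.
    return torch.einsum('ijl,jld->id', attn, pair_vals)

# Usage with random tensors (shapes only; sizes are hypothetical).
n, d = 8, 16
out = two_simplicial_attention(*(torch.randn(n, d) for _ in range(5)),
                               torch.randn(d, d, d))
print(out.shape)  # torch.Size([8, 16])
```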
