

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

David Hoffmann ⋅ Simon Schrodi ⋅ Jelena Bratulić ⋅ Nadine Behrmann ⋅ Volker Fischer ⋅ Thomas Brox

Abstract
