Skip to yearly menu bar Skip to main content


Poster

Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks

Sina Khajehabdollahi · Roxana Zeraati · Emmanouil Giannakakis · Tim Schäfer · Georg Martius · Anna Levina

Halle B #51

Abstract: Recurrent neural networks (RNNs) in the brain and \emph{in silico} excel at solving tasks with intricate temporal dependencies.Long timescales required for solving such tasks can arise from properties of individual neurons (single-neuron timescale, $\tau$, e.g., membrane time constant in biological neurons) or recurrent interactions among them (network-mediated timescale, $\tau_\textrm{\small{net}}$). However, the contribution of each mechanism for optimally solving memory-dependent tasks remains poorly understood. Here, we train RNNs to solve $N$-parity and $N$-delayed match-to-sample tasks with increasing memory requirements controlled by $N$, by simultaneously optimizing recurrent weights and $\tau$s. We find that RNNs develop longer timescales with increasing $N$, but depending on the learning objective, they use different mechanisms. Two distinct curricula define learning objectives: sequential learning of a single-$N$ (single-head) or simultaneous learning of multiple $N$s (multi-head). Single-head networks increase their $\tau$ with $N$ and can solve large-$N$ tasks, but suffer from catastrophic forgetting. However, multi-head networks, which are explicitly required to hold multiple concurrent memories, keep $\tau$ constant and develop longer timescales through recurrent connectivity. We show that the multi-head curriculum increases training speed and stability to perturbations, and allows generalization to tasks beyond the training set.This curriculum also significantly improves training GRUs and LSTMs for large-$N$ tasks. Our results suggest that adapting timescales to task requirements via recurrent interactions allows learning more complex objectives and improves the RNN's performance.

Chat is not available.