Poster
Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks
Sina Khajehabdollahi · Roxana Zeraati · Emmanouil Giannakakis · Tim Schäfer · Georg Martius · Anna Levina
Halle B #51
Abstract:
Recurrent neural networks (RNNs) in the brain and \emph{in silico} excel at solving tasks with intricate temporal dependencies. Long timescales required for solving such tasks can arise from properties of individual neurons (single-neuron timescale, $\tau$, e.g., the membrane time constant in biological neurons) or recurrent interactions among them (network-mediated timescale, $\tau_{\mathrm{net}}$). However, the contribution of each mechanism for optimally solving memory-dependent tasks remains poorly understood. Here, we train RNNs to solve $N$-parity and $N$-delayed match-to-sample tasks with increasing memory requirements controlled by $N$, by simultaneously optimizing recurrent weights and $\tau$s. We find that RNNs develop longer timescales with increasing $N$, but depending on the learning objective, they use different mechanisms. Two distinct curricula define learning objectives: sequential learning of a single $N$ (single-head) or simultaneous learning of multiple $N$s (multi-head). Single-head networks increase their $\tau$ with $N$ and can solve large-$N$ tasks, but suffer from catastrophic forgetting. However, multi-head networks, which are explicitly required to hold multiple concurrent memories, keep $\tau$ constant and develop longer timescales through recurrent connectivity. We show that the multi-head curriculum increases training speed and stability to perturbations, and allows generalization to tasks beyond the training set. This curriculum also significantly improves training GRUs and LSTMs for large-$N$ tasks. Our results suggest that adapting timescales to task requirements via recurrent interactions allows learning more complex objectives and improves the RNN's performance.
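To make the setup concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code) of an RNN with trainable per-neuron timescales $\tau$, together with the $N$-parity targets that control the memory load; the names `LeakyRNN` and `n_parity_targets` and the leaky-integrator parameterization are illustrative assumptions based on the abstract.

```python
import torch
import torch.nn as nn

class LeakyRNN(nn.Module):
    """Leaky-integrator RNN whose per-neuron timescale tau is trained
    jointly with the recurrent weights (illustrative sketch)."""
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.n_hidden = n_hidden
        self.w_in = nn.Linear(n_in, n_hidden)
        self.w_rec = nn.Linear(n_hidden, n_hidden, bias=False)
        # log-parameterization keeps each tau >= 1 during optimization
        self.log_tau = nn.Parameter(torch.zeros(n_hidden))

    def forward(self, x):                               # x: (T, batch, n_in)
        alpha = 1.0 / (1.0 + torch.exp(self.log_tau))   # leak rate 1/tau
        h = x.new_zeros(x.shape[1], self.n_hidden)
        states = []
        for x_t in x:
            # h_t = (1 - 1/tau) * h_{t-1} + (1/tau) * tanh(W_in x_t + W_rec h_{t-1})
            h = (1 - alpha) * h + alpha * torch.tanh(self.w_in(x_t) + self.w_rec(h))
            states.append(h)
        return torch.stack(states)                      # (T, batch, n_hidden)


def n_parity_targets(bits, n):
    """Targets for the N-parity task: parity of the most recent n input bits.

    bits: (T, batch) tensor of 0/1 values; returns a (T - n + 1, batch) tensor.
    """
    windows = bits.unfold(0, n, 1)                      # sliding windows of length n
    return windows.sum(-1) % 2
```

In this sketch, a single-head curriculum would attach one linear readout to the recurrent state and increase $N$ sequentially, whereas a multi-head curriculum would attach one readout per value of $N$ and train all of them simultaneously on the shared recurrent state.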