Novelty-Gated Experience Sharing for Multi-Agent Reinforcement Learning
Manish Kota ⋅ Thomas Fan ⋅ Harshita Poojary ⋅ Nolawi Teklehaimanot ⋅ Aishwarya Balwani
Abstract
Decentralized multi-agent reinforcement learning (MARL) can learn faster when agents selectively share informative experiences. Current approaches use high temporal-difference (TD) error as a proxy for informativeness, following the intuition that "surprising" or previously unseen transitions carry the most learning signal. However, we identify a familiarity paradox: in non-stationary multi-agent settings, high TD-error can persist in frequently visited states because co-adapting agents keep changing their policies, conflating epistemic uncertainty with aleatoric noise. To test the practical impact of this phenomenon, we propose Novelty-Gated Experience Sharing (NGES), a dual-gate mechanism that shares a transition only when it is both surprising (high TD-error) and novel (low state visitation count). A hash-resolution ablation reveals that up to 30% of the high-TD-error transitions selected for sharing are redundant, and a retroactive analysis confirms that blocked experiences exhibit 1.5$\times$ higher TD-error than shared ones, providing direct evidence for the paradox. However, filtering these transitions yields performance comparable to, rather than better than, TD-error-only sharing, and it increases seed-to-seed variance, suggesting that hard novelty filtering can occasionally suppress coordination-critical transitions. We therefore characterize NGES as a diagnostic probe for detecting when TD-error prioritization over-selects familiar states, and show that the paradox's practical impact is domain-dependent.
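The dual-gate rule can be illustrated with a minimal sketch. This is not the authors' implementation: the class `NoveltyGate`, its parameters (`td_threshold`, `max_count`, `resolution`), and the grid-discretization hash are hypothetical choices made for illustration. A transition is shared only if its absolute TD-error exceeds a threshold and its hashed state has been visited at most `max_count` times. Transitions that are surprising but familiar are recorded as "blocked", which supports a retroactive comparison of blocked versus shared TD-error in the spirit of the analysis described above.

```python
import math
from collections import Counter
from statistics import mean
from typing import List, Sequence, Tuple


class NoveltyGate:
    """Hypothetical dual TD-error/novelty gate (illustrative, not the paper's code).

    A transition is shared only if it is surprising (|TD error| above
    `td_threshold`) AND novel (its hashed state has been visited at most
    `max_count` times, counting the current visit).
    """

    def __init__(self, td_threshold: float, max_count: int, resolution: float):
        self.td_threshold = td_threshold
        self.max_count = max_count
        self.resolution = resolution  # assumed bucket width; coarser -> more hash collisions
        self.counts: Counter = Counter()
        self.shared_td: List[float] = []   # |TD| of transitions that passed both gates
        self.blocked_td: List[float] = []  # |TD| of surprising-but-familiar transitions

    def state_key(self, state: Sequence[float]) -> Tuple[int, ...]:
        # Assumed hashing scheme: floor each coordinate into a grid cell.
        return tuple(math.floor(x / self.resolution) for x in state)

    def visit_and_gate(self, state: Sequence[float], td_error: float) -> bool:
        key = self.state_key(state)
        self.counts[key] += 1  # record the visit before applying the novelty gate
        surprising = abs(td_error) > self.td_threshold
        novel = self.counts[key] <= self.max_count
        if surprising and novel:
            self.shared_td.append(abs(td_error))
            return True
        if surprising:
            # Would have been shared under TD-error-only prioritization.
            self.blocked_td.append(abs(td_error))
        return False

    def blocked_to_shared_ratio(self) -> float:
        # Diagnostic: mean |TD| of blocked vs. shared transitions.
        # Requires at least one blocked and one shared transition.
        return mean(self.blocked_td) / mean(self.shared_td)


gate = NoveltyGate(td_threshold=1.0, max_count=2, resolution=0.5)
print(gate.visit_and_gate((0.1, 0.2), 2.0))  # first visit, surprising -> shared
print(gate.visit_and_gate((0.1, 0.2), 2.0))  # second visit, still novel -> shared
print(gate.visit_and_gate((0.1, 0.2), 4.0))  # third visit, familiar -> blocked
print(gate.blocked_to_shared_ratio())
```

In this sketch, varying `resolution` plays the role of a hash-resolution ablation: a coarser grid merges more states into one bucket, so the novelty gate blocks more transitions. The hard cutoff `max_count` also shows how a strict filter could discard a familiar but coordination-critical transition, since the gate never consults anything except visit counts and TD-error.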