Poster in Workshop: Lifelong Agents: Learning, Aligning, Evolving
Sun, Apr 26, 2026 • 11:00 AM – 12:00 PM PDT

$\textbf{DomusMind}$: A Benchmark for Evaluating Lifelong Smart Home Agents Under Drift

Rong Xu ⋅ Yinxin Wan ⋅ Xiaochan Xue

Project Page | OpenReview

Abstract

Smart home agents require continuous operation in non-stationary environments where human preferences and device reliability keep evolving. However, dominant evaluation protocols remain episodic and reset-based, failing to capture the degradation and recovery dynamics essential for long-term deployment. To address this gap, we introduce $\textbf{DomusMind}$, a benchmark for evaluating lifelong agents under two sources of non-stationarity: $\textit{preference drift}$ (persona) and $\textit{tool drift}$ (execution). $\textbf{DomusMind}$ instantiates a persistent interaction loop where agents balance autonomous execution and user burden. By tracking time-resolved metrics across preference, tool, and mixed drift scenarios, our results show that online Theory of Mind (ToM) with uncertainty-gated confirmation provides the most robust adaptation overall. Notably, $\texttt{ORACLE}$ persona access fails to mitigate $\textit{tool drift}$, which identifies execution reliability as a distinct bottleneck. By sweeping a confirmation threshold, $\textbf{DomusMind}$ characterizes a success–annoyance frontier that enables principled selection of operating points for long-horizon alignment.
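As a rough illustration of the threshold sweep described in the abstract, the following minimal sketch shows how an uncertainty-gated confirmation rule can trade task success against user burden. It is not the benchmark's implementation: the `Step` record, the `run_policy` and `sweep` helpers, the toy data, and the assumption that a confirmed action always succeeds are all hypothetical, and tool drift (execution failures) is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Step:
    # Hypothetical record for one interaction step.
    confidence: float  # agent's confidence that its chosen action matches the user's preference
    correct: bool      # whether acting autonomously would have satisfied the user


def run_policy(steps, threshold):
    """Ask the user to confirm when confidence < threshold; otherwise act autonomously.

    Returns (success_rate, confirmation_rate). The confirmation rate stands in
    for user burden ("annoyance") in this toy setting.
    """
    successes = 0
    confirmations = 0
    for step in steps:
        if step.confidence < threshold:
            confirmations += 1
            successes += 1  # assumption: a user-confirmed action always succeeds
        elif step.correct:
            successes += 1
    n = len(steps)
    return successes / n, confirmations / n


def sweep(steps, thresholds):
    """Evaluate the gate at each threshold, giving (threshold, success, annoyance) points."""
    return [(t, *run_policy(steps, t)) for t in thresholds]


# Toy episode: invented values for illustration only, not benchmark data.
steps = [
    Step(0.9, True),
    Step(0.8, True),
    Step(0.6, False),
    Step(0.4, True),
    Step(0.3, False),
]

frontier = sweep(steps, [0.0, 0.5, 0.7, 1.01])
for threshold, success, annoyance in frontier:
    print(f"threshold={threshold:.2f} success={success:.2f} annoyance={annoyance:.2f}")
```

On this toy episode, raising the threshold increases both the success rate and the confirmation rate, so each threshold corresponds to one point on a success–annoyance curve from which an operating point could be chosen.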
