Affective Multimodal Agents with Proactive Knowledge Grounding for Aligned Marketing Dialogue
Abstract
Despite recent progress in large language models (LLMs), most dialogue systems remain reactive and perform poorly in emotionally nuanced, goal-oriented domains such as marketing conversations. We present AffectMind, a multimodal affective dialogue agent that combines proactive reasoning with dynamic knowledge grounding to sustain emotionally aligned and persuasive interactions. AffectMind integrates three components: a Proactive Knowledge Grounding Network that continuously updates factual and affective context from textual, visual, and prosodic signals; an Emotion-Intent Alignment Model that jointly infers user emotion and purchase intent to adapt persuasion strategies; and a Reinforced Discourse Loop that optimizes emotional coherence and long-term engagement via reinforcement learning from user feedback. Evaluations on two newly curated multimodal marketing dialogue benchmarks, MM-ConvMarket and AffectPromo, show that AffectMind significantly outperforms strong LLM-based baselines, with improvements of 26% in emotional consistency, 19% in persuasive success rate, and 23% in sustained user engagement. These results underscore emotion-grounded proactivity as a critical capability for next-generation commercial dialogue agents.
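To make the data flow between the first two components concrete, the following is a minimal illustrative sketch, not the AffectMind implementation. It assumes each modality yields a scalar valence score in [-1, 1], that the affective context is maintained as an exponential moving average, and that the persuasion strategy is chosen by simple thresholds. The modality weights, thresholds, strategy labels, and all function and class names are hypothetical; the abstract specifies none of them, and the reinforcement learning loop is not covered.

```python
from dataclasses import dataclass

# Hypothetical modality weights; the abstract does not say how signals are fused.
MODALITY_WEIGHTS = {"text": 0.5, "visual": 0.3, "prosody": 0.2}


@dataclass
class AffectiveContext:
    valence: float = 0.0  # running emotion estimate in [-1, 1]
    intent: float = 0.0   # running purchase-intent estimate in [0, 1]


def fuse_valence(signals: dict[str, float]) -> float:
    """Weighted average of valence over whichever modalities are present."""
    total = sum(MODALITY_WEIGHTS[m] for m in signals)
    if total == 0:
        return 0.0
    return sum(MODALITY_WEIGHTS[m] * v for m, v in signals.items()) / total


def update_context(ctx: AffectiveContext, signals: dict[str, float],
                   intent_obs: float, alpha: float = 0.5) -> AffectiveContext:
    """Blend the new observation into the running context (moving average)."""
    fused = fuse_valence(signals)
    return AffectiveContext(
        valence=(1 - alpha) * ctx.valence + alpha * fused,
        intent=(1 - alpha) * ctx.intent + alpha * intent_obs,
    )


def choose_strategy(ctx: AffectiveContext) -> str:
    """Assumed thresholds: address negative emotion first, then act on intent."""
    if ctx.valence < -0.2:
        return "empathize"
    if ctx.intent >= 0.6:
        return "recommend"
    return "inform"
```

Under these assumptions, a turn with negative text and prosody scores moves the context toward negative valence, so `choose_strategy` returns the hypothetical `"empathize"` label before any product recommendation is attempted.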