Poster in Workshop: The 2nd Workshop on Advances in Financial AI Workshop: Towards Agentic and Responsible Systems

Large-Scale Chatbot Validation Through Customer Digital Twin Simulations

Cristovão Iglesias de Oliveira ⋅ Devesh Batra ⋅ Alankar Atreya ⋅ Stefan Sylvius Wagner ⋅ Robert Hankache ⋅ Patrick Sinclair ⋅ Giulio Pelosio ⋅ Michael McMillan ⋅ Greig Cowan ⋅ Raad Khraishi


Abstract

LLM-based chatbots are transforming customer service in regulated domains such as banking, but scalable, cost-effective validation remains a critical barrier to safe deployment. We make a two-part contribution to large-scale chatbot validation. First, we introduce a methodology for creating high-fidelity synthetic customer agents (SCAs) that act as digital twins of real customers. Because they are grounded in real transactional and conversational data, SCAs can be generated automatically and behaviorally conditioned to simulate diverse customer profiles and interaction styles. Our evaluation shows that SCAs achieve high semantic alignment with real customers and low hallucination rates, and that they successfully reproduce personality traits under controllable interventions. Second, we develop an SCA-based validation framework that combines automated LLM-as-a-Judge evaluation, human expert testing, and adversarial probing. Scenario-based validation across emotional states, demographic groups, and linguistic factors confirms robust performance. We used this approach to validate a customer-facing chatbot at a leading UK bank, and it offers financial institutions a scalable pathway toward regulatory compliance.
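
The abstract does not say how the scenario-based validation is organised. The following is a minimal sketch, assuming a full factorial grid over the three conditioning dimensions named above and a pass/fail verdict per simulated conversation. The dimension values, `toy_simulate`, and `toy_judge` are hypothetical stand-ins for the SCA conversation and the LLM-as-a-Judge step; they are not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Callable

# Hypothetical dimension values: the abstract names the dimensions, not their levels.
EMOTIONS = ["calm", "frustrated", "anxious"]
DEMOGRAPHICS = ["young_adult", "retiree"]
LINGUISTIC = ["standard", "non_native", "typos"]
DIMENSIONS = ("emotion", "demographic", "linguistic")


@dataclass(frozen=True)
class Scenario:
    emotion: str
    demographic: str
    linguistic: str


def build_scenarios() -> list[Scenario]:
    """Full factorial grid over the three conditioning dimensions."""
    return [Scenario(e, d, l) for e, d, l in product(EMOTIONS, DEMOGRAPHICS, LINGUISTIC)]


def run_validation(
    scenarios: list[Scenario],
    simulate: Callable[[Scenario], list[str]],  # SCA-vs-chatbot conversation -> transcript
    judge: Callable[[list[str]], bool],         # judge stand-in -> pass/fail verdict
) -> dict[str, dict[str, float]]:
    """Return the pass rate for every value of every dimension."""
    counts = defaultdict(lambda: [0, 0])  # (dimension, value) -> [passes, total]
    for s in scenarios:
        passed = judge(simulate(s))
        for dim in DIMENSIONS:
            key = (dim, getattr(s, dim))
            counts[key][0] += int(passed)
            counts[key][1] += 1
    report: dict[str, dict[str, float]] = defaultdict(dict)
    for (dim, value), (passes, total) in counts.items():
        report[dim][value] = passes / total
    return dict(report)


def toy_simulate(s: Scenario) -> list[str]:
    # Hypothetical placeholder: a real setup would drive an LLM-based SCA against the chatbot.
    opener = f"[{s.emotion}/{s.demographic}/{s.linguistic}] I need help with a card payment."
    reply = "I can help with that." if s.linguistic != "typos" else "Sorry, I did not understand."
    return [opener, reply]


def toy_judge(transcript: list[str]) -> bool:
    # Hypothetical rubric: pass unless the chatbot's final turn signals a misunderstanding.
    return "did not understand" not in transcript[-1]


if __name__ == "__main__":
    report = run_validation(build_scenarios(), toy_simulate, toy_judge)
    for dim, rates in report.items():
        print(dim, {k: round(v, 2) for k, v in rates.items()})
```

Breaking pass rates down per dimension value is one way to surface a weakness that an aggregate score would hide. Here the toy chatbot fails only on typo-laden input, so `typos` gets a 0.0 pass rate while each emotion and demographic group shares the same diluted rate of about 0.67.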
