Poster in Workshop: Self-Improving Foundation Models Without Human Supervision
Preference Tree Optimization: Enhancing Goal-Oriented Dialogue with Look-Ahead Simulations
Lior Baruch · Moshe Butman · Kfir Bar · Doron Friedman
Keywords: [ Direct Preference Optimization (DPO) ] [ Look-Ahead Simulation ] [ Synthetic Data Generation ] [ Goal-Oriented Dialogue ] [ Conversational AI ] [ Multi-Turn Dialogue ] [ Preference-Based Learning ] [ Self-Improving AI ] [ Reinforcement Learning (RL) ] [ Counseling AI ] [ Motivational Interviewing (MI) ] [ Interactive AI Training ] [ Dialogue Systems ] [ Decision-Making in AI ] [ Preference Tree Optimization (PTO) ] [ Oracle Evaluation ] [ Human-AI Interaction ] [ Language Models (LLMs) ] [ Virtual Patients ]
Developing dialogue systems capable of engaging in multi-turn, goal-oriented conversations remains a significant challenge, especially in specialized domains with limited data. This research proposes Preference Tree Optimization (PTO), a framework that iteratively improves agent models in such dialogue systems by generating preference data with a method called Preference Tree with Look-Ahead. Focusing on Motivational Interviewing (MI), a counseling technique aimed at facilitating behavioral change, we leverage virtual patients and an oracle evaluator to simulate conversations and generate rich preference datasets. By combining this method with Direct Preference Optimization (DPO), we aim to enhance the agent's decision-making capabilities over iterative training cycles. The proposed framework addresses data scarcity and advances the development of more nuanced and effective dialogue systems in goal-oriented domains.

Experimental evaluations demonstrate that the PTO framework enhances dialogue agents' performance in goal-oriented MI conversations. Models trained with PTO consistently outperformed the baseline on key metrics such as session satisfaction and working alliance. Additionally, incorporating look-ahead simulations led to improved long-term planning and more effective conversational strategies, with deeper look-ahead configurations yielding the most stable and highest-scoring results.
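The abstract does not spell out implementation details, but the following minimal Python sketch illustrates what a single Preference Tree with Look-Ahead expansion step could look like under one plausible reading: sample several candidate agent turns, roll each branch forward a few exchanges with a virtual patient, score the rollouts with an oracle evaluator, and keep the best- and worst-scoring candidates as a DPO-style preference pair. The function names (`agent_respond`, `patient_respond`, `oracle_score`) and the branching and depth parameters are hypothetical placeholders, not the authors' actual models or settings.

```python
"""Illustrative sketch of one Preference Tree with Look-Ahead expansion step.

Assumptions for illustration only: the agent, virtual patient, and oracle are
stubbed out with random placeholders standing in for LLM calls.
"""
import random
from dataclasses import dataclass


@dataclass
class PreferencePair:
    context: list   # dialogue history up to the branching point
    chosen: str     # candidate agent turn whose look-ahead rollout scored highest
    rejected: str   # candidate agent turn whose look-ahead rollout scored lowest


# --- Hypothetical stand-ins for the agent LLM, virtual patient, and oracle ---
def agent_respond(history):
    return random.choice([
        "What makes this change feel important to you right now?",
        "You should just quit immediately.",
        "Tell me more about what a typical day looks like for you.",
    ])


def patient_respond(history):
    return random.choice([
        "I guess I'm worried about my health.",
        "I don't know, it's complicated.",
    ])


def oracle_score(history):
    # A real oracle would rate, e.g., session satisfaction or working alliance;
    # here it is a random placeholder.
    return random.random()


def expand_with_lookahead(history, n_candidates=3, depth=2):
    """Branch the dialogue, simulate each branch `depth` exchanges ahead,
    score the rollouts with the oracle, and return a DPO preference pair."""
    scored = []
    for _ in range(n_candidates):
        candidate = agent_respond(history)
        rollout = history + [("agent", candidate)]
        for _ in range(depth):  # look-ahead simulation with the virtual patient
            rollout.append(("patient", patient_respond(rollout)))
            rollout.append(("agent", agent_respond(rollout)))
        scored.append((oracle_score(rollout), candidate))
    scored.sort(reverse=True)
    return PreferencePair(context=list(history),
                          chosen=scored[0][1],
                          rejected=scored[-1][1])


if __name__ == "__main__":
    seed_history = [("agent", "What brings you in today?"),
                    ("patient", "My doctor says I need to cut back on drinking.")]
    pair = expand_with_lookahead(seed_history)
    print("chosen:", pair.chosen)
    print("rejected:", pair.rejected)
```

Pairs produced this way could then be fed to a standard DPO training loop; the depth parameter corresponds to the look-ahead configurations the abstract reports as improving long-term planning.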