SafeFinAgent: Guardrail-Augmented Multi-Agent Framework for Responsible Financial Decision-Making
Abstract
Multi-agent LLM systems show promise for financial tasks such as trading, portfolio management, and risk assessment, yet they lack systematic mechanisms to prevent hallucinated financial claims, regulatory non-compliance, and cascading errors across agent interactions. We present SafeFinAgent, a framework that integrates three components into multi-agent financial architectures: (1) inter-agent guardrails that enforce compliance constraints at the communication layer rather than only at input/output boundaries, (2) a financial hallucination detector trained on a taxonomy of domain-specific errors including fabricated metrics, phantom regulations, and stale data citations, and (3) a risk constraint propagation mechanism that tracks uncertainty and exposure limits across agent handoffs. We introduce FinSafe-Bench, a benchmark spanning 1,847 test scenarios across trading, compliance, and advisory tasks with both performance and safety metrics. Experiments with five backbone LLMs demonstrate that SafeFinAgent reduces financial hallucinations by 67.3% and compliance violations by 71.8% while maintaining 94.2% of unconstrained task performance. Our results reveal that naive guardrailing degrades performance by 23–41%, whereas our inter-agent approach achieves a superior safety-performance Pareto frontier. We evaluate on four established financial task categories; generalization to real-time production environments requires further validation. Code and benchmark will be released upon publication.
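To make components (1) and (3) concrete, the following is a minimal sketch of a guardrail applied at an agent handoff, with the risk state propagated alongside the message. It is an illustration under stated assumptions, not the SafeFinAgent implementation: the names `RiskContext`, `Message`, `handoff`, and `exposure_check`, the uncertainty-combination rule, and the 0.3 uncertainty threshold are all hypothetical and do not come from the framework itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class RiskContext:
    """Risk state carried with every inter-agent message (hypothetical)."""
    exposure_limit: float  # remaining exposure budget
    uncertainty: float     # accumulated uncertainty in [0, 1]


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    content: str
    proposed_exposure: float  # exposure the sender wants to commit
    confidence: float         # sender's self-reported confidence in (0, 1]
    risk: RiskContext


class GuardrailViolation(Exception):
    """Raised when a handoff is blocked at the communication layer."""


# A check inspects a message and returns a reason string if it is violated.
Check = Callable[[Message], Optional[str]]


def exposure_check(msg: Message) -> Optional[str]:
    """Block messages that would exceed the remaining exposure budget."""
    if msg.proposed_exposure > msg.risk.exposure_limit:
        return (f"{msg.sender} proposes {msg.proposed_exposure:.0f}, "
                f"limit is {msg.risk.exposure_limit:.0f}")
    return None


def propagate(msg: Message) -> RiskContext:
    """Derive the risk context the receiver inherits.

    Assumption: agent errors are independent, so the probability that the
    chain is still correct is the product of per-agent confidences.
    """
    remaining = msg.risk.exposure_limit - msg.proposed_exposure
    combined = 1.0 - (1.0 - msg.risk.uncertainty) * msg.confidence
    return RiskContext(exposure_limit=remaining, uncertainty=combined)


def handoff(msg: Message, checks: List[Check],
            max_uncertainty: float = 0.3) -> RiskContext:
    """Run guardrails between agents, then return the propagated context."""
    reasons = []
    for check in checks:
        reason = check(msg)
        if reason is not None:
            reasons.append(reason)
    if reasons:
        raise GuardrailViolation("; ".join(reasons))

    new_risk = propagate(msg)
    if new_risk.uncertainty > max_uncertainty:
        raise GuardrailViolation(
            f"accumulated uncertainty {new_risk.uncertainty:.2f} "
            f"exceeds {max_uncertainty:.2f}")
    return new_risk
```

In this sketch the check runs on every handoff rather than only on the final output, so an over-limit proposal or an overly uncertain chain is stopped before a downstream agent can act on it. The receiver builds its own outgoing `Message` from the returned `RiskContext`, which is how the limits travel across the chain.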