BEYOND VECTOR SEARCH: HALLUCINATION-FREE FINANCIAL REASONING WITH CHUNK-CENTRIC KNOWLEDGE GRAPHS
Abstract
As LLMs become more widely adopted, more companies and individuals want to integrate these systems with their own local documents, which drove the adoption of vector-based retrieval-augmented generation (RAG). This works well for many documents, but when documents are semantically similar (e.g., Apple's 10-K vs. Microsoft's 10-K, or Federal Reserve districts each reporting on "labor markets" in similar language), relying purely on vector search breaks down: the language and structure are nearly identical, differing only in the subject. From a business perspective, vector-based RAG also lacks traceability, because it offers no explanation of why particular chunks were retrieved. Many approaches have built on RAG to address these issues, but they still struggle to provide a near-zero hallucination guarantee. To tackle this problem for financial information, we developed a chunk-centric Knowledge Graph that grounds RAG structurally, making retrieval more direct and traceable. Our key insight is that a graph-generation method tuned for financial information provides a structural guarantee against cross-entity contamination: the information given to the LLM is relevant to the question, and the retrieval path (entity to relationship to chunk) can be followed and audited by humans, unlike vector search, where "semantically similar" is the only explanation. We use a three-stage pipeline (Extraction, Resolution, and Assembly) that focuses on identifying entities and the connections between them so that lookup is more direct. The entity extraction and deduplication system was critical to our approach: it combines embedding similarity search with LLM verification to deduplicate entities, paired with a constrained topic vocabulary tuned for financial language. For evaluation, we developed BeigeBench, a set of 75 questions on the October 2025 Beige Book that includes questions requiring multi-hop reasoning and comparisons between districts and sectors.
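The entity deduplication step described above (embedding similarity search followed by LLM verification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `deduplicate` and `cosine`, the 0.85 similarity threshold, the toy three-dimensional embeddings, and the alias-table stub standing in for the LLM verification call are all hypothetical. Embedding similarity only shortlists candidates; the verifier makes the final merge decision.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(
    names: List[str],
    embed: Callable[[str], Sequence[float]],
    verify: Callable[[str, str], bool],
    threshold: float = 0.85,  # hypothetical cutoff for candidate shortlisting
) -> Dict[str, str]:
    """Map every extracted entity name to a canonical entity name."""
    canonical: Dict[str, Sequence[float]] = {}  # canonical name -> embedding
    mapping: Dict[str, str] = {}
    for name in names:
        vec = embed(name)
        # Step 1: rank existing canonical entities by embedding similarity.
        candidates = sorted(
            ((cosine(vec, cvec), cname) for cname, cvec in canonical.items()),
            reverse=True,
        )
        match = None
        for score, cname in candidates:
            if score < threshold:
                break  # remaining candidates are even less similar
            # Step 2: verification (an LLM call in practice, stubbed here)
            # decides whether both names denote the same real-world entity.
            if verify(name, cname):
                match = cname
                break
        if match is None:
            canonical[name] = vec  # no verified match: start a new entity
            match = name
        mapping[name] = match
    return mapping


# Hypothetical toy data: all three names are close in embedding space.
EMBEDDINGS = {
    "Boston Fed": [1.0, 0.0, 0.1],
    "Federal Reserve Bank of Boston": [0.98, 0.05, 0.12],
    "Dallas Fed": [0.97, 0.1, 0.1],
}
# Stub verifier: a fixed alias table stands in for the LLM's judgment.
ALIASES = {frozenset({"Boston Fed", "Federal Reserve Bank of Boston"})}

result = deduplicate(
    list(EMBEDDINGS),
    EMBEDDINGS.__getitem__,
    lambda a, b: frozenset({a, b}) in ALIASES,
)
print(result)
```

In this toy example the Dallas Fed has a cosine similarity of about 0.99 to the Boston Fed, so a similarity threshold alone would merge the two districts; the verification step rejects that merge, which is the kind of cross-entity contamination the graph is meant to prevent.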
Our system achieved 100\% lenient accuracy (no incorrect answers) and 90.7\% strict accuracy, compared with 85.3\%/65.3\% (lenient/strict) for Deep RAG and 84.0\%/72.0\% for Simple RAG. Our Knowledge Graph system produced zero hallucinations; every partial answer stemmed from retrieval or synthesis gaps that could be addressed with further tuning.
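To make the reported percentages concrete, the sketch below converts them back into question counts out of 75. It assumes, based on the wording above, that strict accuracy counts only fully correct answers and lenient accuracy counts correct plus partial answers; the per-category counts are therefore inferred by rounding and are not reported figures. The helper name `to_counts` is hypothetical.

```python
N_QUESTIONS = 75  # number of BeigeBench questions

# (lenient %, strict %) as reported above.
REPORTED = {
    "Knowledge Graph": (100.0, 90.7),
    "Deep RAG": (85.3, 65.3),
    "Simple RAG": (84.0, 72.0),
}


def to_counts(lenient_pct: float, strict_pct: float, n: int = N_QUESTIONS):
    """Infer (correct, partial, incorrect) counts from rounded percentages.

    Assumes strict = correct only and lenient = correct + partial.
    """
    correct = round(strict_pct / 100 * n)
    correct_or_partial = round(lenient_pct / 100 * n)
    # Sanity check: the inferred counts must reproduce the reported percentages.
    assert abs(100 * correct / n - strict_pct) < 0.05
    assert abs(100 * correct_or_partial / n - lenient_pct) < 0.05
    return correct, correct_or_partial - correct, n - correct_or_partial


for system, (lenient, strict) in REPORTED.items():
    correct, partial, incorrect = to_counts(lenient, strict)
    print(f"{system}: {correct} correct, {partial} partial, {incorrect} incorrect")
```

Under these assumptions, the Knowledge Graph system's scores correspond to 68 fully correct and 7 partial answers with none incorrect, while Simple RAG's 84.0\%/72.0\% corresponds to 54 correct, 9 partial, and 12 incorrect.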