
$τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

arXiv • March 4, 2026 • Quan Shi, Alexandra Zytek, Pedram Razavi, Karthik Narasimhan, Victor Barres

Professional Abstract

"The paper introduces $τ$-Knowledge, a framework for evaluating conversational agents in knowledge-intensive environments, with a focus on the fintech sector. As conversational agents take on more customer support roles, their effectiveness depends on retrieving and applying domain-specific knowledge from large, unstructured data sources. Existing benchmarks typically assess retrieval or tool use in isolation and so miss the interplay between the two in real-world applications. To close this gap, the authors extend the existing $τ$-Bench framework so that agents are evaluated on scenarios requiring both knowledge retrieval and tool use. The study centers on the $τ$-Banking domain, which simulates realistic fintech customer support workflows: agents must navigate approximately 700 interconnected knowledge documents while executing tool-mediated account updates, retrieving the relevant information and applying it in compliance with internal policies. Even state-of-the-art models with advanced reasoning capabilities achieve pass rates of only about 25.5% in this environment. Their reliability also deteriorates sharply over repeated trials, indicating that they struggle to consistently retrieve the correct documents from the densely interlinked knowledge base and to reason over the intricate internal policies governing customer interactions. The work provides a realistic testbed for developing conversational agents that integrate unstructured knowledge in human-facing deployments. By addressing the limitations of existing evaluation frameworks, $τ$-Knowledge aims to support the design of more capable and reliable conversational agents and, ultimately, better customer support in knowledge-intensive domains."

Technical Insights

1. Introduction of $τ$-Knowledge, an evaluation framework for conversational agents in knowledge-intensive environments.

2. Focus on $τ$-Banking, a domain simulating fintech customer support workflows with approximately 700 interconnected knowledge documents.

3. Critique of existing benchmarks that evaluate retrieval and tool use independently, highlighting the need for integrated assessment.

4. Demonstration that even advanced models achieve only ~25.5% pass rates in realistic testing scenarios.

5. Observation of significant reliability degradation in agent performance over repeated trials.

6. Identification of challenges faced by agents in retrieving correct documents from complex knowledge bases.

7. Highlighting the difficulty agents encounter in reasoning accurately over intricate internal policies.

8. Emphasis on the importance of integrating unstructured knowledge in human-facing deployments for conversational agents.

9. Potential implications for the future development of more capable and reliable conversational agents in customer support.
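
The pass-rate and repeated-trial observations above can be made concrete with a small calculation. This summary does not say how reliability over repeated trials is scored, so the sketch below assumes a pass^k-style metric of the kind used in the $τ$-Bench line of work: for a task with c successes out of n trials, the chance that k trials drawn without replacement all succeed is estimated as C(c, k) / C(n, k), averaged over tasks. The function name `pass_hat_k` and the trial counts are hypothetical and are not results from the paper.

```python
from math import comb


def pass_hat_k(successes_per_task, n_trials, k):
    """Average over tasks of C(c, k) / C(n_trials, k).

    successes_per_task: number of successful trials for each task.
    n_trials: trials run per task (assumed equal for all tasks).
    k: number of trials that must all succeed.
    """
    if not successes_per_task:
        raise ValueError("need at least one task")
    if not 1 <= k <= n_trials:
        raise ValueError("k must be between 1 and n_trials")
    if any(c < 0 or c > n_trials for c in successes_per_task):
        raise ValueError("success counts must lie in [0, n_trials]")
    denominator = comb(n_trials, k) * len(successes_per_task)
    # comb(c, k) is 0 when c < k, so a task with too few successes adds nothing.
    return sum(comb(c, k) for c in successes_per_task) / denominator


if __name__ == "__main__":
    # Hypothetical data: four tasks, four trials each.
    successes = [4, 2, 1, 0]
    for k in (1, 2, 4):
        print(f"pass^{k} = {pass_hat_k(successes, n_trials=4, k=k):.4f}")
```

With k = 1 the metric reduces to the ordinary average pass rate. As k grows, only tasks the agent solves on nearly every trial still contribute, which is one way a sharp reliability drop over repeated trials would show up.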