
AgentIR: Reasoning-Aware Retrieval for Deep Research Agents

arXiv • March 4, 2026 • Zijian Chen, Xueguang Ma, Shengyao Zhuang, Jimmy Lin, Akari Asai, Victor Zhong

Professional Abstract

"Deep Research agents are emerging as primary consumers of retrieval systems, which calls for a reevaluation of how these systems interpret intent and context. Traditional retrieval systems overlook the reasoning that precedes a query, even though that reasoning is critical for understanding intent. This paper introduces a paradigm called Reasoning-Aware Retrieval, which integrates the reasoning process of Deep Research agents into the retrieval mechanism. By embedding the agent's reasoning alongside its query, the retriever gains additional context that improves retrieval accuracy. The authors also present DR-Synth, a data synthesis method that creates training data for Deep Research retrievers from existing question-answering datasets. Together these yield AgentIR-4B, an embedding model that outperforms conventional models on the BrowseComp-Plus benchmark. With the open-weight agent Tongyi-DeepResearch, AgentIR-4B achieved 68% accuracy, compared to 50% for larger conventional embedding models and 37% for BM25. The results underscore the importance of reasoning in retrieval tasks and suggest that integrating reasoning traces can substantially improve performance. The code and data for this research are publicly available."
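The core idea in the abstract, embedding the agent's reasoning alongside its query, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the summary does not describe the input format or the model interface, so `build_retriever_input` simply concatenates the two strings, and `embed` is a toy bag-of-words stand-in rather than AgentIR-4B. The documents, query, and reasoning trace are invented examples.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector. A stand-in for a real embedding model, NOT AgentIR-4B."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_retriever_input(query: str, reasoning: str = "") -> str:
    """Hypothetical input format: the reasoning trace is prepended to the query."""
    return f"{reasoning}\n{query}".strip()


def rank(retriever_input: str, docs: list[str]) -> list[str]:
    """Return docs sorted by similarity to the retriever input, best first."""
    q = embed(retriever_input)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)


# Invented example data.
docs = [
    "Marie Curie won the Nobel Prize in Physics in 1903 and in Chemistry in 1911.",
    "The Nobel Prize ceremony is held in Stockholm every December.",
]
reasoning = (
    "The clue mentions a scientist awarded in both physics and chemistry, "
    "so I should look for Marie Curie."
)
query = "Nobel Prize winner"

query_only = rank(build_retriever_input(query), docs)
with_reasoning = rank(build_retriever_input(query, reasoning), docs)

print("query only     :", query_only[0])
print("with reasoning :", with_reasoning[0])
```

On this toy data the short query alone ranks the ceremony document first, while the reasoning-augmented input ranks the Marie Curie document first. The effect is lexical here; a trained embedding model would presumably exploit the reasoning in a richer way.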

Technical Insights

1. Deep Research agents write explicit natural language reasoning before each search call, providing rich context that traditional retrieval systems ignore.

2. The proposed Reasoning-Aware Retrieval paradigm embeds the agent's reasoning trace alongside its query, improving retrieval effectiveness.

3. DR-Synth generates training data for Deep Research retrievers from existing QA datasets.

4. AgentIR-4B, the resulting embedding model, achieves 68% accuracy on the BrowseComp-Plus benchmark with the open-weight agent Tongyi-DeepResearch.

5. By contrast, conventional embedding models twice the size of AgentIR-4B reached only 50% accuracy, which points to the efficiency of the new approach.

6. BM25 performed worse still, at 37% accuracy, illustrating the limitations of traditional lexical retrieval.

7. The results indicate that reasoning matters for retrieval: exposing the intent behind a query leads to better search outcomes.

8. The findings argue for a shift in retrieval system design, with the agent's reasoning treated as a core input to the retriever rather than discarded.

9. Code and data are available at https://texttron.github.io/AgentIR/, supporting further research on Reasoning-Aware Retrieval systems.
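The accuracy figures in items 4 to 6 can be compared directly. The short calculation below uses only the three reported numbers and derives the absolute gap (in percentage points) and the relative gain over each baseline. The dictionary labels and the `gap` helper are illustrative names, not taken from the paper.

```python
# Reported BrowseComp-Plus accuracies (percent), as stated in this summary.
accuracy = {
    "AgentIR-4B": 68,
    "conventional embedding model (2x size)": 50,
    "BM25": 37,
}


def gap(ours: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gap in points, relative gain in percent) over a baseline."""
    absolute = ours - baseline
    relative = absolute * 100 / baseline
    return absolute, relative


ours = accuracy["AgentIR-4B"]
for name, value in accuracy.items():
    if name == "AgentIR-4B":
        continue
    points, percent = gap(ours, value)
    print(f"vs {name}: +{points} points, +{percent:.1f}% relative")
```

This works out to an 18-point gap (36% relative) over the larger embedding models and a 31-point gap (about 84% relative) over BM25.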