AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
Professional Abstract
"Deep Research agents are becoming primary consumers of retrieval systems, which calls for rethinking how those systems interpret intent and context. Traditional retrieval systems often overlook the reasoning that precedes a query, even though that reasoning is critical for understanding intent. This paper introduces Reasoning-Aware Retrieval, a paradigm that embeds the agent's reasoning alongside its query so that the retriever can use the additional context to improve accuracy. The authors also present DR-Synth, a data synthesis method that creates training data for Deep Research retrievers from existing question-answering datasets. The effectiveness of both contributions is demonstrated with AgentIR-4B, an embedding model that significantly outperforms conventional models on the BrowseComp-Plus benchmark. Paired with the open-weight agent Tongyi-DeepResearch, AgentIR-4B reaches 68% accuracy, compared with 50% for larger conventional embedding models and 37% for BM25. The results underscore the importance of reasoning in retrieval tasks and suggest that integrating reasoning traces can substantially improve performance. The code and data for this research are publicly available."
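The core idea, embedding the agent's reasoning together with its query, can be illustrated with a minimal sketch. This is not the AgentIR-4B model or the authors' code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `build_retrieval_input`, the input template, the stopword list, and the two example documents are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stopword list so the toy similarity reflects content words.
STOPWORDS = {"the", "a", "is", "in", "of", "to", "and", "so", "i", "not"}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_retrieval_input(reasoning: str, query: str) -> str:
    """Hypothetical template: place the reasoning trace before the query."""
    return f"Reasoning: {reasoning}\nQuery: {query}"


def rank(text: str, docs: list[str]) -> list[str]:
    """Return docs sorted by similarity to the embedded input text."""
    vec = embed(text)
    return sorted(docs, key=lambda d: cosine(vec, embed(d)), reverse=True)


docs = [
    "Jaguar is a British car manufacturer founded in 1922.",
    "The jaguar is a large cat and predator native to the Amazon rainforest.",
]
reasoning = (
    "The clue mentions a predator living in the Amazon rainforest, "
    "so I need the animal, not the brand."
)
query = "jaguar"

# Conventional retrieval: only the query is embedded.
query_only = rank(query, docs)
# Reasoning-aware retrieval: the reasoning trace is embedded with the query.
reasoning_aware = rank(build_retrieval_input(reasoning, query), docs)

print("query only      :", query_only[0])
print("reasoning-aware :", reasoning_aware[0])
```

In this toy setup the bare query "jaguar" is ambiguous and the car document ranks first, while the reasoning trace supplies the terms that move the animal document to the top. A learned embedding model would capture this context semantically, not through exact word overlap.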