Learning Situated Awareness in the Real World
Professional Abstract
"The paper introduces SAW-Bench (Situated Awareness in the Real World), a benchmark for evaluating egocentric situated awareness in multimodal foundation models (MFMs). Situated awareness is defined as the ability to relate oneself to the surrounding environment and to reason about possible actions in that context. Existing benchmarks have focused primarily on environment-centric spatial relations, that is, relations among objects within a scene, and have neglected observer-centric relationships that depend on the agent's viewpoint, pose, and motion. This leaves a significant gap in the evaluation of models intended to capture human-like perception of, and interaction with, the environment. To address it, the authors built SAW-Bench from 786 self-recorded videos captured with Ray-Ban Meta (Gen 2) smart glasses across a variety of indoor and outdoor environments. The videos are accompanied by over 2,071 human-annotated question-answer pairs that probe a model's observer-centric understanding through six distinct awareness tasks. A comprehensive evaluation reveals a performance gap of 37.66% between human participants and the best-performing MFM, Gemini 3 Flash, which underscores how far current models remain from human-like situated awareness. Further analysis indicates that although these models can exploit partial geometric cues in egocentric videos, they often fail to infer a coherent camera geometry, which leads to systematic spatial reasoning errors. The authors argue that SAW-Bench serves as a benchmark for situated spatial intelligence, emphasizing the need for models to move beyond passive observation toward an understanding of physically grounded, observer-centric dynamics.
The work both exposes deficiencies in existing models and points to a direction for future research: improving the situated awareness of MFMs."
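
As an illustration of how a human-model gap such as the one reported above might be computed, the following is a minimal sketch. It assumes that each question has a single gold answer, that scores are plain accuracy, and that the gap is the difference between human and model accuracy in percentage points; the abstract does not confirm any of these details. The task names, answer labels, and records are hypothetical toy data, not SAW-Bench results.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return ({task: accuracy in percent}, overall accuracy in percent).

    Each record is a (task, predicted_answer, gold_answer) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, gold in records:
        total[task] += 1
        correct[task] += int(predicted == gold)
    per_task = {task: 100.0 * correct[task] / total[task] for task in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return per_task, overall


def gap_in_points(human_accuracy, model_accuracy):
    """Assumed definition: human accuracy minus model accuracy, in points."""
    return human_accuracy - model_accuracy


# Hypothetical toy records; "task_a" and "task_b" are placeholder task names.
model_records = [
    ("task_a", "B", "B"),
    ("task_a", "C", "A"),
    ("task_b", "D", "D"),
    ("task_b", "A", "C"),
]
human_records = [
    ("task_a", "B", "B"),
    ("task_a", "A", "A"),
    ("task_b", "D", "D"),
    ("task_b", "A", "C"),
]

model_per_task, model_overall = accuracy_by_task(model_records)
human_per_task, human_overall = accuracy_by_task(human_records)
gap = gap_in_points(human_overall, model_overall)

print(f"model: {model_overall:.2f}%  human: {human_overall:.2f}%  gap: {gap:.2f} points")
```

Reporting per-task accuracy alongside the overall figure shows whether a gap is spread evenly across tasks or concentrated in a few of them.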