What is it about?

Human behavior is inherently interdependent, so human-robot interactive systems must predict the trajectories of surrounding agents by modeling complex social interactions in order to avoid collisions and plan safe paths. While many trajectory prediction methods exist, most of them do not incorporate the ego agent's own motion and model interactions based only on static information. Inspired by the human theory of mind during trajectory selection, we propose a Cross time domain intention-interactive method for conditional Trajectory prediction (CiT). CiT jointly analyzes behavior intentions over time and achieves information complementarity and integration across different time domains: the intention in one time domain is corrected by the social interaction information from the other time domain, yielding a more precise intention representation. In addition, CiT is designed to integrate closely with robotic motion planning and control modules, generating a set of optional trajectory predictions for all surrounding agents based on potential motions of the ego agent. Extensive experiments demonstrate that CiT significantly outperforms existing methods and achieves state-of-the-art performance on the benchmarks.


Why is it important?

Trajectory prediction of surrounding agents is a crucial component for ensuring safety in autonomous driving systems, as it enables the vehicle to avoid collisions with human-driven cars. Moreover, predicting the trajectories of surrounding agents has extensive applications in human-robot interaction, social robots, drones, and other domains [3, 6, 20, 32]. Humans can navigate various social scenarios because they have an intrinsic theory of mind, which is the capacity to reason about other humans' actions based on their mental states. Imbuing autonomous systems with such a capability could enable more informed decision making and motion planning [10, 29, 37, 39]. However, predicting the trajectories of agents in the real world is challenging, since an agent's trajectory is not determined by the agent alone but involves complex social interactions with surrounding agents. Previous works [7, 11, 12, 38, 41] have therefore proposed a series of methods to model such interactions. Despite the remarkable progress that has been made, these methods still face three critical problems. First, far less attention is paid to the ego agent's own motion, which hinders the direct application of these models to real-world robotic systems. The ability to predict how surrounding agents will react to different potential motions of the ego agent is crucial for downstream tasks such as decision making and motion planning in robotic systems. For example, when the ego agent has multiple candidate trajectories to choose from, it can generate the predicted future trajectories of the surrounding target agents for each candidate. The candidate that optimizes overall system time or efficiency is then selected as the trajectory to execute. Second, these methods do not model social interactions dynamically over time. As an agent moves, its intention changes as it interacts with surrounding agents.
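The candidate-selection loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not CiT itself: `toy_conditional_predictor` is a hypothetical stand-in for the trained model (a crossing agent that waits if the ego arrives first), and the safety threshold `safe_dist` and the "final x position" efficiency proxy are assumptions chosen for the example. The point is the interface: one conditional prediction per candidate ego plan, then a choice among the safe candidates.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]
# A predictor maps one candidate ego plan to predicted trajectories of the other agents.
Predictor = Callable[[Trajectory], Dict[str, Trajectory]]


def rough_ego_plan(speed: float, steps: int = 6, dt: float = 0.5) -> Trajectory:
    """Rough candidate plan (assumption): start at the origin, drive along +x at constant speed."""
    return [(speed * dt * k, 0.0) for k in range(1, steps + 1)]


def toy_conditional_predictor(ego_plan: Trajectory, dt: float = 0.5) -> Dict[str, Trajectory]:
    """Hypothetical stand-in for CiT: an agent at (20, -5) moves +y at 1.5 m/s,
    but waits in place if the ego plan reaches x = 20 within 2 s."""
    arrival_step = next((k for k, (x, _) in enumerate(ego_plan, start=1) if x >= 20.0), None)
    waits = arrival_step is not None and arrival_step * dt <= 2.0
    traj = [(20.0, -5.0 if waits else -5.0 + 1.5 * dt * k) for k in range(1, len(ego_plan) + 1)]
    return {"agent_1": traj}


def min_separation(ego_plan: Trajectory, others: Dict[str, Trajectory]) -> float:
    """Smallest ego-to-agent distance over the time steps both trajectories share."""
    best = math.inf
    for traj in others.values():
        for (ex, ey), (ax, ay) in zip(ego_plan, traj):
            best = min(best, math.hypot(ex - ax, ey - ay))
    return best


def select_plan(candidates: Dict[str, Trajectory], predict: Predictor,
                safe_dist: float = 2.0) -> Optional[str]:
    """Predict the other agents' reactions to each candidate, reject candidates with a
    predicted conflict, and return the safe candidate with the most forward progress."""
    best_name, best_progress = None, -math.inf
    for name, plan in candidates.items():
        predicted = predict(plan)  # conditional prediction for this candidate only
        if min_separation(plan, predicted) < safe_dist:
            continue  # predicted conflict: discard this candidate
        progress = plan[-1][0]  # efficiency proxy (assumption): final x position
        if progress > best_progress:
            best_name, best_progress = name, progress
    return best_name


if __name__ == "__main__":
    candidates = {f"speed_{v:g}": rough_ego_plan(v) for v in (4.0, 8.0, 12.0)}
    print(select_plan(candidates, toy_conditional_predictor))
```

Because the predicted reaction of `agent_1` changes with the ego plan, the same agent can be safe to pass under one candidate and in conflict under another; an unconditional predictor would return one forecast for all candidates and could not express this.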
Trajectory prediction models should therefore jointly analyze intentions over time to model social interactions dynamically. Third, different surrounding agents have varying degrees of influence on the target agent whose trajectory we want to predict. Many convolution and social pooling methods extract features of surrounding agents and directly concatenate them, without letting the network learn each agent's degree of influence in a prioritized manner. To model complex social interactive behaviors more precisely and to integrate tightly with downstream planning and control tasks in robotic systems, we propose CiT, which predicts the behavior trajectories of all surrounding agents conditioned on the ego agent's motion plans. The core of CiT is to mutually complement and refine intention representations over time through semantic supplementation and feature correction. Figure 1 illustrates the main working mechanism of the proposed model. CiT contains four key designs. First, we introduce the future trajectory of the ego agent. Note that CiT does not require the exact future trajectory, which is actually undetermined at prediction time; CiT is only conditioned on a rough trajectory that can easily be obtained from a trajectory generator. This allows the model to generate optional predictions based on candidate trajectories proposed by downstream planning and control modules. Second, in the intention graph construction module, we infer the current intention of the target agent by analyzing the past trajectories of the target agent and its neighbor agents. To preserve spatial information, we map this intention onto a social tensor according to the target agent's location and refer to it as the "Intention Graph in the Current Time Domain." Furthermore, by incorporating the future trajectory of the ego agent, we can model the potential future social interactions between the target agent and the ego agent and predict the target agent's future intention.
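Mapping intentions onto a social tensor by location can be sketched as a scatter into a spatial grid. This is a minimal sketch, assuming a grid centered on a reference point, columns indexing x and rows indexing y, a fixed cell size, and summation when two agents fall into the same cell; the summary specifies none of these, and the intention vectors below are placeholder numbers rather than outputs of CiT's encoder. The function name `build_intention_graph` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[List[float]]]  # rows x cols x feature_dim


def build_intention_graph(intentions: Dict[str, List[float]],
                          positions: Dict[str, Tuple[float, float]],
                          center: Tuple[float, float],
                          rows: int = 5, cols: int = 5,
                          cell: float = 2.0) -> Grid:
    """Scatter each agent's intention vector into the grid cell covering its location.

    The grid is centered on `center` (assumption); columns index x and rows index y.
    Agents outside the grid are dropped; agents sharing a cell are summed.
    """
    dim = len(next(iter(intentions.values())))
    grid = [[[0.0] * dim for _ in range(cols)] for _ in range(rows)]
    for agent, vec in intentions.items():
        x, y = positions[agent]
        col = math.floor((x - center[0]) / cell + cols / 2)
        row = math.floor((y - center[1]) / cell + rows / 2)
        if 0 <= row < rows and 0 <= col < cols:
            for i, v in enumerate(vec):
                grid[row][col][i] += v
    return grid


if __name__ == "__main__":
    # Placeholder intention features (hypothetical values).
    intentions = {"target": [1.0, 0.5], "neighbor": [0.2, 0.8], "far_agent": [0.9, 0.9]}
    positions = {"target": (0.0, 0.0), "neighbor": (3.0, -2.5), "far_agent": (20.0, 0.0)}
    g = build_intention_graph(intentions, positions, center=(0.0, 0.0))
    print(g[2][2], g[1][4])  # prints "[1.0, 0.5] [0.2, 0.8]"
```

Keeping features in grid cells rather than in a flat list preserves their relative spatial arrangement, which is the property the text attributes to the social tensor. The same function would be called once with current intentions and once with future intentions to obtain the two graphs.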
Similarly, we map this future intention onto a social tensor based on its location and refer to it as the "Intention Graph in the Future Time Domain." Third, because the intention information from each time domain is partial and coarse when the intention graphs are constructed, the interaction cross domain module lets the intention information from the two time domains interact. The intention in one time domain issues a Query to the other time domain and corrects itself using the Key and Value from that domain. Through this joint analysis of intention information over time, features across different agents, spaces, and time domains are extracted and fused to obtain a more precise intention representation. Fourth, in the intention influence evaluation module, the network estimates the degree to which different intentions influence the future trajectory of the target agent, further refining the interaction process.
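The Query/Key/Value exchange between the two time domains, followed by an influence-weighted aggregation, can be sketched with plain scaled dot-product attention. This is an illustrative sketch, not the paper's architecture: it assumes identity Q/K/V projections (a trained model would learn linear projections), a single head, a residual correction, averaging of the two refined domains, and a fixed `score_vec` in place of a learned influence scorer. The softmax weighting in `influence_pool` is one common way to let a network prioritize agents instead of concatenating their features; the summary does not state that CiT uses exactly this form. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Each query (one time domain) attends over the keys/values of the other
    time domain and adds what it reads as a residual correction."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        w = softmax([dot(q, k) * scale for k in keys])
        read = [sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))]
        out.append([qd + rd for qd, rd in zip(q, read)])
    return out


def influence_pool(features: List[Vector], score_vec: Vector) -> Tuple[List[float], Vector]:
    """Score each agent's fused intention, normalize the scores into influence
    weights, and return the weights plus the weighted sum of features."""
    weights = softmax([dot(f, score_vec) for f in features])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(len(features[0]))]
    return weights, pooled


if __name__ == "__main__":
    # Placeholder per-agent intention features for the two time domains.
    current = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    future = [[0.8, 0.2], [0.1, 0.9], [0.0, 0.3]]
    cur_refined = cross_attend(current, future, future)  # current domain queries the future
    fut_refined = cross_attend(future, current, current)  # future domain queries the current
    fused = [[(a + b) / 2 for a, b in zip(c, f)] for c, f in zip(cur_refined, fut_refined)]
    weights, pooled = influence_pool(fused, score_vec=[0.3, 1.2])
    print([round(w, 3) for w in weights], pooled)
```

Both refinements read from the original, unrefined features of the other domain, so the exchange is symmetric; the influence weights sum to one, which makes the contribution of each agent explicit rather than implicit in a concatenated feature vector.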

Perspectives

The main contributions are summarized as follows:
• We propose CiT, which comprises four novel designs: 1) future motion incorporation, which captures the interactive aspect of human-robot interaction; 2) intention graph construction, which builds two types of intention graphs; 3) interaction cross domain, which achieves information complementarity between the two intentions by integrating information across different time domains and agents; and 4) intention influence evaluation, which enables the network to weigh the influence of different agents in a prioritized manner.
• In robotic systems, multiple candidate trajectories can be generated and their corresponding outcomes evaluated through the prediction module; CiT provides a valuable interface for integrating this trajectory prediction model into such systems.
• We conduct experiments on two real-world datasets to evaluate our method. Experimental results show that CiT achieves state-of-the-art performance.

Yuxiang Zhao
Alibaba Group

Read the Original

This page is a summary of: Cross Time Domain Intention Interaction for Conditional Trajectory Prediction, October 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3746027.3754709.