DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video

In this episode of the Talking Papers Podcast, I hosted Cristian Rodriguez-Opazo to chat about his paper “DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video”, published at WACV 2021. Cristian is currently a Research Fellow at the Australian Institute for Machine Learning (AIML). Before that he was a PhD student at the Australian National University (ANU), which is where I first met him. Cristian is a good friend and an amazing researcher, and his perspective on video processing is unique and grounded in hands-on experience. Recording this first episode with him was a pleasure. DORi takes on the task of temporal moment localization in untrimmed videos using a natural language query. The major player here is the spatio-temporal graph.

AUTHORS

Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould

ABSTRACT

This paper studies the task of temporal moment localization in a long untrimmed video using a natural language query. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization which captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial subgraph that contextualizes the scene representation using detected objects and human features. Moreover, a temporal subgraph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCook II as a new benchmark for this task. Experiments show our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach.
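To give a rough feel for what "language-conditioned message passing" can mean, here is a minimal sketch in plain Python. It is not the DORi implementation: the function names, the dot-product attention, the residual update and the toy two-dimensional features are all assumptions made for illustration. Each node (human, object, activity) gathers features from its neighbours, and the query embedding decides how much weight each neighbour gets.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def message_passing_step(node_feats, neighbors, query):
    """One round of query-conditioned message passing (illustrative only).

    node_feats: dict mapping node name -> feature vector (list of floats)
    neighbors:  dict mapping node name -> list of neighbour node names
    query:      query embedding with the same dimension as the features
    """
    updated = {}
    for node, feat in node_feats.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            # Isolated nodes keep their features unchanged.
            updated[node] = list(feat)
            continue
        # Assumption: neighbours that align with the query get more weight.
        weights = softmax([dot(query, node_feats[n]) for n in nbrs])
        message = [
            sum(w * node_feats[n][i] for w, n in zip(weights, nbrs))
            for i in range(len(feat))
        ]
        # Assumption: a simple residual update (old feature + message).
        updated[node] = [f + m for f, m in zip(feat, message)]
    return updated


# Toy graph with hypothetical 2-d features.
node_feats = {
    "human": [1.0, 0.0],
    "object": [0.0, 1.0],
    "activity": [0.5, 0.5],
}
neighbors = {
    "human": ["activity"],
    "object": ["activity"],
    "activity": ["human", "object"],
}
query = [0.0, 2.0]  # a query that "points at" the object feature

updated = message_passing_step(node_feats, neighbors, query)
print(updated["activity"])
```

In this toy setup the activity node ends up pulled more towards the object than the human, because the query aligns with the object's feature. The paper's actual model learns these interactions and runs them over both the spatial and temporal subgraphs.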

RELATED PAPERS

📚 “Proposal free temporal moment localization”

📚 “Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs”

CODE: 💻https://github.com/crodriguezo/DORi

PAPER Link: DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video

DORi’s WACV talk on YouTube

This episode was recorded on March 26th, 2021.

CONTACT

If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com


SUBSCRIBE AND FOLLOW

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

📧Subscribe to our mailing list: http://eepurl.com/hRznqb

🐦Follow us on Twitter: https://twitter.com/talking_papers

🎥YouTube Channel: https://bit.ly/3eQOgwP