3DInAction: Understanding Human Actions in 3D Point Clouds

In the latest episode of the Talking Papers Podcast, we had the incredible opportunity to have an AI avatar interview Yizhak (Itzik) Ben-Shabat, our usual host, a research fellow at the Technion, a visiting researcher at the Australian National University, and a recent Marie Skłodowska-Curie Fellow. We delved into his paper “3DInAction: Understanding Human Actions in 3D Point Clouds,” published at CVPR 2024. The paper presents a novel method for recognizing human actions in 3D point clouds, addressing the under-explored challenges inherent in this data modality: lack of structure, permutation invariance, and a varying number of points. Itzik’s work introduces the 3DInAction pipeline, featuring t-patches and a hierarchical architecture for learning informative spatio-temporal representations, and achieves marked improvements on datasets such as DFAUST and IKEA ASM.

One of the main contributions highlighted in Itzik’s paper is the concept of t-patches: local point patches that move through time and serve as fundamental building blocks for understanding human actions in 3D. When Itzik embarked on this work, the territory was largely uncharted; very few studies had attempted 3D action recognition directly on point cloud data. As interest in this area grows, his work stands out not only for its novelty but also for setting a new benchmark and pushing the frontier further. The 3DInAction pipeline, with its hierarchical spatio-temporal learning architecture, exemplifies the innovative spirit driving computer vision and pattern recognition forward.

This episode also carries a personal connection. The title of Itzik’s paper, “3DInAction,” is also the title of his fellowship, marking a rewarding conclusion to three years of intensive and inspiring research. This time, our conversation was animated by an AI avatar from Synthesia, an idea Itzik was keen to explore just for the sheer excitement of it, and it brought a unique flavor to the episode. Reflecting on our chat, it feels like the end of an era but also the beginning of new prospects. Itzik’s dedication and creativity have always stood out to me, and I can’t wait to see where his research journey takes him next. Tune in for an intellectually stimulating discussion and a deep dive into the exciting world of 3D point cloud action recognition!

AUTHORS


Yizhak Ben-Shabat, Oren Shrout, Stephen Gould

ABSTRACT


We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored. This is mostly due to the inherent limitations of the point cloud data modality (lack of structure, permutation invariance, and a varying number of points), which make it difficult to learn a spatio-temporal representation. To address these limitations, we propose the 3DInAction pipeline, which first estimates patches moving in time (t-patches) as a key building block, alongside a hierarchical architecture that learns an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM.
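To make the t-patch idea more concrete, here is a minimal sketch in Python/NumPy of one plausible construction: sample well-spread seed points in the first frame, group each seed’s k nearest neighbours into a spatial patch, then propagate each seed forward in time by snapping it to the nearest point in the next frame. All function names, the tracking rule, and the parameters below are illustrative assumptions, not the paper’s actual implementation.

```python
import numpy as np

def farthest_point_sample(points, m):
    """Greedy farthest-point sampling: pick m well-spread seed indices.
    points: (N, 3) array."""
    n = points.shape[0]
    seeds = [np.random.randint(n)]
    dists = np.full(n, np.inf)
    for _ in range(m - 1):
        dists = np.minimum(dists, np.linalg.norm(points - points[seeds[-1]], axis=1))
        seeds.append(int(np.argmax(dists)))
    return np.array(seeds)

def knn_patch(points, center, k):
    """Return the k points nearest to `center` (one spatial patch)."""
    d = np.linalg.norm(points - center, axis=1)
    return points[np.argsort(d)[:k]]

def build_t_patches(seq, num_seeds=8, k=32):
    """Assumed t-patch construction: seed in frame 0, then track each
    seed through time by snapping it to the nearest point in the
    current frame and re-grouping its k nearest neighbours.
    seq: list of (N_t, 3) arrays (point counts may vary per frame).
    Returns an array of shape (num_seeds, T, k, 3)."""
    centers = seq[0][farthest_point_sample(seq[0], num_seeds)]
    t_patches = []
    for pts in seq:
        # Snap each tracked center to the closest point in this frame.
        centers = np.stack(
            [pts[np.argmin(np.linalg.norm(pts - c, axis=1))] for c in centers]
        )
        t_patches.append([knn_patch(pts, c, k) for c in centers])
    # (T, num_seeds, k, 3) -> (num_seeds, T, k, 3)
    return np.transpose(np.array(t_patches), (1, 0, 2, 3))

# Toy example: a 4-frame sequence with a varying number of points.
seq = [np.random.rand(n, 3) for n in (512, 480, 500, 512)]
patches = build_t_patches(seq, num_seeds=8, k=32)
print(patches.shape)  # (8, 4, 32, 3)
```

This sketch covers only the geometric grouping step; in the paper, the resulting t-patches feed a hierarchical network that aggregates features across space and time to produce the action representation.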

RELATED WORKS

📚PointNet

📚PointNet++

📚PSTNet

LINKS AND RESOURCES

📚Preprint

💻Project page

💻Code

To stay up to date with his latest research, follow him on:

👨🏻‍🎓Personal website

👨🏻‍🎓Google scholar

🐦Twitter

👨🏻‍🎓LinkedIn

This episode was recorded on June 2nd, 2024.

CONTACT


If you would like to be a guest or a sponsor, or to share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

SUBSCRIBE AND FOLLOW


🎧Subscribe on your favourite podcast app

📧Subscribe to our mailing list

🐦Follow us on Twitter

🎥Subscribe to our YouTube channel