In this episode of the Talking Papers Podcast, I hosted Tomas Jakab. We had a great chat about his paper “MagicPony: Learning Articulated 3D Animals in the Wild”, published in CVPR 2023.
The motivation behind the MagicPony methodology stems from the challenge posed by the scarcity of labeled data, particularly when dealing with real-world scenarios involving freely moving articulated 3D animals. In response, the authors propose an innovative solution that addresses this issue. This novel approach takes an ordinary RGB image as input and produces a sophisticated 3D model with detailed shape, texture, and lighting characteristics. The method’s uniqueness lies in its ability to learn from diverse images captured in natural settings, effectively deciphering the inherent differences between them. This enables the system to establish a foundational average shape while accounting for specific deformations that vary from instance to instance. To achieve this, the researchers blend the strengths of two techniques, radiance fields and meshes, which together contribute to the comprehensive representation of the object’s attributes. Additionally, the method employs a strategic viewpoint sampling technique to enhance computational speed. While the current results may not be suitable for practical applications just yet, this endeavor constitutes a substantial advancement in the field, as demonstrated by the tangible improvements showcased both quantitatively and qualitatively.
Tom is a postdoc in the VGG lab at Oxford University, where he recently completed his PhD with Andrea Vedaldi. His research focuses on single-view 3D reconstruction, 2D/3D object representations, and self-supervised learning. We crossed paths at CVPR 2023, introduced by Despoina Paschalidou (of the Neural Parts episode). Upon learning he’s one of the author of MagicPony, I invited him to the podcast. Our conversation was delightful, and I’m excited about his future projects.
AUTHORS
Shangzhe Wu*, Ruining Li*, Tomas Jakab*, Christian Rupprecht, Andrea Vedaldi
ABSTRACT
We consider the problem of learning a function that can estimate the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse, given a single test image. We present a new method, dubbed MagicPony, that learns this function purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object’s shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To overcome common local optima in viewpoint estimation, we further introduce a new viewpoint sampling scheme that comes at no added training cost. Compared to prior works, we show significant quantitative and qualitative improvements on this challenging task. The model also demonstrates excellent generalisation in reconstructing abstract drawings and artefacts, despite the fact that it is only trained on real images.
RELATED PAPERS
📚CMR
LINKS AND RESOURCES
📚 Paper
💻Code
To stay up to date with their latest research, follow on:
👨🏻🎓Personal page
👨🏻🎓Google Scholar
Recorded on July 27th 2023.
CONTACT
If you would like to be a guest, sponsor or share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com
SUBSCRIBE AND FOLLOW
🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: