NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

In this episode of the Talking Papers Podcast, I hosted Chenfeng Xu. We had a great chat about his paper “NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection”, published in ICCV 2023.

In recent times, NeRF has gained widespread prominence, and the field of 3D detection has encountered well-recognized challenges. The principal contribution of this study lies in its ability to address the detection task while simultaneously training a NeRF model and enabling it to generalize to previously unobserved scenes. Although the computer vision community has been actively addressing various tasks related to images and point clouds for an extended period, it is particularly invigorating to witness the application of NeRF representation in tackling this specific challenge.

Chenfeng is currently a Ph.D. candidate at UC Berkeley, collaborating with Prof. Masayoshi Tomizuka and Prof. Kurt Keutzer. His affiliations include Berkeley DeepDrive (BDD) and Berkeley AI Research (BAIR), along with the MSC lab and PALLAS. His research endeavors revolve around enhancing computational and data efficiency in machine perception, with a primary focus on temporal-3D scenes and their downstream applications. He brings together traditionally separate approaches from geometric computing and deep learning to establish both theoretical frameworks and practical algorithms for temporal-3D representations. His work spans a wide range of applications, including autonomous driving, robotics, AR/VR, and consistently demonstrates remarkable efficiency through extensive experimentation. I am eagerly looking forward to see his upcoming research papers.


Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, Ruilong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka


 We consider the problem of learning a function that can estimate the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse, given a single test image. We present a new method, dubbed MagicPony, that learns this function purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object’s shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To NeRF-Det is a novel method for 3D detection with posed RGB images as input. Our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimization of NeRF, we introduce sufficient geometry priors to enhance the generalizability of NeRF-MLP. We subtly connect the detection and NeRF branches through a shared MLP, enabling an efficient adaptation of NeRF to detection and yielding geometry-aware volumetric representations for 3D detection. As a result of our joint-training design, NeRF-Det is able to generalize well to unseen scenes for object detection, view synthesis, and depth estimation tasks without per-scene optimization.





📚 Paper

💻Project page


To stay up to date with their latest research, follow on:

👨🏻‍🎓Personal page


👨🏻‍🎓Google Scholar

Recorded on August 8th 2023.


If you would like to be a guest, sponsor or share your thoughts, feel free to reach out via email:


🎧Subscribe on your favourite podcast app:

📧Subscribe to our mailing list:

🐦Follow us on Twitter:

🎥YouTube Channel: