Highlights and Insights from CVPR 2024

Day 1: A Rocky Start and a Warm Welcome

My first day at CVPR started off on a less-than-ideal note. Due to some unexpected delays at the airport, I missed all of the lectures I had planned to attend. By the time I arrived, the only thing left to do was pick up my badge. Despite this rocky start, the day took a turn for the better.

In the evening, the Israelis at Amazon hosted a fantastic networking event. We gathered for a pizza evening where I had the pleasure of meeting some of Israel’s finest researchers. It was a wonderful opportunity to catch up with old friends and make new connections. Among the attendees were two familiar faces from my podcast, Yael Vinker and Itai Lang. I also got to meet Omer Shapira in person for the first time; until then I only knew him from the Twitterverse (and his work). Our conversations spanned various topics, from cutting-edge research to the future of AI, making for an intellectually stimulating evening. They restored my faith in the future of Israeli research.

We wrapped up the night at Elephant and Castle. The company was excellent, filled with lively discussions and camaraderie. The food, however, left much to be desired. Despite the culinary shortcomings, the connections and conversations made the evening memorable.

Day 2: Exploring the Future of 3D Reconstruction and Neural Representations: Highlights

First up was Shubham Tulsiani’s talk on Sparse-view 3D in the wild. The goal here is to virtualize a real object for AR/VR applications. The challenge lies in balancing the ease of capture against the faithfulness of reconstruction. Ideally, we want systems that make capture easy yet still deliver high-fidelity reconstructions. Shubham highlighted the importance of robust sparse-view pose prediction, referencing tools like SparsePose, RelPose, RelPose++, and PoseDiffusion. He suggested reparametrizing this as local prediction, citing “Cameras as rays” as a great resource, something we even discussed on my podcast! Generative priors for guiding 3D inference were another key point, emphasizing analysis by generative synthesis. This was a fantastic start to the day.
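To make the “local prediction” idea a bit more concrete, here is a minimal sketch of what it means to describe a camera as a bundle of rays instead of one global rotation and translation. It is my own illustration, not code from the talk or the paper: the function name `camera_to_plucker_rays` is hypothetical, and I am assuming a pinhole camera with the convention x_cam = R · x_world + t. Each ray is stored in Plücker coordinates, a unit direction d plus a moment m = c × d, where c is the camera center.

```python
import numpy as np

def camera_to_plucker_rays(K, R, t, pixels):
    """Convert one pinhole camera into per-pixel Plücker rays.

    Assumed convention (not from the talk): x_cam = R @ x_world + t.
    K: (3, 3) intrinsics, R: (3, 3) rotation, t: (3,) translation,
    pixels: (N, 2) pixel coordinates (u, v).
    Returns an (N, 6) array of [direction, moment] per ray.
    """
    K = np.asarray(K, dtype=np.float64)
    R = np.asarray(R, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    pixels = np.asarray(pixels, dtype=np.float64)

    # Camera center in world coordinates: c = -R^T t.
    center = -R.T @ t

    # Back-project pixels to camera-frame directions: K^{-1} [u, v, 1]^T.
    pixels_h = np.concatenate([pixels, np.ones((len(pixels), 1))], axis=1)
    dirs_cam = pixels_h @ np.linalg.inv(K).T

    # Rotate into the world frame (row-vector form of R^T d) and normalize.
    dirs_world = dirs_cam @ R
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)

    # Plücker moment m = c x d; it does not depend on which point of the ray is used.
    moments = np.cross(center, dirs_world)
    return np.concatenate([dirs_world, moments], axis=1)

# Usage: a camera sitting at (1, 0, 0) and looking down the world z-axis.
rays = camera_to_plucker_rays(np.eye(3), np.eye(3), [-1.0, 0.0, 0.0], [[0.0, 0.0]])
```

The appeal of this view is that every image patch gets its own small prediction target (one ray), and the global camera can be recovered from the bundle afterwards. The sketch only shows the forward direction, from camera to rays.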

Next, Richard Newcombe from Meta spoke about real-time 3D digital twins for XR and Contextual AI. Richard focused on egocentric capture, similar to phone capture but with the goal of live dynamic reconstruction of reality. He introduced Project Aria, which advances wearable machine perception and AI for future smart AR glasses. A standout point was the challenge of motion blur in egocentric video, with only 35% of frames being usable. He discussed moving from offline to online models to build a full digital twin of the world state in seconds. This talk was a deep dive into the technical advancements and challenges in this space, with practical applications like 3D egocentric foundation models trained on the Aria digital twin simulation dataset.

Jonathon Luiten’s presentation on “3D Gaussian Splatting for Dynamic Scenes and SLAM” was another highlight. He defined computer vision as understanding the world using a computer by analyzing sensor observations. The talk contrasted labeling sensor observations with modeling the dynamic world. Jonathon emphasized the need for models that understand the world as it moves, referencing his paper on Dynamic 3D Gaussians. He discussed the challenges of maintaining persistent representations over time and introduced creative applications like Gaussian-eye view and compositional dynamic scenes. This was a compelling exploration of dynamic 3D modeling.

Unfortunately, technical issues prevented me from covering Lingjie Liu’s talk on Generalized 3D Reconstruction. However, I noted some interesting projects like Wonder3D and GECO: Generative Image to 3D within a SECOnd.

Jon Barron’s talk on “Radiance Fields & Generative 3D” wrapped up my attendance at the workshop. He explained NeRFs (Neural Radiance Fields) and their limitations with sparse image data. Jon discussed the continuum from dense image reconstruction to single image generation, culminating in 3D scene generation. He highlighted DreamFusion and ReconFusion, which carry ideas from reconstruction over to generation. His focus on using 2D models for 3D generation was insightful, advocating for turning diffusion sampling into an optimization problem (Score Distillation Sampling, SDS). Jon stressed the importance of quantitative evaluation over vibe-based research, pushing for rigorous measurement of results.
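For readers who have not seen SDS written down, this is the gradient as I remember it from the DreamFusion paper. It was not a slide from the talk, and the notation is my assumption: θ are the 3D scene parameters, g is a differentiable renderer producing an image x = g(θ), φ is the frozen 2D diffusion model with noise predictor ε̂, z_t is the rendered image noised to timestep t, y is the text prompt, and w(t) is a timestep weighting.

```latex
\nabla_{\theta} \mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\; x = g(\theta)\bigr)
  \;\triangleq\;
  \mathbb{E}_{t,\,\epsilon}\!\left[
    w(t)\,\bigl(\hat{\epsilon}_{\phi}(z_t;\, y,\, t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta}
  \right]
```

In words: render the scene, add noise, ask the 2D model which noise it thinks was added, and push the 3D parameters in the direction that shrinks the gap between the predicted and the true noise.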

I had a great lunch date with Silvia Sellán, whom I finally got to meet in person (another amazing former podcast guest). She introduced me to Noam Aigerman, whose work I have been following for a while. We had a great conversation, and it was amazing how much we had in common.

After lunch, I attended George Kopanas’s talk on recent advances in Gaussian Splatting. He outlined the criteria for a good 3D representation: accuracy, speed, memory efficiency, and practicality. Gaussian splatting ticks all these boxes, achieving PSNR comparable to Mip-NeRF 360, running at 100 fps, training in under an hour, and rendering on mobile devices. George introduced Mip-Splatting for alias-free Gaussian splatting and discussed extending 3DGS using hierarchies for large datasets.
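Since PSNR is the number these comparisons lean on, here is a minimal sketch of how it is usually computed between a rendered view and a held-out ground-truth image. The function name and the assumption that pixel values lie in [0, 1] are mine; real benchmarks add details such as averaging over all test views.

```python
import numpy as np

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel values in [0, max_value]; higher is better.
    """
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)

    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images

    return float(10.0 * np.log10(max_value ** 2 / mse))

# Usage: a render that is off by 0.1 at every pixel scores about 20 dB.
ground_truth = np.zeros((4, 4, 3))
render = np.full((4, 4, 3), 0.1)
score = psnr(ground_truth, render)
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 20% drop in mean squared error, which is why differences that look small in a results table can still matter.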

Peter Hedman followed with a talk on transcoding for fast view synthesis. He discussed the trade-offs between easy rendering and easy reconstruction, positioning 3DGS as a balanced solution. Peter also covered SMERF (fast volume rendering) and the innovation of binary opacity grids to increase speed.

I had to miss the packed 3DFV workshop, but I managed to catch the INRV workshop later. This session featured a range of talks on implicit neural representations (INRs):

  • Jia-Bin Huang discussed scaling neural fields for large spaces and incorporating temporal coordinates.
  • Srinath Sridhar spoke on implicit representations for the interactive 4D world, emphasizing the challenge of building machines that understand and learn like humans.
  • Namitha Padmanabhan explained the inner workings of INRs through contribution maps, revealing low-level scene attributes.
  • Xiaolong Wang presented on feature fields for manipulations, highlighting the need for semantic understanding and efficiency in dynamic changes.
  • Hyunjik Kim stressed the importance of local representations for downstream tasks, introducing the concept of representing with functions (Functa).

Despite my laptop almost running out of battery halfway through the day, it was a highly informative and inspiring series of workshops. From the discussions on the limitations of current methods to the potential of emerging technologies, Day 2 at CVPR was a deep dive into the future of 3D reconstruction and neural representations. Looking forward to more insights and innovations tomorrow!

During one of the workshops, a deeply troubling incident occurred. A slide was presented that accused my country of genocide in Palestine. This accusation was not only false but also had no place in a scientific conference like CVPR. I was deeply offended by this misrepresentation and shared my concerns on Twitter. The response was overwhelming: while many in the community expressed their support and solidarity, I also received a significant amount of hateful messages. This incident highlighted the importance of maintaining a focus on scientific discourse and the need for sensitivity and accuracy in all presentations.

A Heartwarming Surprise

Despite this unsettling incident, the day ended on a very positive note. A student I have been working with for several years surprised me with a gift: three 3D printed models, each representing a shape from our joint papers during his PhD. He is about to graduate, and this gesture was incredibly touching. We first met when I was still in Australia, and although he had no prior experience with 3D, I quickly recognized his potential. He was searching for his next challenging research idea, so I offered him the opportunity to join the DiGS paper (back when it was still called DiBS), which had hit a wall. This moment completely changed his research trajectory as he fell in love with reconstruction, appreciating how it elegantly combines math, geometry, coding, and fun. Seeing the impact I had on this amazing person’s life and career made me very emotional. It feels like such a privilege, and I am humbled to have had the opportunity to work with him. He will be on the job market soon, so keep an eye out!

Day 3: From Awards to Embodiment

Opening and Award Ceremony:

Today started with an inspiring opening and award ceremony. The excitement in the air was palpable as the conference’s remarkable statistics and program details were unveiled. Panels were set up, and the awards were announced, celebrating outstanding contributions to the field. It was a moment of pride and recognition for all the hard work put into advancing computer vision and pattern recognition. Congrats to all the awardees!

Oral Session on Vision and Graphics:

Following the award ceremony, the first oral session focused on vision and graphics. The presentations were engaging, showcasing cutting-edge research and innovative approaches in these fields. The session was a testament to the incredible progress being made in computer vision and graphics.

Poster Sessions:

I love poster sessions. I enjoy walking around, waiting for something or someone to catch my attention, and then striking up a conversation and asking questions to understand the science better. Here are a few posters I visited and enjoyed.

Second Oral Session:

The second oral session of the day kept the momentum going with more insightful presentations. Researchers shared their latest findings, sparking discussions and ideas among attendees. The exchange of knowledge and perspectives was enriching.

Keynote by Josh Bongard: “The Tip and the Iceberg: Deep Learning and Embodiment”:

Next up was the much-anticipated keynote by Josh Bongard. His talk, titled “The Tip and the Iceberg: Deep Learning and Embodiment,” delved into the concept of embodiment in the context of artificial intelligence. Bongard emphasized that while deep learning has revolutionized the field, true intelligence cannot be achieved by neural networks alone. He argued that we need a body to interact with the world, to push and observe the pushback, to understand cause and effect deeply.

The Importance of Embodiment:

Bongard highlighted that the idea of embodiment has been around for a long time but has often been overshadowed by the deep learning revolution. He pointed out that without the ability to cause an effect, it is impossible to measure one. This fundamental aspect of intelligence, he argued, is crucial for the development of truly intelligent systems. Moreover, organisms have complex internal processes beyond neural behavior, which must be considered in AI research.

During the talk, he ran a cool demo, getting us to run a code notebook and generate our own robot simulation. Here is mine:

The talk was very inspiring and felt a little bit like the beginning of an apocalyptic Hollywood movie – biorobots will take over the world! 😉

The day concluded with a fascinating panel discussion.

After the program ended I went to Meta’s networking mixer event, met some old friends and made some new ones. The view was amazing!

Day 4 at CVPR 2024: From 3D Vision to Protein Design

I kicked off the day with the oral session on “3D from a Single View.”

The immense scale of CVPR this year made it quite a challenge to navigate the sessions. The room for the 3D vision talks was packed to the brim, with a queue forming outside, so I had to pivot and ended up in the session on Action and Motion. Sometimes, I wish I could split my consciousness to experience multiple sessions simultaneously.

Despite the change of plans, I was captivated by an outstanding oral presentation (an award nominee) from the group of my longtime mentor, Stephen Gould, on “Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation.” Their work shed light on innovative methodologies for action segmentation, offering promising avenues for future research.

Next up was the keynote by David Baker, who presented on “De novo Protein Design Using Deep Learning.” This session was an eye-opener on how computer vision is revolutionizing biology and medicine. Baker’s team approaches problems by designing new proteins to address specific challenges in biology, medicine, and chemistry. They then create amino acid sequences that fold into the desired structures, assemble synthetic DNA encoding these proteins, and verify their effectiveness by putting them into cells.

One of the standout points was how AI can now generate useful proteins in seconds. Their toolkit includes RoseTTAFold for accurate protein structure prediction, RFdiffusion for protein design, and ProteinMPNN for sequence design. Over the past five years, deep learning models have surpassed all physical models in this field. Applications of their work range from designing proteins to protect against lethal snake venom toxins to creating potent anti-tumor proteins for cancer treatment. They also use generative methods to create sensors and proteins that bind to specific regions of DNA to control cellular functions. The potential applications are vast, including designing proteins for nanomaterials and artificial photosynthesis.

Later in the day, the PAMI TC meeting took a serious turn as we discussed ombuds roles and incidents from this and previous CVPRs. Three motions were brought forward: the first addressing reviewers requesting experiments without available code, the second about text-only rebuttals, and the third concerning the proposal to hold ICCV 2029 in Dubai. The third motion sparked intense and emotional reactions, especially from the LGBTQ+ community, who felt excluded by the potential location choice. The tension in the room was palpable.

I ended the day with an unexpected highlight. Although I missed the reception, I thoroughly enjoyed the Snap event, where I engaged in civil discussions about politics and various opinions with fellow scholars. We didn’t always agree, but it was refreshing and educational to hear diverse perspectives, making the event a memorable and enriching experience.

Day 5 at CVPR 2024: My poster session and a great panel

Day 5 at CVPR started on a hectic note for me as I had to skip the first oral session to set up my posters. With two papers to present and both scheduled in the same session, it was a bit of a juggling act. Chamin kindly took the reins on presenting our “small steps and level sets” paper, while I focused on “3DinAction.” Despite the rush, it was rewarding to engage with attendees and discuss our work in detail.
One fun thing that happened: right across from my poster was a poster by Christian Diller that used OUR IKEA ASM dataset!

Later in the day, I attended a keynote by the talented artist Sofia Crespo on “Entanglements, exploring Artificial Biodiversity.” Although I was physically exhausted and unable to take detailed notes, the art she presented was deeply inspiring. Crespo’s exploration of artificial biodiversity through art provided a refreshing perspective on the interplay between technology and nature, leaving a lasting impression on me.

Following the keynote, I listened to a panel discussion on the past, present, and future of CVPR, featuring Cordelia Schmid, Dima Damen, and Ranjay Krishna. The panelists discussed various topics, including how their opinions have evolved over time and their visions for the future. Dima reflected on how she once viewed language as unnecessary but now sees a promising future in 3D. Cordelia’s perspective shifted from human-in-the-loop systems to fully automatic end-to-end processes. A key piece of advice for newcomers to the field was to view a PhD as a unique opportunity to tackle new problems and focus on meaningful scientific contributions rather than merely generating engineering solutions.

The discussion on improving the review process was particularly engaging. The consensus was that the review process is currently flawed due to its scale, and while making it more open could help, it wouldn’t be a complete solution. The varied perspectives on how to address these challenges, ranging from maintaining traditional practices (Cordelia) to advocating for radical changes (Dima), made for a lively and refreshing conversation. Personally, I found myself aligning with Dima’s views but appreciated the diversity of opinions presented.

The final poster session of the day was more extensive than I had anticipated. I took the opportunity to walk around, chat with researchers, and learn about their innovative work. It was a great way to close out the academic part of the day, filled with stimulating discussions and new insights.

To cap off the day, I had a delightful dinner with my Aussie colleagues and friends—Chamin, Dylan, and Steve. Before heading to dinner, we made a quick visit to the iconic Seattle bridge Troll, which was a fun and memorable experience. It was the perfect end to a long and fulfilling day at CVPR.

CVPR 2024 – Final professional thoughts

This year, as usual, CVPR was packed with high-quality science alongside fascinating discussions and a rainbow of opinions. Diffusion and generative models dominated much of the conference, while 3D research also had a significant presence, which was both encouraging and intimidating for me as a researcher in the 3D field. I noticed fewer works on traditional vision tasks, perhaps because some of these are now considered “solved,” highlighting the field’s rapid evolution.

Due to my unexpected Twitter “fame,” I attended the PAMI TC meeting for the first time, and I plan to continue participating in the future. It was enlightening to hear other scholars’ opinions and experiences and to have a vote in the community’s direction. I feel we all have a moral obligation to be part of this process.

One question remains open – how will we handle the increase in scale?

I would like to extend my heartfelt thanks to the organizers of @CVPR for their tireless efforts in putting together such a remarkable conference. The seamless execution of the event, from the diverse sessions to the engaging keynotes and panels, made this year’s CVPR truly exceptional. Your dedication to creating an inclusive and intellectually stimulating environment is greatly appreciated. The opportunity to connect with fellow researchers, share our work, and engage in meaningful discussions has been invaluable. Thank you for your hard work and commitment to advancing our field.

CVPR 2024 – Final personal thoughts

On a personal note, this is the first CVPR where I truly felt like I belonged to this incredible community. In previous years, I felt like a “visitor,” a former mechanical engineer navigating this vast conference. This year, I had friends I was excited to see, made many new ones, and engaged with people familiar with my work. It was thrilling when a student I didn’t know mentioned MFGI.

For the first time, I felt like I was contributing not just as a presenter of my papers but as an active community member, advancing knowledge and training the next generation. Chamin’s gift and the support I received on social media reaffirmed that my efforts are appreciated, motivating me to continue making science accessible to ALL.

Despite the depressing and troubling reality in Israel, this year’s CVPR reminded me that the future is bright and that we have an important role in shaping it. I am more determined than ever to contribute to a better future through my work.