Nestled in Waikoloa, Hawaii, the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024 brought together the best minds in computer vision for a week of insightful discussions and groundbreaking research. The organizers’ choice to kick off each day after 1 PM allowed attendees to savor the stunning Hawaiian beaches and balmy weather, a refreshing departure from traditional conference schedules. This relaxed atmosphere set the stage for engaging conversations and meaningful interactions, making it easy to forge connections and start collaborations.

Day 1: Stats, Awards, Orals, Keynote, and Posters

The opening day of WACV 2024 marked an impressive beginning, echoing the rapid growth and diverse participation witnessed over the years. The opening session began with statistics: 2042 submissions, with 91 papers accepted in the first round (11%) and a further 282 in the second round, culminating in an overall acceptance rate of 41%. Interestingly, a mere 3% were selected for oral presentation, highlighting the competitive nature of this forum.

The landscape of accepted papers revealed a healthy amalgamation of academia and industry, showcasing a promising blend of perspectives and contributions. Of personal interest, the realm of 3D vision held a significant position, ranking third in the algorithm track, indicative of its burgeoning importance in contemporary research.

The acknowledgments extended to the 190 ACs (Area Chairs) and the dedicated efforts of the 1841 program committee members underscored the collaborative spirit and collective endeavor propelling this conference forward.

The ceremony unfolded with well-deserved accolades: ParticleNeRF earned an Honorable Mention, while “Conditional Velocity Score Estimation for Image Restoration” and “WildlifeDatasets” claimed the titles of Best Paper in the Algorithms and Applications Tracks, respectively. Notably, the Best Student Paper award went to “Wino Vidi Vici”, a testament to the emerging talent within the field.

Following the awards, the audience was treated to a captivating oral session, leading up to the much-anticipated keynote address by Dima Damen. Her presentation, “Opportunities in Egocentric Video Understanding,” delved into the realm of egocentric cameras and the profound implications these wearables hold for the future. Damen provocatively questioned the traditional paradigms of data collection, emphasizing the divergence between starting with data versus labels and its pivotal role in research expansion.

She elaborated on the challenges posed by unbalanced and harder-to-label data encountered when commencing with raw data, juxtaposed with the artificially balanced yet expandable nature of labeled datasets. Damen’s exploration of the Ego4D dataset underscored the intricacies of cross-scenario generalization, shedding light on the diversity of actions across varied cultural and geographical contexts.

Her thought-provoking inquiries into audio-visual correlations, the nuances of object transformation, and the subtleties of grip analysis within the ego domain resonated deeply with the audience. Additionally, Damen’s insights into reconstructing scenes and envisioning the future of egocentric vision through comparative analysis painted a compelling narrative of the field’s trajectory.

The day culminated in an engaging oral session, dinner, and a vibrant poster session, setting a dynamic tone for the days that followed.

Day 2: Keynote, Orals, Panel, and the IKEA Ego 3D Dataset

Day 2 of WACV 2024 commenced with a captivating keynote delivered by Lihi Zelnik-Manor, exploring the transformative realm of digitizing touch and haptics. Addressing touch as the fifth sense, the keynote highlighted the complexities of haptic interaction, merging kinesthetic feedback from muscles and joints with tactile feedback from the skin—sensations like pressure and friction.

Lihi emphasized the significance of rendering touch in today’s technology landscape. Existing devices like microactuators or screens by TANVAS and ULTRALEAP provide partial tactile sensations, offering vibrations, force feedback, friction, or coarse pressure. However, they lack the nuanced textures essential for a realistic touch experience.

Discussing the mechanisms behind touch perception, Zelnik-Manor detailed the six types of mechanoreceptors embedded in our skin, particularly dense in our hands, essential for discerning object properties like softness or hardness.

The keynote underscored the importance of digitizing touch, akin to the strides made in digitizing vision and audio. Highlighting applications in various fields like medicine, art, and online shopping, she illuminated the vast potential awaiting realistic haptic technology.

She outlined the challenges in rendering touch, comparing the pipeline to that of vision technology, and presented ongoing efforts by companies such as Dexta Robotics, SenseGlove, Manus, HaptX, and Meta to develop haptic products.

The keynote culminated with a discussion of a tactile simulation device capable of simulating friction, vibration, and pressure. The device was evaluated on different surfaces reproduced as 3D prints and showed promising results: compared with touching the objects directly, participants’ recognition accuracy dropped to 86% and their inference time doubled.

Following this enlightening keynote, an engaging panel discussion ensued, delving into the nuances of innovation in computer vision and ethical considerations. Panelists deliberated on the academia-industry divide, ethical implications of AI and large language models, and the quest for innovation vis-a-vis ethical responsibilities.

The day further included an oral session, dinner, and a stimulating poster session, where diverse works were presented, including the introduction of our IKEA Ego 3D dataset, offering ego-viewpoint point clouds for enhanced action understanding.

At the end of the second day of the conference, an unsettling incident occurred, shaking the atmosphere of camaraderie. As an initiative to foster networking among Israeli attendees, I created a WhatsApp group, aiming to facilitate connections and social interaction. Regrettably, the group was breached by an individual vehemently opposed to Israel, who flooded it with disturbing and offensive anti-Israeli content. Swiftly, I removed this person from the group to maintain a safe and respectful environment. However, this action incited a series of personal attacks directed at me through private messages, culminating in a distressing and targeted encounter. This unexpected and unwarranted assault, though virtual, left an unsettling impact and served as a stark reminder of the challenges faced beyond the confines of academic discourse.

Day 3: Exploring Nature’s Beauty and Commemorating Insights in Computer Vision History

Day 3 commenced (for me) with an excursion to Akaka Waterfall, offering breathtaking views and a memorable stop at an overlook during the scenic 1.5-hour drive from the venue. For an exceptional dining experience, the 3 Mountain BBQ came highly recommended, providing exquisite food at a fair price, further enhancing the day’s experience.

Upon return, the schedule resumed with the PAMI technical committee meeting and an invigorating oral session, setting the stage for the day’s highlights. Among these was the much-anticipated fireside chat featuring Gerard Medioni, offering a historical perspective on the community’s growth since the 1970s. Interviewed by Michael Black, Medioni shared milestone moments from his career and offered insights into the evolution of computer vision. His reflections on the struggles of obtaining images on computers in the 1980s, compared to contemporary practices, underscored the remarkable advancements in the field. Medioni’s poignant remark, “In your career, there is no A/B testing; you have to make a choice,” resonated deeply, encapsulating the pivotal decisions one faces, such as his transition from academia to industry research roles.

The day concluded with a spectacular luau, celebrating native culture through songs, dances, and mesmerizing fire theatrics, adding a vibrant cultural touch to the event.

Marking the culmination of the main conference, Day 3 paved the way for two ensuing days dedicated to workshops and tutorials, ensuring an extended platform for knowledge exchange and collaborative learning.

Oh, and most importantly, on this day I finally got to meet in person the Twitter legend and program chair Kosta Derpanis.

Day 4: Tutorials and Workshops – In-Depth Insights into Robustness at Inference and Diverse Applications of 3D Geometry Generation

The first day of tutorials and workshops unveiled a wealth of knowledge and diverse perspectives. I began with an insightful tutorial led by Mohit Prabhushankar, “Robustness at Inference: Towards Explainability, Uncertainty, and Intervenability”. Prabhushankar elucidated methodologies vital for fostering robustness in neural networks during inference. Highlighting the significance of handling novel data at inference, the tutorial explored the essence of robustness, where correct and confident predictions are paramount when facing unforeseen data instances. Gradients emerged as key elements providing valuable information, aiding in explanations, uncertainty quantification, and contrastive class understandings.

Exploring uncertainty at inference, the tutorial presented ensembles as a means to gauge uncertainty through variances or entropy between network models. Strategies leveraging gradients were discussed for estimating uncertainty, distinguishing anomalies, and measuring distances between learned representations and novel data. The concept of introspective learning emerged, illustrating a two-stage inference approach melding visual sensing and reflection, offering logical answers.
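
To make the ensemble idea concrete, here is a minimal sketch of how disagreement between models can be turned into an uncertainty score. It is my own illustration rather than code from the tutorial: the function names (`softmax`, `entropy`, `ensemble_uncertainty`) and the toy logits are hypothetical, and it assumes each ensemble member outputs class logits for the same input.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def ensemble_uncertainty(member_logits):
    # member_logits: one list of class logits per ensemble member (hypothetical input format).
    member_probs = [softmax(l) for l in member_logits]
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    # Average the predicted distributions across members.
    mean_probs = [sum(p[c] for p in member_probs) / n_models for c in range(n_classes)]
    # Per-class variance of the members' probabilities around that average.
    variance = [
        sum((p[c] - mean_probs[c]) ** 2 for p in member_probs) / n_models
        for c in range(n_classes)
    ]
    return {
        "mean_probs": mean_probs,
        "predictive_entropy": entropy(mean_probs),
        "mean_variance": sum(variance) / n_classes,
    }

# Three members that agree, and three that each favor a different class.
agree = ensemble_uncertainty([[4.0, 0.0, 0.0]] * 3)
disagree = ensemble_uncertainty([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]])
print(agree["predictive_entropy"], disagree["predictive_entropy"])
```

When the members agree, both scores stay low; when each member favors a different class, the averaged distribution flattens and the entropy and variance rise, which is the signal one would use to flag an input as uncertain.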

The tutorial concluded by affirming the multifaceted utility of gradients in enhancing robustness amidst distribution shifts, traversing unknown manifolds, and aiding various image understanding applications. Valuable resources were also shared for further exploration.

Subsequently, the workshop on “3D Geometry Generation for Scientific Computing” provided an unexpected yet enriching immersion into diverse scientific domains harnessing tools developed within the vision community. Presentations showcased how computer vision tools intersected with forest mapping, semiconductor device reconstruction, geostatistical imaging of subglacial environments, and controlled illumination for object perception and manipulation.

Each presentation offered unique insights: from forest recovery mapping leveraging drones and 3D data, to reconstructing semiconductor device surfaces from scanning electron microscope images using simulated geometries, and geostatistical imaging improving ice sheet investigations. The amalgamation of CV expertise with diverse scientific disciplines highlighted immense collaboration potential.

The workshop concluded with an invigorating open discussion, emphasizing the opportunity to build bridges between these distinct communities. This convergence laid the foundation for potential collaborations, exemplifying the synergistic potential between computer vision and various scientific realms.

Oh, and the snacks at the coffee breaks today were exactly my cup of tea (pun intended).

At the end of this day, I had dinner with my mentor in research and life Steve Gould. I can’t recommend the restaurant (Tropics), but boy am I lucky to have Steve in my life. I had a very difficult year, and Steve has been a beacon of light.
I will stop publicly embarrassing him now, but I mean every word 😉

Day 5: Exploring Cutting-Edge Topics in Generative AI and Anomaly Detection

The final day of workshops and tutorials offered a more relaxed atmosphere, allowing for deep dives into emerging areas of Generative AI and Anomaly Detection. The day commenced with an engaging tutorial on “Rich Media with Generative AI”. Topics covered included Data Ownership in Generative Models, Diffusion Models for 3D Asset Generation, Point-Dragging Manipulation, Image Morphing, and GPT, with an emphasis on efficiency in Deep Learning Acceleration.

Transitioning to the “Workshop on Automated Spatial and Temporal Anomaly Detection”, I had the opportunity to watch Ori Nitzan present his KNNN work. Ori’s fervor for this work, coupled with his intellect, made for intriguing conversations between us in previous months. It was inspiring to see his project come to fruition, especially given his commitment, even amid external challenges.

Later, I attended the tutorial on “Reliability in Generative Models”, which explored crucial aspects of image generative models and their associated reliability concerns. Discussions ranged from state-of-the-art image generative models to mitigating training data memorization in diffusion models. The tutorial also covered techniques that embed fingerprints into model weights to trace the origins of malicious content, and methods for assessing inherent biases and potential failure modes within image generative models.

This day encapsulated a diverse array of insightful sessions, providing profound insights into the latest advancements and critical considerations within Generative AI and Anomaly Detection. It was a fitting conclusion to the conference, leaving a trail of stimulating ideas and fostering an atmosphere of innovation and knowledge exchange.

WACV 2024 Summary

As we bid farewell to WACV 2024, heartfelt gratitude goes to the organizers for orchestrating an exceptional conference that blended intellectual pursuit with appreciation for the beauty of Hawaii. The decision to embrace a later start time proved instrumental, affording participants the opportunity to immerse themselves in the breathtaking surroundings. The intimate setting facilitated genuine exchanges and deep conversations that are often missed in larger conferences. As we carry forward the knowledge and connections forged here, we look forward to the potential collaborations and innovations that this enriching experience has sparked.

Looking ahead, WACV 2025 will convene in Tucson, Arizona, offering another opportunity for the community to engage in the ever-evolving landscape of computer vision.

* Some images were taken from the official WACV Twitter account and Prof. Kosta Derpanis’s Twitter account.