CLIPasso: Semantically-Aware Object Sketching

In this episode of the Talking Papers Podcast, I hosted Yael Vinker. We had a great chat about her paper “CLIPasso: Semantically-Aware Object Sketching”, which won a best paper award at SIGGRAPH 2022.

In this paper, the authors convert images into sketches at different levels of abstraction. They avoid the need for sketch datasets by using the well-known CLIP model to distil semantic concepts from both sketches and images. There is no network training here: the method directly optimizes the control points of the Bézier curves that model the sketch strokes, with the strokes initialized from a saliency map. How is this differentiable? They use a differentiable rasterizer, so the loss computed on the rendered sketch can be backpropagated to the curve parameters. The degree of abstraction is controlled by the number of strokes. Don’t miss the amazing demo they created.
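To make the "optimize control points, not network weights" idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. It assumes a single cubic Bézier stroke, replaces the CLIP-based loss with a simple squared distance to target points, and replaces the differentiable rasterizer with point sampling along the curve. All function names are hypothetical.

```python
# Toy illustration only: fit one cubic Bezier stroke by gradient descent
# on its control points.
# Assumptions (not from the paper): a squared-distance loss stands in for
# the CLIP-based loss, and sampling points on the curve stands in for the
# differentiable rasterizer.

def bernstein(t):
    """Cubic Bernstein weights at parameter t in [0, 1]."""
    u = 1.0 - t
    return (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)

def bezier_point(ctrl, t):
    """Point on the curve: a weighted sum of the four control points."""
    w = bernstein(t)
    x = sum(wi * p[0] for wi, p in zip(w, ctrl))
    y = sum(wi * p[1] for wi, p in zip(w, ctrl))
    return (x, y)

def loss_and_grad(ctrl, targets, ts):
    """Mean squared distance to the targets and its gradient w.r.t. ctrl."""
    loss = 0.0
    grad = [[0.0, 0.0] for _ in ctrl]
    for t, (tx, ty) in zip(ts, targets):
        w = bernstein(t)
        x, y = bezier_point(ctrl, t)
        dx, dy = x - tx, y - ty
        loss += dx * dx + dy * dy
        # Each curve point is linear in the control points, so the
        # derivative w.r.t. control point j is its Bernstein weight.
        for j, wj in enumerate(w):
            grad[j][0] += 2 * dx * wj
            grad[j][1] += 2 * dy * wj
    n = len(ts)
    return loss / n, [[gx / n, gy / n] for gx, gy in grad]

def fit_stroke(ctrl, targets, ts, lr=1.0, steps=3000):
    """Plain gradient descent on the control points."""
    ctrl = [list(p) for p in ctrl]
    for _ in range(steps):
        _, grad = loss_and_grad(ctrl, targets, ts)
        for p, g in zip(ctrl, grad):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return ctrl

# Target stroke, sampled at 11 parameter values.
ts = [i / 10 for i in range(11)]
true_ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
targets = [bezier_point(true_ctrl, t) for t in ts]

# Start from a flat stroke and optimize its control points.
init_ctrl = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
initial_loss, _ = loss_and_grad(init_ctrl, targets, ts)
fitted = fit_stroke(init_ctrl, targets, ts)
final_loss, _ = loss_and_grad(fitted, targets, ts)
print(initial_loss, final_loss)
```

In the actual method the loss is computed by CLIP on a rasterized image of all strokes, which is why a differentiable rasterizer is needed to carry gradients from pixels back to control points. The toy above only shows the last link of that chain.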

Yael is currently a PhD student at Tel Aviv University. Her research focuses on computer vision, machine learning, and computer graphics, with a unique twist of combining art and technology. This work was done as part of her internship at EPFL. I met Yael at Israel’s Vision Day 2022. After she gave an amazing talk on this paper, I knew I wanted to host her on the podcast, and here we are today. In our conversation, I particularly liked her approach towards research and her aspiration to start a new field. I feel this should be the goal of any PhD student. Her artistic background, combined with computer science, gives her a very special skillset in the community’s landscape.

I am really looking forward to seeing what papers she will draw up next (pun intended).


Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir



Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distil semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.
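The optimization described in the abstract can be written compactly. The notation below is illustrative and not taken from the paper: it assumes cubic curves with four control points each, and the symbols $\mathcal{R}$, $\mathcal{L}_{\mathrm{CLIP}}$, $S$ and $n$ are names chosen here for the rasterizer, the CLIP-based perceptual loss, the sketch and the number of strokes.

```latex
% Illustrative notation (an assumption, not the paper's own symbols).
% One stroke: a cubic Bezier curve with control points p_0, ..., p_3 in R^2.
B(t) = (1-t)^3\, p_0 + 3(1-t)^2 t\, p_1 + 3(1-t)\, t^2\, p_2 + t^3\, p_3,
\qquad t \in [0, 1]

% A sketch is a set of n strokes, S = \{s_1, \dots, s_n\}.
% R rasterizes the strokes differentiably; I is the input image.
S^{*} = \arg\min_{S} \; \mathcal{L}_{\mathrm{CLIP}}\big(\mathcal{R}(S),\, I\big)

% Because R is differentiable, the chain rule gives a gradient for
% every control point p, which is what the optimizer updates:
\frac{\partial \mathcal{L}_{\mathrm{CLIP}}}{\partial p}
  = \frac{\partial \mathcal{L}_{\mathrm{CLIP}}}{\partial \mathcal{R}(S)}
    \cdot \frac{\partial \mathcal{R}(S)}{\partial p}
```

Under this reading, a smaller $n$ leaves fewer curves to explain the image, which gives a more abstract sketch.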


📚CLIP: Connecting Text and Images

📚Differentiable Vector Graphics Rasterization for Editing and Learning


📚 Paper

💻Project page

To stay up to date with Yael’s latest research, follow her on:

👨🏻‍🎓Personal page


👨🏻‍🎓Google Scholar


Recorded on February 1st 2023.


This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. It was voted the best start-up to work for in 2022 by Dun’s 100.
Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.



If you would like to be a guest, sponsor or share your thoughts, feel free to reach out via email:


🎧Subscribe on your favourite podcast app:

📧Subscribe to our mailing list:

🐦Follow us on Twitter:

🎥YouTube Channel: