I currently work at DISDAR, a startup I co-founded that develops innovative machine-learning techniques for extracting information from large unstructured datasets.
I obtained both my PhD and my Diplom Informatiker (M.Sc.) degree from Technische Universität Berlin and an M.Eng. degree from Shanghai Jiao Tong University. From 2007 to 2012 I was a member of the Computer Graphics Group, supervised by Marc Alexa. From March to May 2011 I was a visitor at Brown University, working with James Hays. From May to July 2012 I was a visitor at the Massachusetts Institute of Technology, working in Frédo Durand's group.
My research interests span Computer Graphics, Computer Vision and Machine Learning. In particular, I focus on large-scale sketch-based visual retrieval, synthesis and recognition.
Humans have used sketching to depict our visual world since prehistoric times. Even today, sketching is possibly the only rendering technique readily available to all humans. This paper is the first large-scale exploration of human sketches. We analyze the distribution of non-expert sketches of everyday objects such as ‘teapot’ or ‘car’. We ask humans to sketch objects of a given category and gather 20,000 unique sketches evenly distributed over 250 object categories. With this dataset we perform a perceptual study and find that humans can correctly identify the object category of a sketch 73% of the time. We compare human performance against computational recognition methods. We develop a bag-of-features sketch representation and use multi-class support vector machines, trained on our sketch dataset, to classify sketches. The resulting recognition method is able to identify unknown sketches with 56% accuracy (chance is 0.4%). Based on the computational model, we demonstrate an interactive sketch recognition system. We release the complete crowd-sourced dataset of sketches to the community.
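The bag-of-features plus multi-class SVM pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the local descriptors are random toy data standing in for sampled sketch patches, and the vocabulary size and SVM settings are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def bag_of_features(local_descriptors, vocabulary):
    """Quantize local descriptors against a visual vocabulary and
    return a normalized histogram of visual-word frequencies."""
    words = vocabulary.predict(local_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy stand-in for local descriptors sampled from sketches of 3 categories.
n_sketches, n_local, dim, n_classes = 60, 30, 8, 3
labels = rng.integers(0, n_classes, n_sketches)
descs = [rng.normal(loc=2.0 * labels[i], size=(n_local, dim))
         for i in range(n_sketches)]

# 1) Learn a visual vocabulary by k-means over all local descriptors.
vocab = KMeans(n_clusters=16, n_init=4, random_state=0).fit(np.vstack(descs))

# 2) Encode each sketch as a bag-of-features histogram.
X = np.array([bag_of_features(d, vocab) for d in descs])

# 3) Train a multi-class (one-vs-rest) linear SVM on the histograms.
clf = LinearSVC().fit(X, labels)
acc = clf.score(X, labels)
```

In the paper the local descriptors come from the sketch itself and performance is of course measured on held-out sketches, not on the training set as in this toy example.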
We develop a system for 3D object retrieval that takes sketched feature lines as input. For objective evaluation, we collect a large number of query sketches from human users, related to an existing database of objects. The sketches turn out to be generally quite abstract, with large local and global deviations from the original shape. Based on this observation, we use a bag-of-features approach over computer-generated line drawings of the objects, and we develop a targeted feature transform based on Gabor filters for this system. We show objectively that this transform is better suited to the task than other approaches from the literature developed for similar problems. Moreover, we demonstrate how to optimize the parameters of our approach, as well as those of competing approaches, based on the gathered sketches. In the resulting comparison, our approach performs significantly better than any other system described so far.
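The core building block of such a Gabor-based feature transform is an oriented filter bank. The following is a minimal sketch of that idea, not the system's actual transform: kernel size, wavelength, and bandwidth are illustrative assumptions, and a real system would convolve densely and pool responses into local histograms rather than correlate once at the image center.

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """Real-valued Gabor kernel whose carrier oscillates along
    orientation `theta` (theta=0 responds strongest to vertical lines)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def filter_bank(n_orientations=4, size=15, wavelength=6.0, sigma=3.0):
    """One Gabor kernel per orientation, evenly spaced over [0, pi)."""
    thetas = np.pi * np.arange(n_orientations) / n_orientations
    return [gabor_kernel(size, t, wavelength, sigma) for t in thetas]

# Response of a vertical line drawing to each orientation channel.
img = np.zeros((15, 15))
img[:, 7] = 1.0
responses = [float(abs((img * k).sum())) for k in filter_bank()]
```

A vertical line produces its strongest response in the first channel and near-zero response in the perpendicular one, which is exactly the orientation selectivity the feature transform exploits.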
We present ongoing work on object category recognition from binary human outline sketches. We first define a novel set of 187 “sketchable” object categories by extracting the labels of the most frequent objects in the LabelMe dataset. In a large-scale experiment, we then gather a dataset of over 5,500 human sketches, evenly distributed over all categories. We show that by training multi-class support vector machines on this dataset, we can classify novel sketches with high accuracy. We demonstrate this in an interactive sketching application that progressively updates its category prediction as users add more strokes to a sketch.
We introduce Photosketcher, an interactive system for progressively synthesizing novel images using only sparse user sketches as input. Compared to existing approaches for synthesizing images from parts of other images, Photosketcher works on the image content exclusively; no keywords or other metadata associated with the images are required. Users sketch the rough shape of a desired image part, and we automatically search a large collection of images for those containing that part. The search is based on a bag-of-features approach using local descriptors for translation-invariant part retrieval. The compositing step is again based on user scribbles: from the scribbles we predict the desired part using Gaussian mixture models and compute an optimal seam using graph cut. Optionally, Photosketcher lets users blend the composite image in the gradient domain to further reduce visible seams. We demonstrate that the resulting system allows interactive generation of complex images.
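The scribble-based part prediction can be illustrated with two color GMMs, one per scribble class, classifying each pixel by which model explains its color better. This is a hedged toy sketch, not Photosketcher's code: the "image" and "scribbles" are synthetic color samples, and a full system would feed the resulting likelihoods into a graph cut as unary terms rather than thresholding them directly.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Synthetic image: reddish "object" pixels vs. bluish "background" pixels.
fg_pixels = rng.normal([0.8, 0.2, 0.2], 0.05, size=(200, 3))
bg_pixels = rng.normal([0.2, 0.2, 0.8], 0.05, size=(200, 3))

# User scribbles give small labeled color samples of each region.
fg_scribble, bg_scribble = fg_pixels[:30], bg_pixels[:30]

# Fit one color GMM per scribble class.
gmm_fg = GaussianMixture(n_components=2, random_state=0).fit(fg_scribble)
gmm_bg = GaussianMixture(n_components=2, random_state=0).fit(bg_scribble)

def predict_foreground(pixels):
    """True where the foreground GMM assigns higher log-likelihood."""
    return gmm_fg.score_samples(pixels) > gmm_bg.score_samples(pixels)

mask = predict_foreground(np.vstack([fg_pixels, bg_pixels]))
```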
We introduce a benchmark for evaluating the performance of large-scale sketch-based image retrieval systems. The necessary data is acquired in a controlled user study in which subjects rate how well given sketch/image pairs match. We suggest how to use this data for evaluating the performance of sketch-based image retrieval systems. The benchmark data as well as the large image database are made publicly available for further studies of this type. Furthermore, we develop new descriptors based on the bag-of-features approach and use the benchmark to demonstrate that they significantly outperform other descriptors in the literature.
We address the problem of fast, large-scale sketch-based image retrieval, searching a database of over one million images. We show that current retrieval methods do not scale well to large databases in the context of interactive supervised search, and we propose two approaches that we objectively evaluate to significantly outperform existing methods. The proposed descriptors are constructed such that both the full color image and the sketch undergo exactly the same preprocessing steps. We first search for images with similar structure by analyzing gradient orientations; the best-matching images are then clustered based on dominant color distributions, to offset the lack of color-based decisions during the initial search. Overall, the query results demonstrate that the system offers intuitive access to large image databases through a user-friendly sketch-and-browse interface.
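The "same preprocessing for sketch and image" idea can be illustrated with a tiny gradient-orientation histogram applied identically to a binary sketch and to an edge image. This is an illustrative sketch under simplified assumptions, not the paper's actual descriptor (which works on local patches, not whole images).

```python
import numpy as np

def orientation_histogram(img, n_bins=8):
    """Histogram of gradient orientations in [0, pi), weighted by
    gradient magnitude and normalized to unit sum."""
    gy, gx = np.gradient(img.astype(float))      # axis 0 = rows = y
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # fold opposite directions
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / max(hist.sum(), 1e-9)

# A vertical and a horizontal line yield clearly different descriptors,
# regardless of whether the input was a sketch or an extracted edge map.
v = np.zeros((32, 32)); v[:, 16] = 1.0
h = np.zeros((32, 32)); h[16, :] = 1.0
d_v, d_h = orientation_histogram(v), orientation_histogram(h)
distance = float(np.abs(d_v - d_h).sum())
```

Because the descriptor only sees gradients, a binary sketch and a full-color image reduced to its edges land in the same feature space, which is what makes the sketch-to-image comparison meaningful.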
As large collections of 3D models become as common as public image collections, the need arises to quickly locate models in such collections. Models are often insufficiently annotated, so keyword-based search is not promising. Our approach to content-based retrieval of 3D models relies entirely on visual analysis and is based on the observation that a large part of our perception of shapes stems from their salient features, usually captured by dominant lines in their display. Recent research on such feature lines has shown that 1) people mostly draw the same lines when asked to depict a certain model, and 2) the shape of an object is well represented by the set of feature lines generated by recent NPR line-drawing algorithms. Consequently, we suggest an image-based approach to 3D shape retrieval that exploits the similarity between human sketches and the results of current line-drawing algorithms. Our search engine takes a user-drawn sketch of the desired model as input and compares this sketch to a set of line drawings automatically generated for each model in the collection.
In this paper, we propose a novel method for interactively browsing large image collections, making the user an integral part of the interactive exploration by repeatedly exploiting the amazing ability of humans to quickly identify relevant images from a large set. The method requires only minimal input effort: users simply point to the image in the current display that seems most attractive to them. The system then assembles a representative set of other images from the collection that are likely in its perceptual vicinity. The resulting browsing approach is -- even for novice users -- extremely simple to use and enables an interactive exploration of the collection as well as target-oriented selection towards a specific mental image model.
We address the problem of large-scale sketch-based image retrieval, searching a database of over a million images. The search is based on a descriptor that elegantly addresses the asymmetry between the binary user sketch on the one hand and the full color image on the other. The proposed descriptor is constructed such that both the full color image and the sketch undergo exactly the same preprocessing steps. We also design an adapted version of the descriptor proposed for MPEG-7 and compare their performance on a database of 1.5 million images. Best-matching images are clustered based on color histograms, to offset the lack of color in the query. Overall, the query results demonstrate that the system gives users intuitive access to large image databases.
We introduce a system for progressively creating images through a simple sketching and compositing interface. A large database of over 1.5 million images is searched for matches to a user's binary outline sketch; the results of this search can be combined interactively to synthesize the desired image. We introduce image descriptors for the task of estimating the difference between images and binary outline sketches. The compositing part is based on graph cut and Poisson blending. We demonstrate that the resulting system allows generating complex images in an intuitive way.
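The Poisson blending step in the compositing stage can be demonstrated on a tiny 1D signal by solving the discrete Poisson equation directly. This is a minimal illustration under simplifying assumptions, not the system's implementation: real compositing works on 2D image regions bounded by the graph-cut seam, and the linear system is solved with sparse methods.

```python
import numpy as np

def poisson_blend_1d(target, source, lo, hi):
    """Paste source[lo:hi] into target, keeping the source's *gradients*
    inside the region while matching target values at the boundary."""
    t = np.asarray(target, dtype=float).copy()
    s = np.asarray(source, dtype=float)
    n = hi - lo
    A = np.zeros((n, n))
    b = np.zeros(n)
    for k in range(n):
        i = lo + k
        A[k, k] = 2.0
        b[k] = 2 * s[i] - s[i - 1] - s[i + 1]  # negative Laplacian of source
        if k > 0:
            A[k, k - 1] = -1.0
        else:
            b[k] += t[lo - 1]                  # left Dirichlet boundary value
        if k < n - 1:
            A[k, k + 1] = -1.0
        else:
            b[k] += t[hi]                      # right Dirichlet boundary value
    t[lo:hi] = np.linalg.solve(A, b)
    return t

target = np.zeros(12)
source = np.full(12, 5.0)   # constant offset: all source gradients are zero
result = poisson_blend_1d(target, source, 3, 8)
```

Pasting a source that differs from the target only by a constant offset leaves the target unchanged: the offset is absorbed entirely, which is why gradient-domain compositing hides seams that naive copy-and-paste would expose.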
We present an image editing tool that allows users to deform and composite image regions using an intuitive sketch-based interface. Users simply draw the outline of the source image region and sketch a new boundary shape onto the location where this region is to be pasted. The deformation optimizes a shape distortion energy, and we use Poisson cloning for subsequent compositing. Since the correspondence between the source and target boundary curves is not known a priori, we propose an alternating optimization process that interleaves minimization of the deformation energy w.r.t. the image region interior and the mapping between the boundary curves, thus automatically determining the boundary correspondence and deforming the image at the same time. Thanks to the particular design of the deformation energy, its gradient can be efficiently computed, making the entire image editing framework interactive and convenient for practical use.
We present a new, efficient, and easy-to-use collision detection scheme for real-time collision detection between highly deformable tetrahedral models. Tetrahedral meshes are a common representation of volumetric models and are often used in physically based simulations, e.g., in virtual surgery. In a deformable-models environment, collision detection is usually a performance bottleneck, since the data structures used for efficient intersection tests need to be rebuilt or modified frequently. Our approach minimizes the time needed to build the collision detection data structure. We employ an infinite hierarchical spatial grid in which a well-fitting grid cell size is computed for each single tetrahedron in the scene. A hash function projects occupied grid cells into a finite 1D hash table. Only primitives mapped to the same hash index indicate a possible collision and need to be checked for intersection. This results in a high-performance collision detection algorithm that does not depend on user-defined parameters and thus flexibly adapts to any scene setup.
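The hash-grid broad phase can be sketched as follows. This is a simplified illustration, not the paper's implementation: a single global cell size replaces the per-tetrahedron cell sizes described above, primitives are reduced to axis-aligned bounding boxes, and the prime constants are the commonly used spatial-hashing values.

```python
from collections import defaultdict
from itertools import product

P1, P2, P3 = 73856093, 19349663, 83492791  # large primes for the hash

def hash_cell(ix, iy, iz, table_size):
    """Project an (infinite) 3D grid cell index into a finite 1D table."""
    return ((ix * P1) ^ (iy * P2) ^ (iz * P3)) % table_size

def broad_phase(primitives, cell_size, table_size=1024):
    """Map each primitive's AABB to the grid cells it overlaps; primitives
    sharing a hash bucket become candidate pairs for exact tests."""
    table = defaultdict(set)
    for pid, (mins, maxs) in enumerate(primitives):
        lo = [int(c // cell_size) for c in mins]
        hi = [int(c // cell_size) for c in maxs]
        for cell in product(*(range(l, h + 1) for l, h in zip(lo, hi))):
            table[hash_cell(*cell, table_size)].add(pid)
    pairs = set()
    for bucket in table.values():
        ids = sorted(bucket)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs

# Two overlapping boxes and one box far away from both.
prims = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
         ((0.5, 0.5, 0.5), (1.5, 1.5, 1.5)),
         ((10.0, 10.0, 10.0), (11.0, 11.0, 11.0))]
candidates = broad_phase(prims, cell_size=1.0)
```

Note that hash collisions may over-report candidates; that is acceptable for a broad phase, since every candidate pair is verified by an exact tetrahedron intersection test afterwards.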
During the winter term of 2006, I enjoyed attending a Computational Photography seminar held by Marc Alexa and Olga Sorkine. I gave a talk about Bilateral Filtering and did an implementation project in the area of High Dynamic Range Imaging.