Data-Driven 3D Primitives for Single Image Understanding
How do you infer the 3D properties of the world from a 2D image? This question has intrigued researchers in psychology and computer vision for decades. Over the years, many theories have been proposed to explain how the brain recovers rich information about the 3D world from a single 2D projection. While there is agreement on many of the cues and constraints involved (e.g., texture gradient and planarity), recovering the 3D structure of the world from a single image remains an enormously difficult and unsolved problem.

At the heart of the 3D inference problem is the question: What are the right primitives (representations) for inferring the 3D world from a 2D image? It is not clear what kind of 3D primitives can be directly detected in images and then used for subsequent 3D reasoning. There is a rich literature proposing a myriad of 3D primitives, ranging from edges and surfaces to volumetric primitives such as generalized cylinders, geons, and cuboids. While these 3D primitives make sense intuitively, they are often hard to detect because they are not discriminative in appearance. On the other hand, primitives based on appearance may be easy to detect but geometrically uninformative.
In this paper, we propose geometric primitives that are visually discriminative, i.e., easily recognized in a scene, and geometrically informative, i.e., conveying information about the 3D world once recognized.
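To make these two criteria concrete, the sketch below scores a hypothetical candidate primitive, represented as a cluster of image patches paired with surface-normal maps from RGBD training data. This is a minimal illustration under assumed definitions, not the formulation used in the paper: the function names (discriminativeness, geometric_informativeness, primitive_score) and the weights alpha and beta are invented for illustration; appearance discriminativeness is approximated by a detection-score margin between cluster members and background patches, and geometric informativeness by how consistently the members' surface normals agree.

```python
import numpy as np

# Illustrative sketch only: score a hypothetical candidate primitive, represented
# as a cluster of image patches whose surface-normal maps come from RGBD
# training data. Names, weights, and scoring rules are assumptions for
# illustration, not the paper's actual learning procedure.

def discriminativeness(scores_in_cluster, scores_elsewhere):
    # Visually discriminative: an appearance detector should fire strongly on
    # cluster members and weakly on other patches (a simple score margin).
    return np.mean(scores_in_cluster) - np.mean(scores_elsewhere)

def geometric_informativeness(normal_maps):
    # Geometrically informative: members should share a consistent 3D
    # configuration, measured here as agreement of per-pixel surface normals.
    normals = np.stack(normal_maps)                      # (k, H, W, 3) unit normals
    mean_normal = normals.mean(axis=0)
    mean_normal /= np.linalg.norm(mean_normal, axis=-1, keepdims=True) + 1e-8
    # Mean cosine agreement with the cluster mean; near 1 when consistent.
    return float((normals * mean_normal).sum(axis=-1).mean())

def primitive_score(scores_in, scores_out, normal_maps, alpha=1.0, beta=1.0):
    # Combine both criteria; the weights alpha and beta are arbitrary here.
    return alpha * discriminativeness(scores_in, scores_out) \
         + beta * geometric_informativeness(normal_maps)

# Toy usage: five member patches whose normals all point roughly "up"
# (e.g., a floor-like primitive), versus noisy background detector scores.
rng = np.random.default_rng(0)
members = [np.tile([0.0, 0.95, 0.31], (8, 8, 1)) + 0.02 * rng.standard_normal((8, 8, 3))
           for _ in range(5)]
members = [n / np.linalg.norm(n, axis=-1, keepdims=True) for n in members]
print(primitive_score(rng.normal(2.0, 0.2, 50), rng.normal(0.0, 0.2, 500), members))
```

In the approach described here, such primitives are learned from RGBD training data rather than scored with hand-set weights, but the two criteria play the same roles: a candidate is kept only if it can be reliably detected from appearance and if its detections imply a consistent 3D configuration.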