
On March 1, the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022) announced its list of accepted papers. Among them is "Exploring Geometry Consistency for Monocular 3D Object Detection", a research paper by the Kong University of Science and Technology Joint Laboratory.

The work focuses on monocular 3D object detection, which aims to detect 3D obstacles from 2D monocular images alone. Because accurate depth is difficult to infer from a single image, monocular 3D object detection is an ill-posed and challenging task. The research first analyzes how existing methods use visual cues to locate obstacles, and then proposes data augmentation methods that make these methods more robust.
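Why depth is ill-posed from a single image follows from the pinhole camera model: every 3D point on a ray through the camera centre lands on the same pixel. The minimal Python sketch below illustrates this. The intrinsics `F`, `CX`, `CY` and the helper names are hypothetical values chosen for illustration, not taken from the paper.

```python
F, CX, CY = 700.0, 600.0, 180.0  # hypothetical pinhole intrinsics, in pixels


def project(x: float, y: float, z: float) -> tuple[float, float]:
    """Project a camera-frame point (metres, z pointing forward) to pixel (u, v)."""
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (F * x / z + CX, F * y / z + CY)


def pixel_height(real_height: float, depth: float) -> float:
    """Apparent height in pixels of an upright object at the given depth."""
    return F * real_height / depth


# Scaling a point by a common factor (here 2) leaves its pixel unchanged.
print(project(1.0, 1.5, 20.0), project(2.0, 3.0, 40.0))
# A 1.5 m object at 20 m and a 3 m object at 40 m appear equally tall.
print(pixel_height(1.5, 20.0), pixel_height(3.0, 40.0))
```

Without a prior on object size, or another cue such as where the object touches the ground, the image alone cannot tell these two hypotheses apart.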

As seen in the illustration above, the visual cues available to a neural network include the apparent size of an object in the image and its vertical position in the image (the closer the object is, the larger it appears and the lower it sits in the image). By applying various perturbations to the images, the researchers found that the network tends to rely on object size to predict depth, but does not use this cue robustly (as shown in the illustration below):
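Both cues can be written down with the same pinhole model. The sketch below assumes a flat road, a camera with zero pitch mounted `CAM_HEIGHT` metres above it (y axis pointing down), and a known prior on the object's real height. All constants and function names are illustrative assumptions; a neural network has to learn such relations implicitly, and this code only makes them explicit.

```python
F = 700.0          # focal length in pixels (hypothetical)
CY = 180.0         # principal point row, i.e. the horizon row at zero pitch (hypothetical)
CAM_HEIGHT = 1.65  # camera height above a flat road, in metres (assumed)


def depth_from_size(real_height_m: float, pixel_height: float) -> float:
    """Size cue: given a prior on real height H, depth z = f * H / h."""
    return F * real_height_m / pixel_height


def depth_from_ground_contact(bottom_row: float) -> float:
    """Position cue: a road point at depth z appears at row v = cy + f * Hc / z,
    so z = f * Hc / (v - cy). Closer objects touch the road lower in the image."""
    if bottom_row <= CY:
        raise ValueError("ground contact must lie below the horizon row")
    return F * CAM_HEIGHT / (bottom_row - CY)


# A 1.5 m tall car 20 m away is 52.5 px tall and touches the road at row 237.75.
print(depth_from_size(1.5, 52.5))         # size cue
print(depth_from_ground_contact(237.75))  # position cue
```

If the object's image is enlarged by 1.25× while its ground contact row stays fixed, the cues disagree: `depth_from_size(1.5, 52.5 * 1.25)` gives 16 m, while the position cue still gives 20 m. Image perturbations of this kind can reveal which cue a network actually follows.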

Motivated by this analysis, the researchers designed four data augmentation methods operating at different levels to generate additional training data. Because the geometry is kept consistent before and after each image perturbation, the augmented data effectively improves the network's robustness to the relevant visual cues. To the best of our knowledge, no comparable data augmentation method had previously been proposed in the monocular 3D object detection field. Extensive experiments on the KITTI and nuScenes datasets demonstrate the effectiveness of the proposed methods.
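One way to keep geometry consistent under a perturbation is to transform the 3D labels together with the image. The sketch below uses a zoom about the principal point as a hypothetical example; it is not necessarily one of the paper's four methods. A pixel moves from (u, v) to (cx + s(u - cx), cy + s(v - cy)), and a 3D point (x, y, z) re-projects onto the new pixel exactly when its depth becomes z / s. The `Intrinsics` and `Box3D` classes and all numeric values are assumptions for illustration, and warping the image itself is not shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Intrinsics:
    f: float   # focal length (pixels)
    cx: float  # principal point column
    cy: float  # principal point row


@dataclass(frozen=True)
class Box3D:
    x: float  # box centre in camera coordinates (metres)
    y: float
    z: float  # depth
    h: float  # box dimensions (metres)
    w: float
    l: float


def project(k: Intrinsics, x: float, y: float, z: float) -> tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel (u, v)."""
    return (k.f * x / z + k.cx, k.f * y / z + k.cy)


def zoom_pixel(k: Intrinsics, u: float, v: float, s: float) -> tuple[float, float]:
    """Where pixel (u, v) lands after zooming the image by s about the principal point."""
    return (k.cx + s * (u - k.cx), k.cy + s * (v - k.cy))


def zoom_label(box: Box3D, s: float) -> Box3D:
    """Label for the zoomed image (hypothetical augmentation).

    Dividing depth by s makes the box centre re-project onto the zoomed pixel and
    multiplies the apparent height by s. Dimensions are kept fixed, which is only an
    approximation: the exact mapping would also compress the box's extent along z.
    """
    return replace(box, z=box.z / s)


k = Intrinsics(f=700.0, cx=600.0, cy=180.0)
box = Box3D(x=2.0, y=1.0, z=20.0, h=1.5, w=1.6, l=3.9)
s = 1.25

u, v = project(k, box.x, box.y, box.z)
new_box = zoom_label(box, s)
# The zoomed pixel and the re-projected new label should coincide.
print(zoom_pixel(k, u, v, s), project(k, new_box.x, new_box.y, new_box.z))
```

Adjusting the label rather than leaving it unchanged is what keeps the size and position cues consistent with the target depth after the perturbation.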

The Kong University of Science and Technology Joint Laboratory, which was co-founded by and the Hong Kong University of Science and Technology, aims to leverage the strengths of both partners, jointly promote the industrialization of machine learning technology, and foster innovative applications that improve environmental health in an autonomous way. Professor Zhang Tong, an internationally renowned scholar in machine learning, heads the laboratory. He is currently a chair professor in the Department of Mathematics and the Department of Computer Science at the Hong Kong University of Science and Technology. He previously served as chief scientist at the Yahoo Research Institute, vice president and head of the Big Data Lab at the Baidu Research Institute, and head of Tencent AI Lab. He is also a fellow of the ASA and the IEEE, has served as chair or area chair of major machine learning conferences such as NIPS, ICML, and COLT, and has served on the editorial boards of leading machine learning journals such as PAMI, JMLR, and Machine Learning Journal.

Online meeting between Mr. Huang Chao, CEO of, and Professor Zhang Tong, head of the Joint Laboratory.

According to Professor Zhang Tong, "monocular cameras play an increasingly important role as perception components in autonomous driving, and in recent years related research has drawn extensive attention from both academia and industry." Unlike LiDAR, monocular images do not directly provide accurate depth information, which makes 3D perception tasks based on them extremely difficult. Based on the hypothesis that "the same obstacle's 3D geometric characteristics should be consistent across multi-view images," the Joint Laboratory took the lead in designing data augmentation methods for monocular 3D object detection and achieved significant improvements. Furthermore, exploring geometry consistency will be an important research direction for improving perception from monocular images. To this end, the two partners will extend the relevant technologies to more application scenarios. For example, they will jointly explore innovative applications of geometry consistency to multi-view data and promote the industrial use of this research.