We propose a novel RGB-D camera tracking system that robustly reconstructs scenes from hand-held RGB-D camera sequences. The robustness of our system comes from two independent features of our method: adaptive visual odometry (VO) and integer programming-based key-frame selection. Our VO method adaptively interpolates the camera motion estimates of direct VO (DVO) and iterative closest point (ICP) to yield more accurate results than existing methods such as ElasticFusion. Moreover, our key-frame selection method locates globally optimal key-frames deterministically, using a comprehensive objective function rather than the heuristic or experience-based rules that prior methods mostly rely on. As a result, our method can complete a reconstruction even when camera tracking fails due to discontinuous camera motion, such as kidnap events, in situations where conventional systems need to backtrack through the scene. We validated our tracking system on 25 TUM benchmark sequences against state-of-the-art systems such as ORBSLAM2, ElasticFusion, and DVO SLAM. The experiments show that our method achieves smaller camera trajectory errors and more robust tracking than these systems.


  • Adaptive VO method using a novel pose estimation formulation.
  • Novel integer programming-based formulation for optimal key-frame selection.
  • More robust results than state-of-the-art VO and SLAM systems.

Our Method


Adaptive VO

ICP and DVO are known to behave differently depending on the input scene. DVO relies on photometric texture, while ICP relies on geometric depth structure. Our algorithm exploits this by adaptively adjusting the weight that balances the two terms. The weight is set according to the relative fitness of DVO and ICP at each iteration of iteratively reweighted least squares (IRLS). As a result, our VO algorithm produces robust and accurate results.
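The text does not specify the fitness measure or the exact weighting rule. The following is only a minimal sketch of the idea under stated assumptions:

- Each term is modeled as a linearized residual r = J·ξ − b, a stand-in for the real 6-DoF ICP and photometric residuals.
- Both terms use Huber IRLS weights.
- A hypothetical fitness (the fraction of residuals inside the Huber threshold) sets the blend weight `lam = f_icp / (f_icp + f_dvo)` at every iteration.

The names `adaptive_irls`, `fitness`, `k_icp` and `k_dvo` are illustrative, not taken from the authors' code.

```python
import numpy as np

def huber_weights(r, k):
    """IRLS weights for the Huber loss: 1 inside the threshold k, k/|r| outside."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def fitness(r, k):
    """Hypothetical fitness measure: fraction of residuals inside the Huber threshold."""
    return float(np.mean(np.abs(r) <= k))

def adaptive_irls(J_icp, b_icp, J_dvo, b_dvo, k_icp=0.05, k_dvo=0.1, iters=10):
    """Blend linearised ICP and DVO least-squares terms with an adaptive weight.

    Each term is modelled as r = J @ xi - b (a linearised stand-in for the real
    residuals). At every IRLS iteration the blend weight lam is recomputed from
    the current fitness of each term, and both terms are Huber re-weighted.
    """
    xi = np.zeros(J_icp.shape[1])
    lam = 0.5
    for _ in range(iters):
        r_i = J_icp @ xi - b_icp
        r_d = J_dvo @ xi - b_dvo
        f_i, f_d = fitness(r_i, k_icp), fitness(r_d, k_dvo)
        # Hypothetical rule: the term that currently fits better gets more weight.
        lam = f_i / (f_i + f_d) if f_i + f_d > 0 else 0.5
        w_i, w_d = huber_weights(r_i, k_icp), huber_weights(r_d, k_dvo)
        # Weighted normal equations of lam * sum(w_i r_i^2) + (1 - lam) * sum(w_d r_d^2).
        H = lam * J_icp.T @ (w_i[:, None] * J_icp) + (1 - lam) * J_dvo.T @ (w_d[:, None] * J_dvo)
        g = lam * J_icp.T @ (w_i * b_icp) + (1 - lam) * J_dvo.T @ (w_d * b_dvo)
        xi = np.linalg.solve(H, g)
    return xi, lam
```

When one term is corrupted (for example ICP on a scene with poor depth structure), its inlier ratio drops. The blend weight then shifts toward the other term while both terms remain in the solve.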

Semi-dense image matching

We build an affinity matrix by running multiple wide-baseline matchings for each incoming frame. To speed up its construction, we first perform a bag-of-words (BoW) match before putative matching to locate candidate image pairs. The candidate pairs are then verified with CudaSIFT to complete the matching task. We run this image/feature matching task in an independent thread, in parallel with the VO thread, to offset the cost of this time-consuming part of the system.
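The two-stage matching described above can be sketched as follows, under assumptions:

- BoW retrieval is approximated by cosine similarity between per-frame word histograms.
- CudaSIFT matching plus verification is replaced by a hypothetical `verify(i, j)` callback that returns an inlier count.
- `top_k`, `min_score` and `min_inliers` are illustrative parameters.

The separate matching thread is omitted for brevity.

```python
import numpy as np

def bow_candidates(hists, q, top_k=3, min_score=0.3):
    """Indices of earlier frames whose BoW histogram is most similar (cosine) to frame q."""
    if q == 0:
        return []
    H = np.asarray(hists[: q + 1], dtype=float)
    H = H / np.maximum(np.linalg.norm(H, axis=1, keepdims=True), 1e-12)
    scores = H[:q] @ H[q]                      # cosine similarity to every earlier frame
    best = np.argsort(-scores)[:top_k]         # cheap pre-filter: keep only the top candidates
    return [int(j) for j in best if scores[j] >= min_score]

def update_affinity(A, hists, q, verify, min_inliers=30, top_k=3, min_score=0.3):
    """Insert frame q into the symmetric boolean affinity matrix A.

    Only BoW candidates are passed to the expensive verification step;
    verify(j, q) is a hypothetical stand-in for SIFT matching + geometric checks.
    """
    A[q, q] = True
    for j in bow_candidates(hists, q, top_k, min_score):
        if verify(j, q) >= min_inliers:
            A[j, q] = A[q, j] = True
    return A
```

The BoW stage keeps the number of expensive verifications per frame bounded by `top_k` rather than growing with the number of stored frames.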

Keyframe selection

Once the affinity matrix is constructed, we select keyframes with a set covering procedure. Our goal is to select a minimal set of vertices (key-vertices) that covers the entire vertex set V. We then locate bridging vertices among the key-vertices to prevent the system from producing multiple disjoint sets.
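The exact objective is not given here, so the following is a sketch under assumptions:

- A vertex is taken to cover itself and its neighbors in the affinity graph.
- The 0/1 program "minimize Σ x_v subject to Σ_{u ∈ N[v]} x_u ≥ 1 for every v" is solved by brute-force enumeration, which is only viable for tiny graphs. A real system would use an ILP solver.
- Bridging is a hypothetical rule: whenever a key-vertex is disconnected from the first key-vertex within the selected subgraph, the vertices of a shortest path in the full graph are added.

```python
from collections import deque
from itertools import combinations
import numpy as np

def min_set_cover(A):
    """Exact minimum cover by enumeration (exponential; tiny graphs only).

    Equivalent 0/1 program: minimise sum_v x_v  s.t.  sum_{u in N[v]} x_u >= 1
    for all v, where N[v] is v plus its neighbours in the affinity graph A.
    """
    n = A.shape[0]
    closed = A | np.eye(n, dtype=bool)          # closed neighbourhoods N[v]
    for k in range(1, n + 1):
        for K in combinations(range(n), k):
            if closed[list(K)].any(axis=0).all():
                return list(K)
    return []

def bfs_path(A, s, t, allowed=None):
    """Shortest path s -> t by BFS, optionally restricted to vertices in `allowed`."""
    prev = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in map(int, np.flatnonzero(A[u])):
            if v not in prev and (allowed is None or v in allowed):
                prev[v] = u
                q.append(v)
    return None

def add_bridges(A, keys):
    """Hypothetical bridging: connect each key-vertex to the first one via shortest paths."""
    if not keys:
        return []
    selected = set(keys)
    root = keys[0]
    for k in keys[1:]:
        if bfs_path(A, root, k, allowed=selected) is None:
            path = bfs_path(A, root, k)
            if path is not None:                # unreachable in the full graph: leave as is
                selected.update(path)
    return sorted(selected)
```

For example, on a chain of seven frames the cover needs three key-vertices. Bridging then adds the intermediate frames so that the selected keyframes form a single connected set.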


VO results

We carried out an extensive evaluation of our VO method on TUM sequences. We compared it against five baseline methods (ICP, DVO, FOVIS, Whel13, and Park17) using the relative pose error (RPE) metric.
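For reference, the translational RPE can be sketched following the definition used by the TUM RGB-D benchmark tools: E_i = (Q_i⁻¹ Q_{i+Δ})⁻¹ (P_i⁻¹ P_{i+Δ}), with Q the ground truth and P the estimate. The RMSE is then taken over the translational parts of E_i.

This sketch makes two simplifying assumptions. The poses are already time-associated 4×4 matrices, and Δ is a frame offset rather than a time interval.

```python
import numpy as np

def relative_pose_errors(gt, est, delta=1):
    """E_i = (Q_i^-1 Q_{i+delta})^-1 (P_i^-1 P_{i+delta}) for lists of 4x4 poses."""
    errs = []
    for i in range(len(gt) - delta):
        dq = np.linalg.inv(gt[i]) @ gt[i + delta]    # ground-truth relative motion
        dp = np.linalg.inv(est[i]) @ est[i + delta]  # estimated relative motion
        errs.append(np.linalg.inv(dq) @ dp)
    return errs

def rpe_trans_rmse(gt, est, delta=1):
    """RMSE of the translational components of the relative pose errors."""
    errs = relative_pose_errors(gt, est, delta)
    t = np.array([np.linalg.norm(E[:3, 3]) for E in errs])
    return float(np.sqrt(np.mean(t ** 2)))
```

Because only relative motions are compared, RPE measures local drift and is unaffected by a global rigid offset between the estimated and ground-truth trajectories.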

Full system results

To assess the full system, we compared our method against ORBSLAM2, DVO SLAM, and ElasticFusion. We measured the absolute trajectory error (ATE) on 25 TUM benchmark sequences.
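ATE as commonly computed for TUM RGB-D sequences first rigidly aligns the estimated camera positions to the ground truth. It then reports the RMSE of the remaining position differences.

The sketch below is illustrative and makes two assumptions:

- The positions are already time-associated N×3 arrays.
- The alignment is a scale-free least-squares rotation and translation, computed with the SVD-based Kabsch/Umeyama solution.

```python
import numpy as np

def align_rigid(P, Q):
    """Rotation R and translation t minimising sum ||R p_i + t - q_i||^2 (no scale).

    P: (N, 3) estimated positions, Q: (N, 3) ground-truth positions.
    """
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_q - R @ mu_p
    return R, t

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of position differences after rigid alignment of the estimate."""
    R, t = align_rigid(est_xyz, gt_xyz)
    aligned = est_xyz @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1))))
```

Unlike RPE, ATE accumulates global drift over the whole trajectory, which is why it is the usual metric for full SLAM systems with loop closure.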

Qualitative Results


Contact Info

Ewha Graphics Lab
Department of Computer Science & Engineering, Ewha Womans University
  52, Ewhayeodae-gil, Seodaemun-gu, Seoul, Korea, 03760

  Kyung M. Han,
  Young J. Kim,