PTAM
PTAM (Parallel Tracking and Mapping), proposed in 2007, was the first V-SLAM system to run tracking and mapping in two parallel threads.
It was also the first to use a nonlinear optimization backend, laying the foundation for nonlinear optimization to dominate V-SLAM backend processing (the process is shown in Figure).
PTAM also proposed the keyframe mechanism: instead of carefully processing every image, several key images are strung together to optimize the trajectory and map. PTAM can place virtual objects on a virtual plane, an early contribution to combining AR (Augmented Reality) with SLAM.
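The keyframe idea can be sketched as a simple decision rule: insert a new keyframe only when the camera has moved far enough from every existing keyframe. The distance metric and threshold below are illustrative assumptions; PTAM's real criterion also considers tracking quality and the number of frames since the last keyframe.

```python
import math

def should_insert_keyframe(pose, keyframe_poses, min_dist=0.3):
    """Decide whether the current camera position is far enough from all
    existing keyframes to justify inserting a new keyframe.

    pose and keyframe_poses are (x, y, z) camera positions; min_dist is
    an illustrative translation threshold in metres (an assumption).
    """
    if not keyframe_poses:
        return True  # always keep the first frame as a keyframe
    nearest = min(math.dist(pose, kf) for kf in keyframe_poses)
    return nearest > min_dist

keyframes = [(0.0, 0.0, 0.0)]
print(should_insert_keyframe((0.1, 0.0, 0.0), keyframes))  # too close: False
print(should_insert_keyframe((0.5, 0.0, 0.0), keyframes))  # far enough: True
```

Optimizing only over such keyframes, rather than every frame, is what makes bundle adjustment affordable in the mapping thread.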
ORB-SLAM
ORB-SLAM uses a three-thread structure. It can construct sparse maps, which satisfy the positioning demand but cannot support navigation, obstacle avoidance, or other functions.
The visual sensor used by ORB-SLAM in 2015 was a monocular camera, which suffered from scale drift. To address these shortcomings, ORB-SLAM2 was proposed in 2016 as the first SLAM system supporting monocular, stereo, and RGB-D cameras. A separate thread was set up to execute global BA (Bundle Adjustment) so as not to block loop detection. It also contained a lightweight localization mode that used VO to track unmapped areas and matched map points to achieve zero-drift positioning. Figure shows ORB-SLAM2 keyframes and the covisibility graph.
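The idea of running global BA on its own thread can be sketched as follows. The function body and abort mechanism are placeholders of my own, not ORB-SLAM2's API; they only illustrate that a long optimization can run in the background and be interrupted when a newer loop closure arrives.

```python
import threading
import time

def run_global_ba(keyframes, abort_flag):
    """Placeholder for a full bundle adjustment over all keyframes.
    A real implementation would jointly refine poses and map points;
    here we only simulate per-keyframe work and honour an abort flag,
    mimicking how a running BA can be cancelled by a newer loop closure.
    """
    for kf in keyframes:
        if abort_flag.is_set():
            return False  # interrupted before completion
        time.sleep(0.001)  # stand-in for optimization work
    return True

# Launch BA in the background so the tracking thread stays responsive.
abort_flag = threading.Event()
result = {}
t = threading.Thread(
    target=lambda: result.update(done=run_global_ba(range(10), abort_flag)))
t.start()
t.join()
print(result["done"])  # True: BA ran to completion without being aborted
```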
SVO-SLAM
The SVO-SLAM (Semi-direct Visual Odometry) algorithm combines direct image alignment with feature-based tracking, avoiding the computation of large numbers of descriptors, and is extremely fast.
It can reach 300 frames per second on consumer laptops and 55 frames per second on a UAV (unmanned aerial vehicle).
SVO-SLAM first proposed the concept of a depth filter (as shown in Figure 26) to estimate the positions of key points, using inverse depth as the parameterization.
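The core of a depth filter can be sketched as recursive Gaussian fusion on inverse depth: each new triangulated measurement shrinks the uncertainty of the estimate. SVO's actual filter is a Gaussian-uniform mixture that also models outlier measurements, so the plain Kalman-style update below is a simplification.

```python
def fuse_inverse_depth(mu, sigma2, z, tau2):
    """Fuse one inverse-depth measurement z (variance tau2) into a
    Gaussian estimate (mean mu, variance sigma2) via the standard
    product of two Gaussians. Simplified: SVO's real filter also
    carries an inlier probability to reject outliers."""
    new_sigma2 = sigma2 * tau2 / (sigma2 + tau2)
    new_mu = (tau2 * mu + sigma2 * z) / (sigma2 + tau2)
    return new_mu, new_sigma2

# Seed a point with a broad prior, then refine it with measurements.
mu, sigma2 = 0.5, 1.0          # inverse depth (1/m) and its variance
for z in (0.48, 0.52, 0.50):   # simulated triangulated measurements
    mu, sigma2 = fuse_inverse_depth(mu, sigma2, z, tau2=0.01)
print(mu, sigma2)  # mean converges near 0.5, variance shrinks
```

Once the variance falls below a threshold, the point is considered converged and is inserted into the map.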
A schematic diagram of its effect is shown in Figure. The disadvantages of this method are that it discards backend optimization and loop detection, accumulates error in the position estimate, and makes relocalization difficult after tracking is lost.
LSD-SLAM
LSD-SLAM proposes an image matching algorithm that directly estimates the similarity transformation between keyframes.
It does not need to extract feature descriptors from the image; the transformation between two frames is obtained by optimizing the photometric error. The final result is a semi-dense map, which works better in places with weak texture. The proposal of LSD-SLAM marks the transition from sparse maps to semi-dense maps, and the process is shown in Figure.
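Photometric-error minimization can be illustrated with a toy example: given a reference image and the current image, score candidate motions by summed squared intensity differences and pick the one with the lowest error. A real direct method warps each high-gradient pixel through an estimated SE(3) pose and depth; the integer pixel shift below is a deliberate simplification.

```python
def photometric_error(ref, cur, shift):
    """Mean squared intensity difference between a reference image and
    the current image sampled at a horizontal integer pixel shift.
    Toy stand-in for the per-pixel warp of a real direct method."""
    h, w = len(ref), len(ref[0])
    err, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            xs = x + shift
            if 0 <= xs < w:
                err += (ref[y][x] - cur[y][xs]) ** 2
                count += 1
    return err / count

# The current frame is the reference shifted right by one pixel.
ref = [[0, 10, 20, 30], [0, 10, 20, 30]]
cur = [[row[(x - 1) % 4] for x in range(4)] for row in ref]
best = min(range(-2, 3), key=lambda s: photometric_error(ref, cur, s))
print(best)  # 1: the shift that cancels the motion has zero error
```

Because the error is computed from raw intensities, no descriptors are needed, which is exactly what makes direct methods usable in weakly textured scenes where feature extraction fails.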
DSO-SLAM
DSO-SLAM (Direct Sparse Odometry) was proposed and shown to outperform LSD-SLAM in terms of accuracy, stability, and speed.
The algorithm did not rely on prior information and directly optimized the photometric error. The optimization range was not all frames but a sliding window formed by the most recent frame and the previous few.
In addition to refining the error model of direct-method pose estimation, DSO-SLAM also added an affine brightness transformation, photometric correction, and depth optimization.
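The affine brightness idea can be sketched in isolation: model the intensity change between two exposures of the same scene as cur ≈ a·ref + b and recover a and b by least squares. DSO folds such per-frame affine parameters into its photometric residual during optimization; the closed-form fit over corresponding samples below is a simplified illustration.

```python
def fit_affine_brightness(ref, cur):
    """Least-squares fit of cur ≈ a*ref + b over corresponding pixel
    intensities (ordinary linear regression). Simplified stand-in for
    DSO's per-frame affine brightness parameters, which are estimated
    jointly with pose and depth inside the photometric error."""
    n = len(ref)
    sx, sy = sum(ref), sum(cur)
    sxx = sum(v * v for v in ref)
    sxy = sum(u * v for u, v in zip(ref, cur))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

ref = [10, 50, 90, 130, 200]
cur = [1.5 * v + 4 for v in ref]   # brighter exposure: gain 1.5, offset 4
a, b = fit_affine_brightness(ref, cur)
print(round(a, 3), round(b, 3))    # recovers the gain and offset
```

Compensating intensities this way keeps the photometric error meaningful when camera exposure or scene illumination changes between frames.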
Still, this method did not have loop detection. Its effect diagram is shown in Figure. The red line in the inset connects the start and end positions of a loop, visualizing the drift accumulated along the tracked trajectory.