视觉SLAM笔记（46）基本的 VO

程序员文章站 2022-04-17 17:24:48

...

视觉SLAM笔记（46）基本的 VO

1. 特征提取和匹配
2. 两两帧的视觉里程计
3. 匹配特征点
4. 简单的检测

1. 特征提取和匹配

实现 VO，先来考虑特征点法。它 VO 任务是，根据输入的图像，计算相机运动和特征点位置
前面都讨论的是在两两帧间的位姿估计，然而发现仅凭两帧的估计是不够的
会把特征点缓存成一个小地图，计算当前帧与地图之间的位置关系
但那样程序会复杂一些
所以，让先订个小目标，暂时从两两帧间的运动估计出发

2. 两两帧的视觉里程计

现在只关心两个帧之间的运动估计，并且不优化特征点的位置
然而把估得的位姿“串”起来，也能得到一条运动轨迹
这种方式可以看成两两帧间的（Pairwise），无结构（Structureless）的 VO
两两帧之间的 VO 工作示意图如图所示：
视觉SLAM笔记（46）基本的 VO

实现起来最为简单，但是效果不佳

在这种 VO 里，定义了参考帧（Reference）和当前帧（Current）这两个概念
以参考帧为坐标系，把当前帧与它进行特征匹配，并估计运动关系

假设参考帧相对世界坐标的变换矩阵为 T_rw，当前帧与世界坐标系间为 T_cw
则待估计的运动与这两个帧的变换矩阵构成左乘关系：
视觉SLAM笔记（46）基本的 VO
在 t − 1 到 t 时刻，以 t − 1 为参考，求取 t 时刻的运动

这可以通过特征点匹配、光流或直接法得到，但这里只关心运动，不关心结构
换句话说，只要通过特征点成功求出了运动，就不再需要这帧的特征点了
这种做法当然会有缺陷，但是忽略掉数量庞大的特征点可以节省许多的计算量
然后，在 t 到 t + 1 时刻，又以 t 时刻为参考帧，考虑 t 到 t + 1 间的运动关系
如此往复，就得到了一条运动轨迹

3. 匹配特征点

这种 VO 的工作方式是简单的，不过实现也可以有若干种
以传统的匹配特征点——求 PnP 的方法为例实现一遍
在匹配特征点的方式中，最重要的参考帧与当前帧之间的特征匹配关系，它的流程可归纳如下：

视觉SLAM笔记（46）基本的 VO
VisualOdometry 类给出了上述算法的实现
VisualOdometry 类的头文件在/include/myslam/visual_odometry.h中：

#ifndef VISUALODOMETRY_H
#define VISUALODOMETRY_H

#include "myslam/common_include.h"
#include "myslam/map.h"

#include <opencv2/features2d/features2d.hpp>

namespace myslam
{
    class VisualOdometry
    {
    public:
        typedef shared_ptr<VisualOdometry> Ptr;
        enum VOState {
            INITIALIZING = -1,
            OK = 0,
            LOST
        };

        VOState     state_;     // 当前 VO 状态 
        Map::Ptr    map_;       // 映射所有帧和映射点
        Frame::Ptr  ref_;       // 参考坐标系
        Frame::Ptr  curr_;      // 当前帧

        cv::Ptr<cv::ORB> orb_;  // ORB 检测和计算器
        vector<cv::Point3f>     pts_3d_ref_;        // 参考坐标系中的三维点
        vector<cv::KeyPoint>    keypoints_curr_;    // 当前帧中的关键点
        Mat                     descriptors_curr_;  // 当前帧描述符
        Mat                     descriptors_ref_;   // 参考系描述符
        vector<cv::DMatch>      feature_matches_;   // 特征匹配

        SE3 T_c_r_estimated_;    // 当前帧的估计位姿
        int num_inliers_;        // pnp中输入点的数量
        int num_lost_;           // 丢失的数量

        // 参数
        int num_of_features_;   // 特征数
        double scale_factor_;   // 图像比例因子
        int level_pyramid_;     // 图像金字塔尺度
        float match_ratio_;     // 良好匹配率
        int max_num_lost_;      // 连续丢失的最大次数
        int min_inliers_;       // 最小内点数

        double key_frame_min_rot;   // 两个关键帧的最小旋转
        double key_frame_min_trans; // 两个关键帧的最小平移

    public: // 函数
        VisualOdometry();
        ~VisualOdometry();

        bool addFrame(Frame::Ptr frame);      // 添加帧

    protected:
        // 内部操作
        void extractKeyPoints();      // 提取关键点 
        void computeDescriptors();    // 计算描述子
        void featureMatching();       // 在上一帧的特征点3D坐标和当前的特征点2D坐标匹配
        void poseEstimationPnP();     // 姿势估计
        void setRef3DPoints();        // 设置参考帧的3D点
        void addKeyFrame();           // 添加关键帧
        bool checkEstimatedPose();    // 检查估计姿势
        bool checkKeyFrame();         // 检查关键帧

    };
}

#endif // VISUALODOMETRY_H

在源文件中，给出 VisualOdometry 方法的实现 /src/visual_odometry.cpp ：

#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <algorithm>
#include <boost/timer.hpp>

#include "myslam/config.h"
#include "myslam/visual_odometry.h"

namespace myslam
{

    VisualOdometry::VisualOdometry() :
        state_(INITIALIZING), ref_(nullptr), curr_(nullptr), map_(new Map), num_lost_(0), num_inliers_(0)
    {
        num_of_features_ = Config::get<int>("number_of_features");
        scale_factor_ = Config::get<double>("scale_factor");
        level_pyramid_ = Config::get<int>("level_pyramid");
        match_ratio_ = Config::get<float>("match_ratio");
        max_num_lost_ = Config::get<float>("max_num_lost");
        min_inliers_ = Config::get<int>("min_inliers");
        key_frame_min_rot = Config::get<double>("keyframe_rotation");
        key_frame_min_trans = Config::get<double>("keyframe_translation");
        orb_ = cv::ORB::create(num_of_features_, scale_factor_, level_pyramid_);
    }

    VisualOdometry::~VisualOdometry()
    {

    }

    // 添加帧
    bool VisualOdometry::addFrame(Frame::Ptr frame)
    {
        switch (state_)
        {
        case INITIALIZING:
        {
            state_ = OK;
            curr_ = ref_ = frame;
            map_->insertKeyFrame(frame);
            // 从第一帧中提取特征 
            extractKeyPoints();
            computeDescriptors();
            // 在参考帧中计算特征的三维位置
            setRef3DPoints();
            break;
        }
        case OK:
        {
            curr_ = frame;
            extractKeyPoints();
            computeDescriptors();
            featureMatching();
            poseEstimationPnP();
            if (checkEstimatedPose() == true) // 一个好的评估
            {
                curr_->T_c_w_ = T_c_r_estimated_ * ref_->T_c_w_;  // T_c_w = T_c_r*T_r_w 
                ref_ = curr_;
                setRef3DPoints();
                num_lost_ = 0;
                if (checkKeyFrame() == true) // 关键帧？
                {
                    addKeyFrame();
                }
            }
            else // 由于种种原因造成的估计错误
            {
                num_lost_++;
                if (num_lost_ > max_num_lost_)
                {
                    state_ = LOST;
                }
                return false;
            }
            break;
        }
        case LOST:
        {
            cout << "vo has lost." << endl;
            break;
        }
        }

        return true;
    }

    // 提取关键点
    void VisualOdometry::extractKeyPoints()
    {
        orb_->detect(curr_->color_, keypoints_curr_);
    }

    // 计算描述子
    void VisualOdometry::computeDescriptors()
    {
        orb_->compute(curr_->color_, keypoints_curr_, descriptors_curr_);
    }

    // 在上一帧的特征点3D坐标和当前的特征点2D坐标匹配
    void VisualOdometry::featureMatching()
    {
        // 匹配desp_ref和desp_curr，使用OpenCV的蛮力匹配
        vector<cv::DMatch> matches;
        cv::BFMatcher matcher(cv::NORM_HAMMING);
        matcher.match(descriptors_ref_, descriptors_curr_, matches);
        // 选择最佳匹配
        float min_dis = std::min_element(
            matches.begin(), matches.end(),
            [](const cv::DMatch& m1, const cv::DMatch& m2)
        {
            return m1.distance < m2.distance;
        })->distance;

        feature_matches_.clear();
        for (cv::DMatch& m : matches)
        {
            if (m.distance < max<float>(min_dis*match_ratio_, 30.0))
            {
                feature_matches_.push_back(m);
            }
        }
        cout << "good matches: " << feature_matches_.size() << endl;
    }

    // 设置参考帧的3D点
    void VisualOdometry::setRef3DPoints()
    {
        // 选择特征的深度测量
        pts_3d_ref_.clear();
        descriptors_ref_ = Mat();
        for (size_t i = 0; i < keypoints_curr_.size(); i++)
        {
            double d = ref_->findDepth(keypoints_curr_[i]);
            if (d > 0)
            {
                Vector3d p_cam = ref_->camera_->pixel2camera(
                    Vector2d(keypoints_curr_[i].pt.x, keypoints_curr_[i].pt.y), d
                );
                pts_3d_ref_.push_back(cv::Point3f(p_cam(0, 0), p_cam(1, 0), p_cam(2, 0)));
                descriptors_ref_.push_back(descriptors_curr_.row(i));
            }
        }
    }

    // 姿势估计
    void VisualOdometry::poseEstimationPnP()
    {
        // 构建3d、2d观测
        vector<cv::Point3f> pts3d;
        vector<cv::Point2f> pts2d;

        for (cv::DMatch m : feature_matches_)
        {
            pts3d.push_back(pts_3d_ref_[m.queryIdx]);
            pts2d.push_back(keypoints_curr_[m.trainIdx].pt);
        }

        Mat K = (cv::Mat_<double>(3, 3) <<
            ref_->camera_->fx_, 0, ref_->camera_->cx_,
            0, ref_->camera_->fy_, ref_->camera_->cy_,
            0, 0, 1
            );
        Mat rvec, tvec, inliers;
        cv::solvePnPRansac(pts3d, pts2d, K, Mat(), rvec, tvec, false, 100, 4.0, 0.99, inliers);
        num_inliers_ = inliers.rows;
        cout << "pnp inliers: " << num_inliers_ << endl;
        T_c_r_estimated_ = SE3(
            SO3(rvec.at<double>(0, 0), rvec.at<double>(1, 0), rvec.at<double>(2, 0)),
            Vector3d(tvec.at<double>(0, 0), tvec.at<double>(1, 0), tvec.at<double>(2, 0))
        );
    }

    // 检查估计姿势
    bool VisualOdometry::checkEstimatedPose()
    {
        // 检查预估姿势是否正确
        if (num_inliers_ < min_inliers_)
        {
            cout << "reject because inlier is too small: " << num_inliers_ << endl;
            return false;
        }
        // 如果运动太大，它可能是错误的
        Sophus::Vector6d d = T_c_r_estimated_.log();
        if (d.norm() > 5.0)
        {
            cout << "reject because motion is too large: " << d.norm() << endl;
            return false;
        }
        return true;
    }

    // 检查关键帧
    bool VisualOdometry::checkKeyFrame()
    {
        Sophus::Vector6d d = T_c_r_estimated_.log();
        Vector3d trans = d.head<3>();
        Vector3d rot = d.tail<3>();
        if (rot.norm() > key_frame_min_rot || trans.norm() > key_frame_min_trans)
            return true;
        return false;
    }

    // 添加关键帧
    void VisualOdometry::addKeyFrame()
    {
        cout << "adding a key-frame" << endl;
        map_->insertKeyFrame(curr_);
    }
}

4. 简单的检测

由于各种原因，设计的上述 VO 算法，每一步都有可能失败
例如图片中不易提特征、特征点缺少深度值、误匹配、运动估计出错等等
因此，要设计一个鲁棒的 VO，必须（最好是显式地）考虑到上述所有可能出错的地方
那自然会使程序变得非常复杂

在 checkEstimatedPose 中，根据内点（inlier）的数量以及运动的大小做一个简单的检测：
认为内点不可太少，而运动不可能过大
最后，在 test 中加入该 VO 的测试程序，使用数据集观察估计的运动效果：

#include <fstream>
#include <boost/timer.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/viz.hpp> 

#include "myslam/config.h"
#include "myslam/visual_odometry.h"

int main(int argc, char** argv)
{
    if (argc != 2)
    {
        cout << "usage: run_vo parameter_file" << endl;
        return 1;
    }

    myslam::Config::setParameterFile(argv[1]);
    myslam::VisualOdometry::Ptr vo(new myslam::VisualOdometry);

    string dataset_dir = myslam::Config::get<string>("dataset_dir");
    cout << "dataset: " << dataset_dir << endl;
    ifstream fin(dataset_dir + "/associate.txt");
    if (!fin)
    {
        cout << "please generate the associate file called associate.txt!" << endl;
        return 1;
    }

    vector<string> rgb_files, depth_files;
    vector<double> rgb_times, depth_times;
    while (!fin.eof())
    {
        string rgb_time, rgb_file, depth_time, depth_file;
        fin >> rgb_time >> rgb_file >> depth_time >> depth_file;
        rgb_times.push_back(atof(rgb_time.c_str()));
        depth_times.push_back(atof(depth_time.c_str()));
        rgb_files.push_back(dataset_dir + "/" + rgb_file);
        depth_files.push_back(dataset_dir + "/" + depth_file);

        if (fin.good() == false)
            break;
    }

    myslam::Camera::Ptr camera(new myslam::Camera);

    // 可视化
    cv::viz::Viz3d vis("Visual Odometry");
    cv::viz::WCoordinateSystem world_coor(1.0), camera_coor(0.5);
    cv::Point3d cam_pos(0, -1.0, -1.0), cam_focal_point(0, 0, 0), cam_y_dir(0, 1, 0);
    cv::Affine3d cam_pose = cv::viz::makeCameraPose(cam_pos, cam_focal_point, cam_y_dir);
    vis.setViewerPose(cam_pose);

    world_coor.setRenderingProperty(cv::viz::LINE_WIDTH, 2.0);
    camera_coor.setRenderingProperty(cv::viz::LINE_WIDTH, 1.0);
    vis.showWidget("World", world_coor);
    vis.showWidget("Camera", camera_coor);

    cout << "read total " << rgb_files.size() << " entries" << endl;
    for (int i = 0; i < rgb_files.size(); i++)
    {
        Mat color = cv::imread(rgb_files[i]);
        Mat depth = cv::imread(depth_files[i], -1);
        if (color.data == nullptr || depth.data == nullptr)
            break;
        myslam::Frame::Ptr pFrame = myslam::Frame::createFrame();
        pFrame->camera_ = camera;
        pFrame->color_ = color;
        pFrame->depth_ = depth;
        pFrame->time_stamp_ = rgb_times[i];

        boost::timer timer;
        vo->addFrame(pFrame);
        cout << "VO costs time: " << timer.elapsed() << endl;

        if (vo->state_ == myslam::VisualOdometry::LOST)
            break;
        SE3 Tcw = pFrame->T_c_w_.inverse();

        // 显示地图和相机姿势
        cv::Affine3d M(
            cv::Affine3d::Mat3(
                Tcw.rotation_matrix()(0, 0), Tcw.rotation_matrix()(0, 1), Tcw.rotation_matrix()(0, 2),
                Tcw.rotation_matrix()(1, 0), Tcw.rotation_matrix()(1, 1), Tcw.rotation_matrix()(1, 2),
                Tcw.rotation_matrix()(2, 0), Tcw.rotation_matrix()(2, 1), Tcw.rotation_matrix()(2, 2)
            ),
            cv::Affine3d::Vec3(
                Tcw.translation()(0, 0), Tcw.translation()(1, 0), Tcw.translation()(2, 0)
            )
        );

        cv::imshow("image", color);
        cv::waitKey(1);
        vis.setWidgetPose("Camera", M);
        vis.spinOnce(1, false);
    }

    return 0;
}

使用 TUM 数据集的 fr1_xyz

视觉SLAM笔记（46）基本的 VO

下载之后，将解压后的文件夹放在程序所在路径/VSLAM_note/dataset/下
使用 associate.py 生成一个配对文件 associate.txt

$ python associate.py rgb.txt depth.txt > associate.txt

如图：
视觉SLAM笔记（46）基本的 VO
执行测试文件run_vo：

视觉SLAM笔记（46）基本的 VO
在演示程序中，可以看到当前帧的图像与它的估计位置
画出了世界坐标系的坐标轴（大）与当前帧的坐标轴（小）
颜色与轴的对应关系为：蓝色-Z，红色-X，绿色-Y

视觉SLAM笔记（46）基本的 VO
可以直观地感受到相机的运动，它大致与人类的感觉是相符的，尽管效果离预想还有一定的差距
还输出了 VO 单次计算的用时，在电脑上，大约能够以 14 多毫秒左右的速度运行
减少特征点数量可以提高运算速度
可以修改运行参数和数据集，看看它在各种情况下的表现

参考：

《视觉SLAM十四讲》

视觉SLAM笔记（46） 基本的 VO