Predicting Age, Gender, and Emotion from Faces: Code Implementation (C++ + Caffe) (Part 4)

程序员文章站 2022-07-14 14:26:10


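Facial Emotion Recognition (Part 1)

Facial Emotion Recognition (Part 2)

Facial Emotion Recognition: age & gender (Part 3)

Predicting Age, Gender, and Emotion from Faces: Code Implementation (C++ + Caffe) (Part 4)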

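I. Preparation

1. Setting up Caffe on Windows

Everyone runs into different problems during setup because system environments differ, so be patient and work through them one at a time. I use VS2015, so I had to compile Caffe myself. The build ends up producing a build folder containing Caffe.sln; open it and build the ALL_BUILD project so the outputs are ready for later use.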

2. Downloading the template

After extraction, the contents look like this:
[Figure: contents of the extracted archive]

3. Code walkthrough

examples/cpp_classification/classification.cpp

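This is the classification example shipped with the official Caffe source. Beginners typically call the resulting classification.exe to classify MNIST handwritten character images. First, the source is explained in detail through comments. Second, because the example is written around a class and is fairly involved, it needs to be rewritten into a form suited to actual testing. See the annotated source for details.
When Caffe's bundled classification.exe classifies a single image, it requires a mean file, and the .bat file that invokes it must also give the mean file's path. In object recognition, Caffe workflows usually include this mean-subtraction step: the mean-subtracted image is fed into the first convolutional layer, which can improve accuracy. Some specialized fields do not need it, however. In image forensics, for example, when training on images altered by certain kinds of tampering, the input to the first convolutional layer is not a mean-subtracted image but some other feature image, such as the image minus its median-filtered version (the median filtering residual, MFR). To use the bundled classification.exe in those fields, the mean-related part has to be removed. See the walkthrough for details.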
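To make the mean-subtraction step concrete, here is a minimal sketch on plain float buffers. It is not the code from classification.cpp: the function name SubtractMean and the convention that an empty mean buffer means "skip the step" are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Caffe): subtracts a per-pixel mean image
// from an input image, element by element. Both buffers are assumed to hold
// float pixel values in the same layout and with the same number of elements.
// An empty mean buffer skips the step, which is what you want when the
// network was trained on inputs that were not mean-subtracted.
std::vector<float> SubtractMean(const std::vector<float>& image,
                                const std::vector<float>& mean) {
  if (mean.empty()) {
    return image;  // no mean supplied: pass the image through unchanged
  }
  assert(image.size() == mean.size());
  std::vector<float> out(image.size());
  for (std::size_t i = 0; i < image.size(); ++i) {
    out[i] = image[i] - mean[i];
  }
  return out;
}
```

Calling SubtractMean(image, mean) gives the buffer that would go to the first convolutional layer. Passing an empty vector as the mean corresponds to the "remove the mean part" case described above.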

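Tip: at this point, once classification.exe has been built, you can pass it the input arguments and get results. If you want to learn more, keep going; if you would rather not do more hands-on work, you can download from the link provided.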

convert_imageset.cpp (Caffe's image conversion tool)

// This program converts a set of images to a lmdb/leveldb by storing them
// as Datum proto buffers.
// Usage:
//   convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME
//
// where ROOTFOLDER is the root folder that holds all the images, and LISTFILE
// should be a list of files as well as their labels, in the format as
//   subfolder1/file1.JPEG 7
//   ....

#include <algorithm>
#include <fstream>  // NOLINT(readability/streams)
#include <string>
#include <utility>
#include <vector>

#include "boost/scoped_ptr.hpp"
#include "gflags/gflags.h"
#include "glog/logging.h"

#include "caffe/proto/caffe.pb.h"
#include "caffe/util/db.hpp"
#include "caffe/util/format.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/rng.hpp"

using namespace caffe;  // NOLINT(build/namespaces)
using std::pair;
using boost::scoped_ptr;

DEFINE_bool(gray, false,
    "When this option is on, treat images as grayscale ones");
DEFINE_bool(shuffle, false,
    "Randomly shuffle the order of images and their labels");
DEFINE_string(backend, "lmdb",
        "The backend {lmdb, leveldb} for storing the result");
DEFINE_int32(resize_width, 0, "Width images are resized to");
DEFINE_int32(resize_height, 0, "Height images are resized to");
DEFINE_bool(check_size, false,
    "When this option is on, check that all the datum have the same size");
DEFINE_bool(encoded, false,
    "When this option is on, the encoded image will be save in datum");
DEFINE_string(encode_type, "",
    "Optional: What type should we encode the image as ('png','jpg',...).");

int main(int argc, char** argv) {
#ifdef USE_OPENCV
  ::google::InitGoogleLogging(argv[0]);
  // Print output to stderr (while still logging)
  FLAGS_alsologtostderr = 1;

#ifndef GFLAGS_GFLAGS_H_
  namespace gflags = google;
#endif

  gflags::SetUsageMessage("Convert a set of images to the leveldb/lmdb\n"
        "format used as input for Caffe.\n"
        "Usage:\n"
        "    convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME\n"
        "The ImageNet dataset for the training demo is at\n"
        "    http://www.image-net.org/download-images\n");
  gflags::ParseCommandLineFlags(&argc, &argv, true);

  if (argc < 4) {
    gflags::ShowUsageWithFlagsRestrict(argv[0], "tools/convert_imageset");
    return 1;
  }

  const bool is_color = !FLAGS_gray;
  const bool check_size = FLAGS_check_size;
  const bool encoded = FLAGS_encoded;
  const string encode_type = FLAGS_encode_type;

  std::ifstream infile(argv[2]);
  std::vector<std::pair<std::string, int> > lines;
  std::string line;
  size_t pos;
  int label;
  while (std::getline(infile, line)) {
    pos = line.find_last_of(' ');
    label = atoi(line.substr(pos + 1).c_str());
    lines.push_back(std::make_pair(line.substr(0, pos), label));
  }
  if (FLAGS_shuffle) {
    // randomly shuffle data
    LOG(INFO) << "Shuffling data";
    shuffle(lines.begin(), lines.end());
  }
  LOG(INFO) << "A total of " << lines.size() << " images.";

  if (encode_type.size() && !encoded)
    LOG(INFO) << "encode_type specified, assuming encoded=true.";

  int resize_height = std::max<int>(0, FLAGS_resize_height);
  int resize_width = std::max<int>(0, FLAGS_resize_width);

  // Create new DB
  scoped_ptr<db::DB> db(db::GetDB(FLAGS_backend));
  db->Open(argv[3], db::NEW);
  scoped_ptr<db::Transaction> txn(db->NewTransaction());

  // Storing to db
  std::string root_folder(argv[1]);
  Datum datum;
  int count = 0;
  int data_size = 0;
  bool data_size_initialized = false;

  for (int line_id = 0; line_id < lines.size(); ++line_id) {
    bool status;
    std::string enc = encode_type;
    if (encoded && !enc.size()) {
      // Guess the encoding type from the file name
      string fn = lines[line_id].first;
      size_t p = fn.rfind('.');
      if ( p == fn.npos )
        LOG(WARNING) << "Failed to guess the encoding of '" << fn << "'";
      enc = fn.substr(p);
      std::transform(enc.begin(), enc.end(), enc.begin(), ::tolower);
    }
    status = ReadImageToDatum(root_folder + lines[line_id].first,
        lines[line_id].second, resize_height, resize_width, is_color,
        enc, &datum);
    if (status == false) continue;
    if (check_size) {
      if (!data_size_initialized) {
        data_size = datum.channels() * datum.height() * datum.width();
        data_size_initialized = true;
      } else {
        const std::string& data = datum.data();
        CHECK_EQ(data.size(), data_size) << "Incorrect data field size "
            << data.size();
      }
    }
    // sequential
    string key_str = caffe::format_int(line_id, 8) + "_" + lines[line_id].first;

    // Put in db
    string out;
    CHECK(datum.SerializeToString(&out));
    txn->Put(key_str, out);

    if (++count % 1000 == 0) {
      // Commit db
      txn->Commit();
      txn.reset(db->NewTransaction());
      LOG(INFO) << "Processed " << count << " files.";
    }
  }
  // write the last batch
  if (count % 1000 != 0) {
    txn->Commit();
    LOG(INFO) << "Processed " << count << " files.";
  }
#else
  LOG(FATAL) << "This tool requires OpenCV; compile with USE_OPENCV.";
#endif  // USE_OPENCV
  return 0;
}
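The listing builds each database key from caffe::format_int(line_id, 8), an underscore, and the image's relative path. The sketch below reproduces that key shape with the standard library only, so it can be tried without Caffe. MakeKey is a hypothetical stand-in, not a Caffe function.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in for the key built in the listing above:
// the line index zero-padded to 8 digits, an underscore, then the
// relative image path taken from LISTFILE.
std::string MakeKey(int line_id, const std::string& relative_path) {
  std::ostringstream key;
  // setw applies only to the next item (line_id); the fill character pads it.
  key << std::setw(8) << std::setfill('0') << line_id << "_" << relative_path;
  return key.str();
}
```

With zero padding, comparing keys as strings gives the same order as comparing the line indices as numbers, so entries sort in the order in which they were listed (or shuffled).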

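Usage:
The tool is run from the command line in the following form:
convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME
Optional flags may also follow DB_NAME; the available ones are listed in the "Optional parameters" section.
ROOTFOLDER is the root directory of the image set.
LISTFILE is the path to a file that lists the path and label of each image in the set.
DB_NAME is the name of the database to generate.

  For example:
  convert_imageset ImgSetRootDir/ ImgFileList.txt imgSet.lmdb
  Each line of ImgFileList.txt (the LISTFILE) describes one image, e.g. subfolder1/file1.JPEG 7
  Here subfolder1/file1.JPEG is the image path and 7 is the image's class label, with a single space between them.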
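The LISTFILE format can be illustrated with a small parser that mirrors the splitting logic in the listing (find_last_of(' ') followed by atoi). ParseListLine is a hypothetical helper. The check for a missing space is an addition of mine and is not in the original tool.

```cpp
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical helper: splits one LISTFILE line such as
// "subfolder1/file1.JPEG 7" into (relative path, label).
// The split happens at the LAST space, as in convert_imageset.cpp.
// Returns false for a line with no space (this check is an addition).
bool ParseListLine(const std::string& line,
                   std::pair<std::string, int>* out) {
  const std::string::size_type pos = line.find_last_of(' ');
  if (pos == std::string::npos) {
    return false;
  }
  out->first = line.substr(0, pos);
  out->second = std::atoi(line.substr(pos + 1).c_str());
  return true;
}
```

Because the split is at the last space, a path that itself contains spaces is still parsed correctly as long as the label comes last.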

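Optional parameters
gray: bool, default false. When true, images are treated as grayscale; otherwise they are treated as color.
shuffle: bool, default false. When true, the order of the images in the set is randomly shuffled.
backend: string, one of {"lmdb", "leveldb"}, default "lmdb". Selects the format used to store the converted data.
resize_width: int32, default 0. When nonzero, images are resized to this width.
resize_height: int32, default 0. When nonzero, images are resized to this height.
check_size: bool, default false. When true, every datum is checked during processing to confirm that all have the same size.
encoded: bool, default false. When true, the encoded image is stored; the encoding is given by encode_type and, if that is empty, is guessed from the file extension.
encode_type: string, default "". Specifies the encoding used for the stored image, given as a file extension (e.g. 'png', 'jpg', ...).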

Command with parameters:
convert_imageset ImgSetRootDir/ ImgFileList.txt imgSet.lmdb --gray=true --resize_width=160 --resize_height=160

compute_image_mean.cpp (computes the mean of the image data)

II. Caffe network structure and how to generate the mean file


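When I ran the create_imagenet.sh script without the --gray=true flag, it did generate an lmdb, but a 3-channel one, whereas the input to the emotion-recognition net is 1-channel. So a 1-channel mean file has to be generated as well, for use at the end. The solution I settled on is to call the convert_imageset.exe built above directly. In my case: convert_imageset E:\test\faceR\data\train\ E:\test\faceR\data\train.txt E:\test\faceR\test --gray=true. This produces the 1-channel database I need, and the mean.binaryproto file ultimately generated from it is the one-channel mean file I need.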
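Conceptually, a mean file is the pixel-by-pixel average of all images in the database. The sketch below shows that averaging on in-memory single-channel images. It is not the code of compute_image_mean.cpp and does not read an lmdb or write a binaryproto; MeanImage and MeanValue are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: averages equally sized single-channel (grayscale)
// images pixel by pixel. Returns an empty vector when given no images.
std::vector<float> MeanImage(
    const std::vector<std::vector<unsigned char> >& images) {
  if (images.empty()) {
    return std::vector<float>();
  }
  std::vector<double> sum(images[0].size(), 0.0);
  for (std::size_t n = 0; n < images.size(); ++n) {
    assert(images[n].size() == sum.size());  // all images must match in size
    for (std::size_t i = 0; i < sum.size(); ++i) {
      sum[i] += images[n][i];
    }
  }
  std::vector<float> mean(sum.size());
  for (std::size_t i = 0; i < sum.size(); ++i) {
    mean[i] = static_cast<float>(sum[i] / images.size());
  }
  return mean;
}

// Hypothetical helper: collapses a mean image to one scalar, the kind of
// single per-channel value that can be subtracted instead of a mean file.
float MeanValue(const std::vector<float>& mean_image) {
  if (mean_image.empty()) {
    return 0.0f;
  }
  double total = 0.0;
  for (std::size_t i = 0; i < mean_image.size(); ++i) {
    total += mean_image[i];
  }
  return static_cast<float>(total / mean_image.size());
}
```

For grayscale data the mean image has one plane, which is why the database has to be built with --gray=true before the mean is computed.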

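Note: the three-channel mean is (104, 117, 123) and the one-channel mean is (129.74), computed from the fer2013 dataset. You can therefore also change the code so that no mean.binaryproto file is needed and simply subtract these mean values at the end; how to change the code was covered above.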
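Subtracting fixed mean values instead of loading a mean file could look roughly like the sketch below. It assumes a planar (channel-major) float buffer, with all pixels of the first channel followed by all pixels of the next. SubtractChannelMeans is a hypothetical name, and the channel order of the three-channel values is not stated here, so pass them in whatever order your planes use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: subtracts one constant per channel, in place, from a
// planar buffer laid out as [channel 0 pixels][channel 1 pixels]...
// The number of channels is taken from channel_means.size().
void SubtractChannelMeans(std::vector<float>* planar,
                          const std::vector<float>& channel_means) {
  assert(!channel_means.empty());
  assert(planar->size() % channel_means.size() == 0);
  const std::size_t plane_size = planar->size() / channel_means.size();
  for (std::size_t c = 0; c < channel_means.size(); ++c) {
    for (std::size_t i = 0; i < plane_size; ++i) {
      (*planar)[c * plane_size + i] -= channel_means[c];
    }
  }
}
```

For the one-channel emotion net this would be called with a single value, 129.74; for a three-channel input, with the three values 104, 117, 123.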