Learning R: A Comprehensive Summary of Rcpp Basics

程序员文章站 (Programmer Article Station), 2022-07-04 18:23:00


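1. Configuration and notes

Dirk Eddelbuettel's book Seamless R and C++ Integration with Rcpp was published in 2013, when the Rcpp attributes feature had not yet been accepted on CRAN, so calling and writing Rcpp functions was still fairly cumbersome. Rcpp attributes (2016) greatly simplifies this process ("provides an even more direct connection between C++ and R"), keeps inline functions, and provides the sourceCpp function for calling external .cpp files. In other words, we can store a C++ function in a .cpp file and then, from an R script, load it with sourceCpp in much the same way as we would use source.

For example, to call a function stored in a file named test.cpp from an R script, we can do the following:

library(Rcpp)
Sys.setenv("PKG_CXXFLAGS"="-std=c++11")
sourceCpp("test.cpp")

The second line tells the compiler to use the C++11 standard.

In test.cpp, include the header Rcpp.h and place each function that should be exported to R directly after // [[Rcpp::export]]. If an exported function needs to call other C++ functions, define those helper functions before the // [[Rcpp::export]] line.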

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
// (the definition of the exported function follows here)

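For linear algebra, the companion packages RcppArmadillo and RcppEigen are available. To use one of them, declare the dependency at the top of the source file, for example // [[Rcpp::depends(RcppArmadillo)]], and include the corresponding header:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>  // must come before Rcpp.h, which it already includes
#include <Rcpp.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]

For the basics of C++, see the tutorial linked in the original post.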

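2. Common data types

Keyword	Description
int/double/bool/string/auto	integer / numeric / boolean / string / automatic type deduction (C++11)
IntegerVector	integer vector
NumericVector	numeric vector (elements are of type double)
ComplexVector	complex vector
LogicalVector	logical vector; an R logical can take three values (TRUE, FALSE, NA), whereas a C++ bool has only two (true or false). Converting an R NA to a C++ bool yields true.
CharacterVector	character vector
ExpressionVector	vectors of expression types
RawVector	vectors of type raw
IntegerMatrix	integer matrix
NumericMatrix	numeric matrix (elements are of type double)
LogicalMatrix	logical matrix
CharacterMatrix	character matrix
List aka GenericVector	list; similar to an R list, its elements can be of any data type
DataFrame	data frame; inside Rcpp, a data frame is implemented as a list
Function	function
Environment	environment; can be used to reference functions in the R environment or in other R packages, and to manipulate variables in the R environment
RObject	a type that R can recognize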
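
The remark on LogicalVector (an R NA turning into true) can be illustrated without R. The following is a minimal sketch in plain C++, not Rcpp code. It assumes that R stores a logical as an int and encodes NA as INT_MIN; the names kNaLogical, is_na_logical, naive_to_bool and checked_to_bool are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <climits>

// Stand-in for R's NA_LOGICAL. Assumption: R stores a logical as an int and
// encodes NA as INT_MIN.
const int kNaLogical = INT_MIN;

// True only for the NA encoding.
inline bool is_na_logical(int v) { return v == kNaLogical; }

// Naive conversion: every non-zero int maps to true, so NA becomes true.
inline bool naive_to_bool(int v) { return v != 0; }

// Checked conversion: the caller must deal with NA before converting.
inline bool checked_to_bool(int v) {
  assert(!is_na_logical(v));
  return v != 0;
}
```

Under these assumptions, naive_to_bool(kNaLogical) returns true, which is why missing values are usually tested for explicitly before a logical is treated as a C++ bool.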

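Notes:

Some R objects can be converted to Rcpp objects with as<some_RcppObject>(some_RObject). For example:
Fit a linear model in R (the result is a list) and pass it to a C++ function:

> mod = lm(y ~ x)

NumericVector resid = as<NumericVector>(mod["residuals"]);
NumericVector fitted = as<NumericVector>(mod["fitted.values"]);

A NumericVector can be converted to a std::vector with as<some_STL_vector>(some_RcppVector). For example:

std::vector<double> vec;
vec = as<std::vector<double>>(x);

Inside a function, wrap() converts a std::vector to a NumericVector. For example:

arma::vec long_vec(16, arma::fill::randn);
std::vector<double> long_vec2 = arma::conv_to<std::vector<double>>::from(long_vec);
NumericVector output = wrap(long_vec2);

When a function returns, wrap() can be used to convert C++ STL types to types that R recognizes. Examples appear in the input and output examples section (section 10).

Apart from Environment (and possibly Function; I am not sure), most of the data types above can be used directly as a function's return value and are converted to R objects automatically.

Arithmetic and comparison operators: +, -, *, /, ++, --, pow(x,p), <, <=, >, >=, ==, !=. Logical operators: &&, ||, !.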

3. Creating common data types

// 1. Vector
NumericVector v1(n); // a numeric vector v1 of length n, with every element initialized to 0
NumericVector v2 = NumericVector::create(1, 2, 3); // a numeric vector v2 holding the three values 1, 2, 3
LogicalVector v3 = LogicalVector::create(true, false, NA_LOGICAL); // a logical vector v3; as an R object it holds TRUE, FALSE, NA
// 2. Matrix
NumericMatrix m1(nrow, ncol); // an nrow x ncol numeric matrix, initialized to 0
// 3. Multidimensional array
NumericVector out = NumericVector(Dimension(2, 2, 3)); // a 2 x 2 x 3 array (I have not found much use for this so far)
// 4. List
NumericMatrix y1(2, 2);
NumericVector y2(5);
List l = List::create(Named("y1") = y1,
                      Named("y2") = y2);

// 5. DataFrame
NumericVector a = NumericVector::create(1, 2, 3);
CharacterVector b = CharacterVector::create("a", "b", "c");
std::vector<std::string> c(3);
c[0] = "a"; c[1] = "b"; c[2] = "c";
DataFrame df = DataFrame::create(Named("col1") = a,
                                 Named("col2") = b,
                                 Named("col3") = c);

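4. Accessing elements of common data types

Access	Description
[n]	For a vector or a list, accesses the n-th element. For a matrix, the columns are first stacked one below another into a single long column vector, and the n-th element of that vector is accessed. Unlike R, n starts from 0.
(i,j)	For a matrix, accesses element (i,j). Unlike R, i and j start from 0. Unlike vector indexing, parentheses are used here.
List["name1"]/DataFrame["name2"]	Accesses the element named name1 in a List / the column named name2 in a DataFrame.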
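
The relation between [n] and (i,j) described in the table can be sketched in plain C++. This is an illustration only, not Rcpp's implementation: ColMajorMatrix is a hypothetical minimal type that stores its elements column by column, which is the layout the table describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix: nrow x ncol doubles stored column by column.
struct ColMajorMatrix {
  std::size_t nrow, ncol;
  std::vector<double> data;

  ColMajorMatrix(std::size_t r, std::size_t c)
      : nrow(r), ncol(c), data(r * c, 0.0) {}

  // Analogue of m(i, j): zero-based row i and zero-based column j.
  double& operator()(std::size_t i, std::size_t j) {
    assert(i < nrow && j < ncol);
    return data[i + j * nrow];
  }

  // Analogue of m[n]: zero-based n-th element of the stacked columns.
  double& operator[](std::size_t n) {
    assert(n < data.size());
    return data[n];
  }
};
```

For a 2 x 3 matrix m, the element m(1, 2) is the same storage cell as m[5], because 1 + 2 * 2 = 5.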

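5. Member functions

Member function	Description
x.size()	returns the length of x; applies to vectors and matrices (a matrix is vectorized first)
x.push_back(a)	appends a to the end of x; applies to vectors
x.push_front(b)	prepends b to the beginning of x; applies to vectors
x.ncol()	returns the number of columns of x
x.nrow()	returns the number of rows of x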

6. Syntactic sugar

6.1 Arithmetic and logical operators

+, -, *, /, pow(x,p), <, <=, >, >=, ==, !=, !

All of the operators above are vectorized.

6.2. Common functions

is_na()
Produces a logical sugar expression of the same length. Each element of the result expression evaluates to TRUE if the corresponding input is a missing value, or FALSE otherwise.

seq_len()
seq_len( 10 ) generates an integer vector from 1 to 10 (note: not from 0 to 9), which is very useful in conjunction with sapply() and lapply().

pmin(a,b) and pmax(a,b)
a and b are two vectors. pmin() (or pmax()) compares the i-th elements of a and b and returns the smaller (larger) one.

ifelse()
ifelse( x > y, x+y, x-y ) means, element by element: if x > y is true, do the addition; otherwise do the subtraction.

sapply()
sapply applies a C++ function to each element of the given expression to create a new expression. The type of the resulting expression is deduced by the compiler from the result type of the function.

The function can be a free C++ function such as the overload generated by the template function below:

template <typename T>
T square( const T& x){
    return x * x ;
}
sapply( seq_len(10), square<int> ) ;

Alternatively, the function can be a functor whose type has a nested type called result_type:

// std::unary_function supplies result_type; it is deprecated in C++11 and removed in C++17
template <typename T>
struct square : std::unary_function<T,T> {
    T operator()(const T& x){
        return x * x ;
    }
};
sapply( seq_len(10), square<int>() ) ;

lapply()
lapply is similar to sapply except that the result is always a list expression (an expression of type VECSXP).

sign()
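
The role of the nested result_type mentioned for sapply() can be sketched in plain C++. This is not how Rcpp implements sapply(); apply_each is a hypothetical helper written only for this illustration, and Square declares result_type itself instead of inheriting it from std::unary_function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Functor that exposes its return type through a nested result_type.
template <typename T>
struct Square {
  typedef T result_type;
  T operator()(const T& x) const { return x * x; }
};

// Hypothetical helper (not part of Rcpp): applies f to each element and uses
// F::result_type to choose the element type of the output.
template <typename F, typename In>
std::vector<typename F::result_type> apply_each(const std::vector<In>& x, F f) {
  std::vector<typename F::result_type> out;
  out.reserve(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out.push_back(f(x[i]));
  }
  assert(out.size() == x.size());
  return out;
}
```

Calling apply_each(x, Square<int>()) on a std::vector<int> holding 1, 2, 3 gives a std::vector<int> holding 1, 4, 9. The output type comes from Square<int>::result_type rather than from the input type.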

Other functions

Math functions: abs(), acos(), asin(), atan(), beta(), ceil(), ceiling(), choose(), cos(), cosh(), digamma(), exp(), expm1(), factorial(), floor(), gamma(), lbeta(), lchoose(), lfactorial(), lgamma(), log(), log10(), log1p(), pentagamma(), psigamma(), round(), signif(), sin(), sinh(), sqrt(), tan(), tanh(), tetragamma(), trigamma(), trunc().
Summary functions: mean(), min(), max(), sum(), sd(), and (for vectors) var()
Summary functions that return vectors: cumsum(), diff(), pmin(), and pmax()
Search functions: match(), self_match(), which_max(), which_min()
Functions for handling duplicates: duplicated(), unique()

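7. STL

Rcpp can use the data structures and algorithms of the C++ Standard Template Library (STL). Rcpp can also use the data structures and algorithms in Boost.

7.1. Iterators

Only a single example is given here; for details, see C++ Primer or the page linked in the original post.

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum3(NumericVector x) {
  double total = 0;
  NumericVector::iterator it;
  for(it = x.begin(); it != x.end(); ++it) {
    total += *it;
  }
  return total;
}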
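
The iterator loop above has the same shape on any STL container. The sketch below is plain C++ on a std::vector<double> rather than a NumericVector, so it compiles without R. sum_loop and sum_accumulate are hypothetical names; the second one shows the same sum written with std::accumulate from <numeric>.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Explicit iterator loop, mirroring sum3() but on a std::vector.
double sum_loop(const std::vector<double>& x) {
  double total = 0;
  for (std::vector<double>::const_iterator it = x.begin(); it != x.end(); ++it) {
    total += *it;
  }
  return total;
}

// Same result via std::accumulate; the 0.0 initial value keeps the sum in double.
double sum_accumulate(const std::vector<double>& x) {
  return std::accumulate(x.begin(), x.end(), 0.0);
}
```

Both functions return 0 for an empty vector, since begin() equals end() and the loop body never runs.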

7.2. Algorithms

The <algorithm> header provides many algorithms (which can be used together with iterators); see the reference linked in the original post for details.

For example, we could write a basic Rcpp version of findInterval() that takes two arguments, a vector of values and a vector of breaks, and locates the bin that each x falls into.

#include <algorithm>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector findInterval2(NumericVector x, NumericVector breaks) {
  IntegerVector out(x.size());
  NumericVector::iterator it, pos;
  IntegerVector::iterator out_it;
  for(it = x.begin(), out_it = out.begin(); it != x.end(); 
      ++it, ++out_it) {
    pos = std::upper_bound(breaks.begin(), breaks.end(), *it);
    *out_it = std::distance(breaks.begin(), pos);
  }
  return out;
}
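
To see what std::upper_bound contributes, the same logic can be tried on std::vector outside R. This is a sketch in plain C++ under the assumption that breaks is sorted in ascending order; find_interval is a hypothetical name, not an Rcpp or R function.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// For each value in x, counts how many breaks are <= that value.
// Assumes breaks is sorted in ascending order, which std::upper_bound requires.
std::vector<int> find_interval(const std::vector<double>& x,
                               const std::vector<double>& breaks) {
  assert(std::is_sorted(breaks.begin(), breaks.end()));
  std::vector<int> out;
  out.reserve(x.size());
  for (std::vector<double>::const_iterator it = x.begin(); it != x.end(); ++it) {
    // Position of the first break strictly greater than *it.
    std::vector<double>::const_iterator pos =
        std::upper_bound(breaks.begin(), breaks.end(), *it);
    out.push_back(static_cast<int>(std::distance(breaks.begin(), pos)));
  }
  return out;
}
```

With breaks 0, 10, 20 and values -1, 0, 5, 10, 25 the result is 0, 1, 1, 2, 3: a value equal to a break is counted as lying at or above it, and a value below every break gets 0.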

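7.3. Data structures

The data structures provided by the STL can also be used. Rcpp knows how to convert STL data structures to R data structures, so they can be returned from a function directly without manual conversion.
See the reference linked in the original post for details.

7.3.1. Vectors

For details, see the reference linked in the original post.

Creation
vector<int>, vector<bool>, vector<double>, vector<string>

Element access
Use the standard [] notation to access elements.

Adding elements
Use .push_back() to add elements.

Storage allocation
If the length of the vector is known in advance, use .reserve() to allocate enough storage.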
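
The four operations listed above fit together as in the following plain C++ sketch. first_squares is a hypothetical function used only to show reserve() followed by push_back().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the squares 0, 1, 4, ... for n values, reserving storage once up front.
std::vector<double> first_squares(std::size_t n) {
  std::vector<double> out;  // creation: an empty vector of double
  out.reserve(n);           // room for n elements; size() is still 0
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(static_cast<double>(i * i));  // append; capacity is already sufficient
  }
  assert(out.size() == n);
  return out;
}
```

Note that reserve() changes only the capacity, so elements still have to be added with push_back() before they can be read with [].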

Example:

The following code implements run length encoding (rle()). It produces two vectors of output: a vector of values, and a vector lengths giving how many times each element is repeated. It works by looping through the input vector x comparing each value to the previous: if it's the same, then it increments the last value in lengths; if it's different, it adds the value to the end of values, and sets the corresponding length to 1.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List rleC(NumericVector x) {
  std::vector<int> lengths;
  std::vector<double> values;

  // Initialise first value (assumes x has at least one element)
  int i = 0;
  double prev = x[0];
  values.push_back(prev);
  lengths.push_back(1);

  NumericVector::iterator it;
  for(it = x.begin() + 1; it != x.end(); ++it) {
    if (prev == *it) {
      lengths[i]++;
    } else {
      values.push_back(*it);
      lengths.push_back(1);

      i++;
      prev = *it;
    }
  }
  return List::create(
    _["lengths"] = lengths, 
    _["values"] = values
  );
}

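7.3.2. Sets

See links 1, 2 and 3 in the original post.

The STL set std::set does not allow duplicate elements, whereas std::multiset does. Sets are useful for detecting duplicates and for finding the distinct elements (like unique, duplicated, or in).

Ordered sets: std::set and std::multiset.

Unordered set: std::unordered_set
Unordered sets are generally faster, because they use a hash table rather than a tree.
unordered_set<int>, unordered_set<bool>, etc.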
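
As an illustration of how a set supports duplicate detection, here is a minimal sketch in plain C++ using std::unordered_set. The functions duplicated and unique_values are hypothetical stand-ins written for this example; they follow the conventions of the R functions of the same purpose but are not Rcpp's implementations.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Returns, for each element, whether the same value appeared earlier in x
// (the convention used by R's duplicated()).
std::vector<bool> duplicated(const std::vector<int>& x) {
  std::unordered_set<int> seen;
  std::vector<bool> out;
  out.reserve(x.size());
  for (std::vector<int>::const_iterator it = x.begin(); it != x.end(); ++it) {
    // insert() returns a pair; its second member is false if the value was already present.
    out.push_back(!seen.insert(*it).second);
  }
  return out;
}

// Distinct values of x; the iteration order of an unordered_set is unspecified.
std::unordered_set<int> unique_values(const std::vector<int>& x) {
  return std::unordered_set<int>(x.begin(), x.end());
}
```

For the input 1, 2, 1, 3, 2 the first function marks the third and fifth elements as duplicates, and the second returns a set of three values.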

7.3.3. Maps

Closely related to table() and match().

Ordered map: std::map

Unordered map: std::unordered_map

Since maps have a value and a key, you need to specify both types when initialising a map:

map<double, int>, unordered_map<int, double>.
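
The connection to table() can be sketched with a std::map<double, int> that counts occurrences. This is plain C++ for illustration; tabulate is a hypothetical name and the function only approximates what R's table() does (it returns counts keyed by value, nothing more).

```cpp
#include <cassert>
#include <map>
#include <vector>

// Counts how often each value occurs, similar in spirit to R's table().
// std::map keeps its keys sorted, so iteration visits the values in ascending order.
std::map<double, int> tabulate(const std::vector<double>& x) {
  std::map<double, int> counts;  // key type double, value type int
  for (std::vector<double>::const_iterator it = x.begin(); it != x.end(); ++it) {
    ++counts[*it];  // operator[] inserts a zero count the first time a key is seen
  }
  return counts;
}
```

Replacing std::map with std::unordered_map would give the same counts without the sorted iteration order.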

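8. Interacting with the R environment

Through Environment, Rcpp can access the variables and loaded functions of the current R global environment, and can modify variables in the global environment. Environment can also be used to obtain functions from other R packages and use them in Rcpp.

Getting a function from another R package

Rcpp::Environment stats("package:stats");
Rcpp::Function rnorm = stats["rnorm"];
return rnorm(10, Rcpp::Named("sd", 100.0));

Getting and modifying a variable in the R global environment
Suppose the R global environment contains a vector x=c(1,2,3), and we want to change its value from Rcpp.

Rcpp::Environment global = Rcpp::Environment::global_env(); // get the global environment and store it in the Environment variable global
Rcpp::NumericVector tmp = global["x"]; // get x
tmp = pow(tmp, 2); // square it
global["x"] = tmp; // assign the new value to x in the global environment

Getting a function loaded in the R global environment
Suppose the global environment contains an R function funr, defined as:

x=c(1,2,3);
funr<-function(x){
  return (-x);
}

together with the R variable x=c(1,2,3). We want to call this function from Rcpp and apply it to the vector x.

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector func() {
  Rcpp::Environment global =
    Rcpp::Environment::global_env();
  Rcpp::Function funrinc = global["funr"];
  Rcpp::NumericVector tmp = global["x"];
  return funrinc(tmp);
}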

9. Creating an R package with Rcpp

See this article:

Creating an R package with Rcpp and RcppArmadillo

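10. Input and output examples

How to pass arrays

To pass a high-dimensional array, store it as a vector together with its dimension information. There are two ways:

Setting the dimensions via .attr("dim")

A NumericVector can carry dimension information. An array can be returned to R as a NumericVector, and the dimensions of that NumericVector can be set via .attr("dim").

// the Dimension constructor accepts at most three dimensions
output.attr("dim") = Dimension(3,4,2);
// assigning a vector to .attr("dim") allows more than three dimensions
NumericVector dim = NumericVector::create(2,2,2,2);
output.attr("dim") = dim;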
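
When an array travels as a flat vector plus a dim attribute, it helps to know which flat position corresponds to which multi-index. The sketch below is plain C++ and assumes R's storage order, in which the first index varies fastest; flat_offset is a hypothetical helper and uses zero-based indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat, zero-based offset of an element in an array stored in R's order:
// the first index varies fastest. dims and idx must have the same length.
std::size_t flat_offset(const std::vector<std::size_t>& dims,
                        const std::vector<std::size_t>& idx) {
  assert(dims.size() == idx.size());
  std::size_t offset = 0;
  std::size_t stride = 1;
  for (std::size_t k = 0; k < dims.size(); ++k) {
    assert(idx[k] < dims[k]);
    offset += idx[k] * stride;  // contribution of the k-th index
    stride *= dims[k];          // elements skipped per step of the next index
  }
  return offset;
}
```

For dimensions 3, 4, 2 the zero-based index (2, 1, 1) maps to 2 + 1 * 3 + 1 * 12 = 17, and the last element (2, 3, 1) maps to 23.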

Examples:

// Returns a 3*3*2 array
RObject func(){
  arma::vec long_vec(18,arma::fill::randn);
  std::vector<double> long_vec2 = arma::conv_to<std::vector<double>>::from(long_vec);
  NumericVector output = wrap(long_vec2);
  output.attr("dim")=Dimension(3,3,2);
  return wrap(output);
}

// Returns a 2*2*2*2 array
// Note the use of conv_to<>::from()
RObject func(){
  arma::vec long_vec(16,arma::fill::randn);
  std::vector<double> long_vec2 = arma::conv_to<std::vector<double>>::from(long_vec);
  NumericVector output = wrap(long_vec2);
  NumericVector dim = NumericVector::create(2,2,2,2);
  output.attr("dim")=dim;
  return wrap(output);
}

Alternatively, store the dimensions in a separate vector and set them in R afterwards through the "dim" attribute.

Returning a one-dimensional STL vector from a function

It is converted automatically to an R vector. The three variants below are equivalent alternatives:

std::vector<double> func(NumericVector x){
  std::vector<double> vec;
  vec = as<std::vector<double>>(x);
  return vec;
}
NumericVector func(NumericVector x){
  std::vector<double> vec;
  vec = as<std::vector<double>>(x);
  return wrap(vec);
}
RObject func(NumericVector x){
  std::vector<double> vec;
  vec = as<std::vector<double>>(x);
  return wrap(vec);
}

Returning a two-dimensional STL vector from a function

It is converted automatically to an R list in which each element is a vector.

std::vector<std::vector<double>> func(NumericVector x) {
  std::vector<std::vector<double>> mat;
  for (int i=0;i!=3;++i){
    mat.push_back(as<std::vector<double>>(x));
  }
  return mat;
}
RObject func(NumericVector x) {
  std::vector<std::vector<double>> mat;
  for (int i=0;i!=3;++i){
    mat.push_back(as<std::vector<double>>(x));
  }
  return wrap(mat);
}

Returning an Armadillo matrix, cube or field

A matrix is converted automatically to an R matrix:

NumericMatrix func(){
  arma::mat a(3,4,arma::fill::randu);
  return wrap(a);
}
arma::mat func(){
  arma::mat a(3,4,arma::fill::randu);
  return a;
}

A cube is converted automatically to a three-dimensional R array:

arma::cube func(){
  arma::cube a(3,4,5,arma::fill::randu);
  return a;
}
RObject func(){
  arma::cube a(3,4,5,arma::fill::randu);
  return wrap(a);
}

A field is converted automatically to an R list in which each element stores an R vector that carries dimension information (this can be checked with .Internal(inspect())).

RObject func() {
  arma::cube a(3,4,2,arma::fill::randu);
  arma::cube b(3,4,2,arma::fill::randu);
  arma::field<arma::cube> f(2,1);
  f(0)=a;
  f(1)=b;
  return wrap(f);
}

References:

Eddelbuettel, D. (2013). Seamless R and C++ Integration with Rcpp. Springer Publishing Company, Incorporated.

Allaire, J.J. (2016). Rcpp Attributes.

Eddelbuettel, D. (2016). Rcpp Syntactic Sugar.

http://adv-r.had.co.nz/rcpp.html

http://www.rcpp.org/

http://blog.csdn.net/a358463121

http://www.runoob.com/cplusplus/cpp-operators.html

If you quote this article, please credit the source.
