Introduction to data.table

程序员文章站 2022-05-18 17:17:40

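An introduction to data.table syntax

This article is mainly about data.table, so before going into detail, here is a brief look at dplyr for comparison.

dplyr's strength is an elegant syntax that follows the way people think and is easy to read. data.table's strength is a concise syntax and fast execution, which makes it very powerful on large data, although its syntax is sometimes harder to understand.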



Commonly used functions in the dplyr package

  • select(): choose columns
  • filter(): filter rows
  • mutate(): add new columns, similar to transform
  • group_by(): group
  • summarise(): summarise data


data.table

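The general form of data.table is DT[i, j, by]: i selects rows, j selects or computes on columns, and by gives the grouping variables.

Here we use the iris data set to illustrate.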

> library(data.table)
> DT <- data.table(iris)
> set.seed(45L)
> DT[,c("V1","V2"):=list(LETTERS[1:3],c(1L,2L))]
> names(DT) <- tolower(names(DT))
> head(DT)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
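
The general form can be read as "take DT, subset rows with i, then compute j, grouped by by". A minimal sketch of all three parts working together follows. It rebuilds the lowercase iris table so that it runs on its own, and the result column name mean_petal is made up for illustration; the dplyr verbs in the comments are only rough counterparts.

```r
library(data.table)

# Rebuild the example table with lowercase column names
DT <- as.data.table(iris)
setnames(DT, tolower(names(DT)))

# i:  keep rows where sepal.length > 5      (dplyr: filter)
# j:  compute the mean of petal.length      (dplyr: summarise)
# by: do this once per species              (dplyr: group_by)
res <- DT[sepal.length > 5, .(mean_petal = mean(petal.length)), by = species]
print(res)
```

The result has one row per species, with the grouping column first and the computed column after it.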

1. Filtering rows with i

  • By row number

Select rows 3 to 5

> DT[3:5,] #or DT[3:5]
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          4.7         3.2          1.3         0.2  setosa  C  1
2:          4.6         3.1          1.5         0.2  setosa  A  2
3:          5.0         3.6          1.4         0.2  setosa  B  1
  • By condition
    This uses ==. It is simple and readable, but it scans the entire column, which can be slow, so setting a key is recommended; keys are covered later
> head(DT[species=='setosa'])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
> tail(DT[species=='setosa'])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.8          1.9         0.4  setosa  C  1
2:          4.8         3.0          1.4         0.3  setosa  A  2
3:          5.1         3.8          1.6         0.2  setosa  B  1
4:          4.6         3.2          1.4         0.2  setosa  C  2
5:          5.3         3.7          1.5         0.2  setosa  A  1
6:          5.0         3.3          1.4         0.2  setosa  B  2
> head(DT[species %in% c("setosa","versicolor")]) # %in% matches either value (logical OR)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
> tail(DT[species %in% c("setosa","versicolor")])
   sepal.length sepal.width petal.length petal.width    species v1 v2
1:          5.6         2.7          4.2         1.3 versicolor  B  1
2:          5.7         3.0          4.2         1.2 versicolor  C  2
3:          5.7         2.9          4.2         1.3 versicolor  A  1
4:          6.2         2.9          4.3         1.3 versicolor  B  2
5:          5.1         2.5          3.0         1.1 versicolor  C  1
6:          5.7         2.8          4.1         1.3 versicolor  A  2
> head(DT[sepal.length %between% c(4.5,5)])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          4.9         3.0          1.4         0.2  setosa  B  2
2:          4.7         3.2          1.3         0.2  setosa  C  1
3:          4.6         3.1          1.5         0.2  setosa  A  2
4:          5.0         3.6          1.4         0.2  setosa  B  1
5:          4.6         3.4          1.4         0.3  setosa  A  1
6:          5.0         3.4          1.5         0.2  setosa  B  2
> tail(DT[sepal.length %between% c(4.5,5)])
   sepal.length sepal.width petal.length petal.width    species v1 v2
1:          4.6         3.2          1.4         0.2     setosa  C  2
2:          5.0         3.3          1.4         0.2     setosa  B  2
3:          4.9         2.4          3.3         1.0 versicolor  A  2
4:          5.0         2.0          3.5         1.0 versicolor  A  1
5:          5.0         2.3          3.3         1.0 versicolor  A  2
6:          4.9         2.5          4.5         1.7  virginica  B  1
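
As a small check on how these row filters behave, the sketch below compares %between% with the equivalent pair of comparisons and %in% with an explicit OR. It assumes the same lowercase iris table as above and rebuilds it so the block runs on its own; the object names a, b, c1 and c2 are placeholders.

```r
library(data.table)

DT <- as.data.table(iris)
setnames(DT, tolower(names(DT)))

# %between% includes both end points by default
a <- DT[sepal.length %between% c(4.5, 5)]
b <- DT[sepal.length >= 4.5 & sepal.length <= 5]

# Conditions combine with & (and) and | (or); %in% is a compact OR
c1 <- DT[species == "setosa" | species == "versicolor"]
c2 <- DT[species %in% c("setosa", "versicolor")]

nrow(a) == nrow(b)    # same rows either way
nrow(c1) == nrow(c2)  # same rows either way
```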

2. Operating on columns with j

2.1 Selecting columns

  • Select one column

.() is an alias for list()

> head(DT[,sepal.width]) # returned as a vector
[1] 3.5 3.0 3.2 3.1 3.6 3.9
> head(DT[,.(sepal.width)]) # returned as a data.table
   sepal.width
1:         3.5
2:         3.0
3:         3.2
4:         3.1
5:         3.6
6:         3.9
  • Select several columns
> head(DT[,.(sepal.width,sepal.length)])
   sepal.width sepal.length
1:         3.5          5.1
2:         3.0          4.9
3:         3.2          4.7
4:         3.1          4.6
5:         3.6          5.0
6:         3.9          5.4
  • Select columns by column number
> head(DT[,1,with=FALSE]) # select the first column
   sepal.length
1:          5.1
2:          4.9
3:          4.7
4:          4.6
5:          5.0
6:          5.4
> head(DT[,2,with=FALSE]) # select the second column
   sepal.width
1:         3.5
2:         3.0
3:         3.2
4:         3.1
5:         3.6
6:         3.9
> head(DT[,3,with=FALSE]) # select the third column
   petal.length
1:          1.4
2:          1.4
3:          1.3
4:          1.5
5:          1.4
6:          1.7
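
When the column names are held in a variable, there are a few equivalent ways to select them. The sketch below is illustrative only: cols is a hypothetical selection, and the table is rebuilt so the block runs on its own. The .. prefix needs a reasonably recent data.table version.

```r
library(data.table)

DT <- as.data.table(iris)
setnames(DT, tolower(names(DT)))

cols <- c("sepal.length", "species")  # hypothetical selection for illustration

x1 <- DT[, ..cols]               # the .. prefix looks cols up outside DT
x2 <- DT[, cols, with = FALSE]   # older equivalent
x3 <- DT[, .SD, .SDcols = cols]  # the same selection via .SD

# Positions work too, but names survive a change in column order
x4 <- DT[, c(1, 5), with = FALSE]
```

All four results are data.tables with the same two columns.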

2.2 Using functions in j

> DT[,sum(sepal.width)]
[1] 458.6
> DT[,.(sum(sepal.width))]
      V1
1: 458.6
> DT[,.(SUM=sum(sepal.width))] # the result column can be named
     SUM
1: 458.6
  • Column selection and functions can be used together
    If the results differ in length, the shorter one is recycled to match
> head(DT[,.(sepal.width,sd=sd(sepal.width))])
   sepal.width        sd
1:         3.5 0.4358663
2:         3.0 0.4358663
3:         3.2 0.4358663
4:         3.1 0.4358663
5:         3.6 0.4358663
6:         3.9 0.4358663
  • Several expressions can be wrapped in curly braces
> DT[,{print(head(sepal.width))
+   plot(sepal.width)
+   NULL}]
[1] 3.5 3.0 3.2 3.1 3.6 3.9
# a scatter plot is drawn here; it is not shown in this code block (mostly out of laziness)
NULL
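
Several named summaries can be computed in a single j. The sketch below is a minimal illustration with made-up result names (n, mean_w, sd_w); it also shows the recycling rule from above on a mean instead of a standard deviation.

```r
library(data.table)

DT <- as.data.table(iris)
setnames(DT, tolower(names(DT)))

# Each summary has length 1, so the result is a single row
s <- DT[, .(n = .N, mean_w = mean(sepal.width), sd_w = sd(sepal.width))]

# A length-1 value next to a full column is recycled to every row
r <- DT[, .(sepal.width, mean_w = mean(sepal.width))]

print(s)
print(head(r))
```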

3. Operating on j by group

  • Sum sepal.length within each species (the first call is a common mistake, shown for contrast)
> DT[,.(SUM=sum(sepal.length),by=species)]
       SUM        by
  1: 876.5    setosa
  2: 876.5    setosa
  3: 876.5    setosa
  4: 876.5    setosa
  5: 876.5    setosa
 ---                
146: 876.5 virginica
147: 876.5 virginica
148: 876.5 virginica
149: 876.5 virginica
150: 876.5 virginica

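# Note where by is placed: above, by=species sits inside .(), so it becomes an ordinary column named by and nothing is grouped; below, by is a separate argument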
> DT[,.(SUM=sum(sepal.length)),by=.(species)]
      species   SUM
1:     setosa 250.3
2: versicolor 296.8
3:  virginica 329.4
  • Group by several columns
> DT[,.(SUM=sum(sepal.width)),by=.(species,v1)]
      species v1  SUM
1:     setosa  A 59.0
2:     setosa  B 58.6
3:     setosa  C 53.8
4: versicolor  C 46.5
5: versicolor  A 45.5
6: versicolor  B 46.5
7:  virginica  B 51.4
8:  virginica  C 49.6
9:  virginica  A 47.7
  • Use a function in by
> DT[,.(SUM=sum(sepal.length)),by=sign(v2-1)]
   sign   SUM
1:    0 438.0
2:    1 438.5
  • Summarise by group on a subset of rows given in i
> DT[1:40,.(SUM=sum(sepal.length)),by=species]
   species   SUM
1:  setosa 201.5
  • Use .N to count the rows in each group
> DT[,.(count=.N),by=species]
      species count
1:     setosa    50
2: versicolor    50
3:  virginica    50
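
A variant worth knowing is keyby, which groups like by but also sorts the result by the grouping columns and sets them as its key. The sketch below is a minimal illustration; the derived grouping column wide and the result names n and total are made up for the example.

```r
library(data.table)

DT <- as.data.table(iris)
setnames(DT, tolower(names(DT)))

# Group by species and by an expression computed on the fly;
# keyby returns the groups sorted and keyed
g <- DT[, .(n = .N, total = sum(sepal.length)),
        keyby = .(species, wide = sepal.width > 3)]

print(g)
key(g)  # the grouping columns
```

Summing total within each species reproduces the per-species sums shown above.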

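4. Adding, updating, and deleting columns with :=

Note: := modifies the original data set in place, so there is no need to write DT <- DT[,:=]; DT[,:=] on its own is enough

  • Update a column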
> dt <- copy(DT)
> head(dt)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
> head(dt[,v1:=round(exp(v2))])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  3  1
2:          4.9         3.0          1.4         0.2  setosa  7  2
3:          4.7         3.2          1.3         0.2  setosa  3  1
4:          4.6         3.1          1.5         0.2  setosa  7  2
5:          5.0         3.6          1.4         0.2  setosa  3  1
6:          5.4         3.9          1.7         0.4  setosa  7  2
  • Add several columns
> dt[,c("h1","h2"):=.(round(exp(v2)),LETTERS[4:6])]
> head(dt)
   sepal.length sepal.width petal.length petal.width species v1 v2 h1 h2
1:          5.1         3.5          1.4         0.2  setosa  3  1  3  D
2:          4.9         3.0          1.4         0.2  setosa  7  2  7  E
3:          4.7         3.2          1.3         0.2  setosa  3  1  3  F
4:          4.6         3.1          1.5         0.2  setosa  7  2  7  D
5:          5.0         3.6          1.4         0.2  setosa  3  1  3  E
6:          5.4         3.9          1.7         0.4  setosa  7  2  7  F

# The above can also be written in the functional form below; for display, only columns 5 to 9 are shown
> head(dt[,':='(h1=round(exp(v2)),h2=LETTERS[4:6])][,5:9])
   species v1 v2 h1 h2
1:  setosa  A  1  3  D
2:  setosa  B  2  7  E
3:  setosa  C  1  3  F
4:  setosa  A  2  7  D
5:  setosa  B  1  3  E
6:  setosa  C  2  7  F
  • Delete columns
> dt[,':='(h1=NULL,h2=NULL)]
> head(dt)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2

This can also be written as follows
----------
> head(dt[,c("h1","h2"):=NULL])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2

  • Update values that meet a condition
> dt[sepal.length>4&v1=='A',v2:=3]

> head(dt[,.(v2)])
   v2
1:  3
2:  2
3:  1
4:  3
5:  1
6:  2
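
Because := works by reference, a plain assignment such as alias <- DT does not protect the original; copy() does, which is why the examples above start from dt <- copy(DT). The sketch below illustrates this under the same lowercase iris setup; alias, snap and the ratio column are hypothetical names.

```r
library(data.table)

DT <- as.data.table(iris)
setnames(DT, tolower(names(DT)))

alias <- DT         # a second name for the same table, not a copy
snap  <- copy(DT)   # an independent copy

DT[, ratio := sepal.length / sepal.width]   # adds a column in place

"ratio" %in% names(alias)  # the alias sees the new column
"ratio" %in% names(snap)   # the copy does not

# Update only the rows matched by i
DT[species == "setosa", ratio := NA_real_]
```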

5. Setting a key and working with it

  • Set the key column when the data.table is created
> data <- data.table(a=c('A','B','C','A','A','B'),b=rnorm(6),key="a")
> head(data)
   a          b
1: A  0.3407997
2: A -0.7460474
3: A -0.8981073
4: B -0.7033403
5: B -0.3347941
6: C -0.3795377
  • Set the key on an existing data.table
> dt <- data.table(a=c('A','B','C','A','A','B'),b=rnorm(6))
> dt
   a          b
1: A -0.5013782
2: B -0.1745357
3: C  1.8090374
4: A -0.2301050
5: A -1.1304182
6: B  0.2159889

# compare dt carefully before and after setkey

> setkey(dt,a) # setkey sorts the table by the key column
> dt
   a          b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: B -0.1745357
5: B  0.2159889
6: C  1.8090374
  • Check whether the data.table has a key
> key(dt)
[1] "a"
> haskey(dt)
[1] TRUE
> attributes(dt)
$names
[1] "a" "b"

$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.table" "data.frame"

$.internal.selfref
<pointer: 0x10180cf78>

$sorted
[1] "a"

> attributes(dt)$sorted
[1] "a"
  • With a as the key, take the rows where a is B
> dt['B']
   a          b
1: B -0.1745357
2: B  0.2159889
  • With the key set, take the first row where a is B
> dt['B',mult='first'] # mult defaults to "all"
   a          b
1: B -0.1745357
  • With the key set, take the last row where a is B
> dt['B',mult='last']
   a         b
1: B 0.2159889
  • With a as the key, take the rows where a is A or B
> dt[c('A','B')]
   a          b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: B -0.1745357
5: B  0.2159889
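  • The nomatch argument controls what is returned when a value has no match. The default is NA; it can also be set to 0, in which case unmatched rows are not returned (recent data.table versions also accept nomatch=NULL for this)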
> dt[c('A','D')]
   a          b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: D         NA
----------
> dt[c('A','D'),nomatch=0]
   a          b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182

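  • by=.EACHI groups by each row of i, so j is evaluated once per value looked up; a key must be set first (or the join columns given with on=)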
> dt[c('A','B'),sum(b)]
[1] -1.820448

    ----------

> dt[c('A','B'),sum(b),by=.EACHI]
   a          V1
1: A -1.86190135
2: B  0.04145319
  • Set several key columns
> head(DT)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
> setkey(DT,v1,v2) # sorts by v1 first, then by v2
> head(DT[.('B',1)]) # rows where v1 is B and v2 is 1
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.0         3.6          1.4         0.2  setosa  B  1
2:          5.4         3.7          1.5         0.2  setosa  B  1
3:          5.4         3.9          1.3         0.4  setosa  B  1
4:          4.6         3.6          1.0         0.2  setosa  B  1
5:          5.2         3.4          1.4         0.2  setosa  B  1
6:          4.9         3.1          1.5         0.2  setosa  B  1


> head(DT[.(c('A','B'),1)]) # rows where v1 is A or B and v2 is 1
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.6         3.4          1.4         0.3  setosa  A  1
3:          4.8         3.0          1.4         0.1  setosa  A  1
4:          5.7         3.8          1.7         0.3  setosa  A  1
5:          4.8         3.4          1.9         0.2  setosa  A  1
6:          4.8         3.1          1.6         0.2  setosa  A  1
> tail(DT[.(c('A','B'),1)]) # rows where v1 is A or B and v2 is 1
   sepal.length sepal.width petal.length petal.width   species v1 v2
1:          7.7         2.6          6.9         2.3 virginica  B  1
2:          6.7         3.3          5.7         2.1 virginica  B  1
3:          7.4         2.8          6.1         1.9 virginica  B  1
4:          6.3         3.4          5.6         2.4 virginica  B  1
5:          5.8         2.7          5.1         1.9 virginica  B  1
6:          6.2         3.4          5.4         2.3 virginica  B  1
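
The same join-style lookups can be written without setting a key by naming the columns in on=. The sketch below is a minimal illustration on a small made-up table (b holds 1 to 6 instead of random numbers so the results are predictable); nomatch = NULL assumes a reasonably recent data.table, and older versions use nomatch = 0 as shown above.

```r
library(data.table)

dt <- data.table(a = c("A", "B", "C", "A", "A", "B"), b = 1:6)

# Rows where a is "B", without a key and without re-sorting dt
hit <- dt[.("B"), on = "a"]

# "D" has no match: an NA row by default, dropped with nomatch = NULL
miss  <- dt[.(c("A", "D")), on = "a"]
inner <- dt[.(c("A", "D")), on = "a", nomatch = NULL]

# by = .EACHI evaluates j once for each value looked up in i
per_i <- dt[.(c("A", "B")), sum(b), on = "a", by = .EACHI]
print(per_i)
```

A key still has the advantage of keeping the table physically sorted, which helps when the same columns are looked up repeatedly.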

6. Advanced data.table operations

  • Use .N for the number of rows
> DT[.N] # in i, .N returns the last row
   sepal.length sepal.width petal.length petal.width   species v1 v2
1:          5.9           3          5.1         1.8 virginica  C  2
> DT[,.N] # in j, .N returns the number of rows
[1] 150
  • .SD
    .SD is a data.table holding the data of the current group: every column except those named in by. It can only be used in j
> DT[,print(.SD),by=v1]
    sepal.length sepal.width petal.length petal.width    species v2
 1:          5.1         3.5          1.4         0.2     setosa  1
 2:          4.6         3.4          1.4         0.3     setosa  1
 3:          4.8         3.0          1.4         0.1     setosa  1
 4:          5.7         3.8          1.7         0.3     setosa  1
 5:          4.8         3.4          1.9         0.2     setosa  1
 6:          4.8         3.1          1.6         0.2     setosa  1
 7:          5.5         3.5          1.3         0.2     setosa  1
 8:          4.4         3.2          1.3         0.2     setosa  1
 9:          5.3         3.7          1.5         0.2     setosa  1
10:          6.5         2.8          4.6         1.5 versicolor  1
11:          5.0         2.0          3.5         1.0 versicolor  1
12:          5.6         3.0          4.5         1.5 versicolor  1
13:          6.3         2.5          4.9         1.5 versicolor  1
14:          6.0         2.9          4.5         1.5 versicolor  1
15:          5.4         3.0          4.5         1.5 versicolor  1
16:          5.5         2.6          4.4         1.2 versicolor  1
17:          5.7         2.9          4.2         1.3 versicolor  1
18:          7.1         3.0          5.9         2.1  virginica  1
19:          6.7         2.5          5.8         1.8  virginica  1
20:          5.8         2.8          5.1         2.4  virginica  1
21:          6.9         3.2          5.7         2.3  virginica  1
22:          6.2         2.8          4.8         1.8  virginica  1
23:          6.4         2.8          5.6         2.2  virginica  1
24:          6.0         3.0          4.8         1.8  virginica  1
25:          6.7         3.3          5.7         2.5  virginica  1
26:          4.6         3.1          1.5         0.2     setosa  2
27:          4.9         3.1          1.5         0.1     setosa  2
28:          5.7         4.4          1.5         0.4     setosa  2
29:          5.1         3.7          1.5         0.4     setosa  2
30:          5.2         3.5          1.5         0.2     setosa  2
31:          5.5         4.2          1.4         0.2     setosa  2
32:          5.1         3.4          1.5         0.2     setosa  2
33:          4.8         3.0          1.4         0.3     setosa  2
34:          6.4         3.2          4.5         1.5 versicolor  2
35:          4.9         2.4          3.3         1.0 versicolor  2
36:          6.1         2.9          4.7         1.4 versicolor  2
37:          5.6         2.5          3.9         1.1 versicolor  2
38:          6.6         3.0          4.4         1.4 versicolor  2
39:          5.5         2.4          3.7         1.0 versicolor  2
40:          6.3         2.3          4.4         1.3 versicolor  2
41:          5.0         2.3          3.3         1.0 versicolor  2
42:          5.7         2.8          4.1         1.3 versicolor  2
43:          7.6         3.0          6.6         2.1  virginica  2
44:          6.4         2.7          5.3         1.9  virginica  2
45:          7.7         3.8          6.7         2.2  virginica  2
46:          6.3         2.7          4.9         1.8  virginica  2
47:          7.2         3.0          5.8         1.6  virginica  2
48:          7.7         3.0          6.1         2.3  virginica  2
49:          6.9         3.1          5.1         2.3  virginica  2
50:          6.5         3.0          5.2         2.0  virginica  2
    sepal.length sepal.width petal.length petal.width    species v2
    sepal.length sepal.width petal.length petal.width    species v2
 1:          5.0         3.6          1.4         0.2     setosa  1
 2:          5.4         3.7          1.5         0.2     setosa  1
 3:          5.4         3.9          1.3         0.4     setosa  1
 4:          4.6         3.6          1.0         0.2     setosa  1
 5:          5.2         3.4          1.4         0.2     setosa  1
 6:          4.9         3.1          1.5         0.2     setosa  1
 7:          5.0         3.5          1.3         0.3     setosa  1
 8:          5.1         3.8          1.6         0.2     setosa  1
 9:          6.9         3.1          4.9         1.5 versicolor  1
10:          6.6         2.9          4.6         1.3 versicolor  1
11:          5.6         2.9          3.6         1.3 versicolor  1
12:          5.9         3.2          4.8         1.8 versicolor  1
13:          6.8         2.8          4.8         1.4 versicolor  1
14:          5.8         2.7          3.9         1.2 versicolor  1
15:          5.6         3.0          4.1         1.3 versicolor  1
16:          5.6         2.7          4.2         1.3 versicolor  1
17:          6.3         3.3          6.0         2.5  virginica  1
18:          4.9         2.5          4.5         1.7  virginica  1
19:          6.8         3.0          5.5         2.1  virginica  1
20:          7.7         2.6          6.9         2.3  virginica  1
21:          6.7         3.3          5.7         2.1  virginica  1
22:          7.4         2.8          6.1         1.9  virginica  1
23:          6.3         3.4          5.6         2.4  virginica  1
24:          5.8         2.7          5.1         1.9  virginica  1
25:          6.2         3.4          5.4         2.3  virginica  1
26:          4.9         3.0          1.4         0.2     setosa  2
27:          5.0         3.4          1.5         0.2     setosa  2
28:          4.3         3.0          1.1         0.1     setosa  2
29:          5.1         3.8          1.5         0.3     setosa  2
30:          5.0         3.0          1.6         0.2     setosa  2
31:          5.4         3.4          1.5         0.4     setosa  2
32:          4.9         3.6          1.4         0.1     setosa  2
33:          5.0         3.5          1.6         0.6     setosa  2
34:          5.0         3.3          1.4         0.2     setosa  2
35:          5.7         2.8          4.5         1.3 versicolor  2
36:          5.9         3.0          4.2         1.5 versicolor  2
37:          5.8         2.7          4.1         1.0 versicolor  2
38:          6.1         2.8          4.7         1.2 versicolor  2
39:          5.7         2.6          3.5         1.0 versicolor  2
40:          6.0         3.4          4.5         1.6 versicolor  2
41:          6.1         3.0          4.6         1.4 versicolor  2
42:          6.2         2.9          4.3         1.3 versicolor  2
43:          6.3         2.9          5.6         1.8  virginica  2
44:          7.2         3.6          6.1         2.5  virginica  2
45:          6.4         3.2          5.3         2.3  virginica  2
46:          5.6         2.8          4.9         2.0  virginica  2
47:          6.1         3.0          4.9         1.8  virginica  2
48:          6.3         2.8          5.1         1.5  virginica  2
49:          6.9         3.1          5.4         2.1  virginica  2
50:          6.7         3.0          5.2         2.3  virginica  2
    sepal.length sepal.width petal.length petal.width    species v2
    sepal.length sepal.width petal.length petal.width    species v2
 1:          4.7         3.2          1.3         0.2     setosa  1
 2:          4.4         2.9          1.4         0.2     setosa  1
 3:          5.8         4.0          1.2         0.2     setosa  1
 4:          5.4         3.4          1.7         0.2     setosa  1
 5:          5.0         3.4          1.6         0.4     setosa  1
 6:          5.2         4.1          1.5         0.1     setosa  1
 7:          4.4         3.0          1.3         0.2     setosa  1
 8:          5.1         3.8          1.9         0.4     setosa  1
 9:          7.0         3.2          4.7         1.4 versicolor  1
10:          6.3         3.3          4.7         1.6 versicolor  1
11:          6.0         2.2          4.0         1.0 versicolor  1
12:          6.2         2.2          4.5         1.5 versicolor  1
13:          6.4         2.9          4.3         1.3 versicolor  1
14:          5.5         2.4          3.8         1.1 versicolor  1
15:          6.7         3.1          4.7         1.5 versicolor  1
16:          5.8         2.6          4.0         1.2 versicolor  1
17:          5.1         2.5          3.0         1.1 versicolor  1
18:          6.5         3.0          5.8         2.2  virginica  1
19:          6.5         3.2          5.1         2.0  virginica  1
20:          6.5         3.0          5.5         1.8  virginica  1
21:          7.7         2.8          6.7         2.0  virginica  1
22:          6.4         2.8          5.6         2.1  virginica  1
23:          6.1         2.6          5.6         1.4  virginica  1
24:          6.7         3.1          5.6         2.4  virginica  1
25:          6.3         2.5          5.0         1.9  virginica  1
26:          5.4         3.9          1.7         0.4     setosa  2
27:          4.8         3.4          1.6         0.2     setosa  2
28:          5.1         3.5          1.4         0.3     setosa  2
29:          5.1         3.3          1.7         0.5     setosa  2
30:          4.7         3.2          1.6         0.2     setosa  2
31:          5.0         3.2          1.2         0.2     setosa  2
32:          4.5         2.3          1.3         0.3     setosa  2
33:          4.6         3.2          1.4         0.2     setosa  2
34:          5.5         2.3          4.0         1.3 versicolor  2
35:          5.2         2.7          3.9         1.4 versicolor  2
36:          6.7         3.1          4.4         1.4 versicolor  2
37:          6.1         2.8          4.0         1.3 versicolor  2
38:          6.7         3.0          5.0         1.7 versicolor  2
39:          6.0         2.7          5.1         1.6 versicolor  2
40:          5.5         2.5          4.0         1.3 versicolor  2
41:          5.7         3.0          4.2         1.2 versicolor  2
42:          5.8         2.7          5.1         1.9  virginica  2
43:          7.3         2.9          6.3         1.8  virginica  2
44:          5.7         2.5          5.0         2.0  virginica  2
45:          6.0         2.2          5.0         1.5  virginica  2
46:          7.2         3.2          6.0         1.8  virginica  2
47:          7.9         3.8          6.4         2.0  virginica  2
48:          6.4         3.1          5.5         1.8  virginica  2
49:          6.8         3.2          5.9         2.3  virginica  2
50:          5.9         3.0          5.1         1.8  virginica  2
    sepal.length sepal.width petal.length petal.width    species v2
Empty data.table (0 rows) of 1 col: v1
> DT[,.SD,by=v1][]
     v1 sepal.length sepal.width petal.length petal.width   species v2
  1:  A          5.1         3.5          1.4         0.2    setosa  1
  2:  A          4.6         3.4          1.4         0.3    setosa  1
  3:  A          4.8         3.0          1.4         0.1    setosa  1
  4:  A          5.7         3.8          1.7         0.3    setosa  1
  5:  A          4.8         3.4          1.9         0.2    setosa  1
 ---                                                                  
146:  C          7.2         3.2          6.0         1.8 virginica  2
147:  C          7.9         3.8          6.4         2.0 virginica  2
148:  C          6.4         3.1          5.5         1.8 virginica  2
149:  C          6.8         3.2          5.9         2.3 virginica  2
150:  C          5.9         3.0          5.1         1.8 virginica  2
  • Return the first and last row of each group defined by v1
> DT[,.SD[c(1,.N)],by=v1]
   v1 sepal.length sepal.width petal.length petal.width   species v2
1:  A          5.1         3.5          1.4         0.2    setosa  1
2:  A          6.5         3.0          5.2         2.0 virginica  2
3:  B          5.0         3.6          1.4         0.2    setosa  1
4:  B          6.7         3.0          5.2         2.3 virginica  2
5:  C          4.7         3.2          1.3         0.2    setosa  1
6:  C          5.9         3.0          5.1         1.8 virginica  2
  • Return the sums of all other columns, grouped by v1 and species
> DT[,lapply(.SD,sum),by=c("v1","species")]
   v1    species sepal.length sepal.width petal.length petal.width v2
1:  A     setosa         85.9        59.0         25.3         3.9 25
2:  A versicolor         98.1        45.5         71.4        22.0 26
3:  A  virginica        108.1        47.7         89.1        33.1 24
4:  B     setosa         85.2        58.6         24.0         4.2 26
5:  B versicolor         96.3        46.5         69.3        21.4 24
6:  B  virginica        109.6        51.4         93.3        35.5 25
7:  C     setosa         79.2        53.8         23.8         4.2 24
8:  C versicolor        102.4        46.5         72.3        22.9 25
9:  C  virginica        111.7        49.6         95.2        32.7 26
  • .SDcols
    Usually used together with .SD, to restrict .SD to certain columns
> DT[,.SD,by=v1,.SDcols=c("species","sepal.length")]
     v1   species sepal.length
  1:  A    setosa          5.1
  2:  A    setosa          4.6
  3:  A    setosa          4.8
  4:  A    setosa          5.7
  5:  A    setosa          4.8
 ---                          
146:  C virginica          7.2
147:  C virginica          7.9
148:  C virginica          6.4
149:  C virginica          6.8
150:  C virginica          5.9
> DT[,.(species,sepal.length),by=v1] # equivalent to the call above
     v1   species sepal.length
  1:  A    setosa          5.1
  2:  A    setosa          4.6
  3:  A    setosa          4.8
  4:  A    setosa          5.7
  5:  A    setosa          4.8
 ---                          
146:  C virginica          7.2
147:  C virginica          7.9
148:  C virginica          6.4
149:  C virginica          6.8
150:  C virginica          5.9

# .SDcols can also be the return value of a function call:
> DT[,lapply(.SD,sum),by=v1,.SDcols=paste0("v",2)]
   v1 v2
1:  A 75
2:  B 75
3:  C 75
  • Chaining, which feels a bit like the pipe operator (%>%)
Without chaining
> DT2 <- copy(DT)
> DT2 <- DT2[,.(SUM=sum(sepal.length)),by=v1]
> DT2[SUM>291.5]
   v1   SUM
1:  A 292.1
2:  C 293.3
> ## with chaining
> DT2 <- copy(DT)
> DT2[,.(SUM=sum(sepal.length)),by=v1][SUM>291.5] # with grouping, this is a bit like HAVING in SQL
   v1   SUM
1:  A 292.1
2:  C 293.3
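
.SD, .SDcols and chaining combine naturally. The sketch below is one possible illustration: it averages every numeric column per species and then filters the summary in a second pair of brackets. The helper name num_cols and the threshold 4 are made up for the example.

```r
library(data.table)

DT <- as.data.table(iris)
setnames(DT, tolower(names(DT)))

# Names of the numeric columns, computed rather than typed out
num_cols <- names(DT)[sapply(DT, is.numeric)]

# Step 1: mean of each numeric column per species
# Step 2 (chained): keep only the groups whose mean petal.length exceeds 4
out <- DT[, lapply(.SD, mean), by = species, .SDcols = num_cols][petal.length > 4]
print(out)
```

The second bracket operates on the result of the first, so petal.length there refers to the group mean, not the raw column.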

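7. melt and dcast in data.table

Usage is much the same as in the reshape2 package; for reference, see the article
"Unpivoting and pivoting data with the reshape2 package"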
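
As a rough sketch of what the two functions do, the example below melts the iris table to long form and casts it back to a per-species summary. The id column and the names measure and value are assumptions made for the example, not part of the original data.

```r
library(data.table)

DT <- as.data.table(iris)
setnames(DT, tolower(names(DT)))
DT[, id := .I]   # hypothetical row id so each flower stays identifiable

# Wide to long: one row per (id, species, measure)
long <- melt(DT, id.vars = c("id", "species"),
             variable.name = "measure", value.name = "value")

# Long to wide: the mean of each measure per species
wide <- dcast(long, species ~ measure, value.var = "value",
              fun.aggregate = mean)
print(wide)
```

melt turns the four measurement columns into 600 rows (150 flowers times 4 measures), and dcast spreads them back out, one column per measure.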