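R in Action study notes: Introduction to R and Creating a Dataset
Date: 2018-08-12
Tutorial: R in Action
Covered: Chapters 1 and 2
R in Action
Chapter 1: Introduction to R
> getwd() # show the current working directory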
[1] "D:/Documents"
> setwd("E:\\Learning_R\\practice")
Error in setwd("E:\\Learning_R\\practice") :
cannot change working directory
> dir.create("E:\\Learning_R\\practice") # create the new directory
> setwd("E:\\Learning_R\\practice") # change the current working directory
> getwd()
[1] "E:/Learning_R/practice"
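The error above appears because setwd cannot switch to a directory that does not exist yet. A minimal sketch of a safer pattern follows; it assumes a throwaway path under tempdir() (the name target is hypothetical) instead of the E: drive used in these notes.

```r
# Build a nested path inside the session's temporary directory (hypothetical location).
target <- file.path(tempdir(), "Learning_R", "practice")

# Create the directory only if it is missing; recursive = TRUE also creates parents.
if (!dir.exists(target)) {
  dir.create(target, recursive = TRUE)
}

old <- setwd(target)            # setwd() invisibly returns the previous directory
current <- basename(getwd())    # last path component of the new working directory
print(current)

setwd(old)                      # restore the original working directory
```

Keeping the value returned by setwd makes it easy to switch back afterwards.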
> age <- c(1,3,5,2,11,9,3,9,12,3)
> weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6.0, 10.4, 10.2, 6.1)
> ls() # list the objects in the workspace
[1] "age" "weight"
> rm(age) # remove the object age
> history()
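> savehistory("s") # save the command history to the file s (no extension is added; the default file is .Rhistory)
After clearing the command history, reload it from the file:
> loadhistory("s") # load the command history from the file s
> rm(list = ls()) # remove all objects from the workspace
> load("ss") # load a workspace from the file ss (assumes it was saved earlier, e.g. with save.image("ss"))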
> age <- c(1,3,5,2,11,9,3,9,12,3)
> save(age, file = "sss") # save the object age to the file sss (no .RData extension is added)
> rm(list = ls())
> load("sss")
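The save and load steps above can be checked with a short round trip. This is only a sketch: it writes to a temporary file (the name tmp is hypothetical) so the real working directory is left alone, and it shows that save uses the file name exactly as given.

```r
# Temporary file path without any extension (hypothetical name for illustration).
tmp <- tempfile("demo_")
age <- c(1, 3, 5, 2, 11, 9, 3, 9, 12, 3)

save(age, file = tmp)                                # written to exactly this path
saved_exists <- file.exists(tmp)                     # the file as named exists
rdata_exists <- file.exists(paste0(tmp, ".RData"))   # no ".RData" copy is created

rm(age)                  # drop the object from the workspace
restored <- load(tmp)    # load() returns the names of the restored objects
print(restored)
print(age)
```

If a .RData extension is wanted, it has to be part of the file name passed to save.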
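Write a script file named 1.R and save it in the working directory:
> source("1.R") # run the commands in 1.R
R ships with a set of default packages: base, datasets, utils, grDevices, graphics, stats, and methods.
> library() # list all packages in the library
> search() # list the attached packages that can be used directly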
[1] ".GlobalEnv" "tools:rstudio"
[3] "package:stats" "package:graphics"
[5] "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods"
[9] "Autoloads" "package:base"
When a function or package is unfamiliar, look it up with the help function:
> help("RODBC") # wrong approach
No documentation for ‘RODBC’ in specified packages and libraries:
you could try ‘??RODBC’
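> help(package = "RODBC") # show the contents of the package
To look up help for a function in a package, load the package first (or name the package through the package argument of help):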
> help("read_xls")
No documentation for ‘read_xls’ in specified packages and libraries:
you could try ‘??read_xls’
> library("readxl")
> help("read_xls")
Next, practice the methods above with an example from R in Action.
Code:
help.start()
install.packages("vcd")
help(package = "vcd")
library(vcd)
help("Arthritis")
Arthritis
example(Arthritis)
q()
Chapter 2: Creating a Dataset
1. Matrices
Create a matrix with the matrix function. The format is:
matrix(vector, nrow = number_of_rows, ncol = number_of_columns, byrow = FALSE, dimnames = list(char_vector_rownames, char_vector_colnames))
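As a small illustration of the format above, the sketch below fills the same values by column (the default) and by row. The values and the row and column names are made up for the example.

```r
cells  <- c(1, 26, 24, 68)      # illustrative values
rnames <- c("R1", "R2")         # illustrative row names
cnames <- c("C1", "C2")         # illustrative column names

# Default byrow = FALSE: values fill the matrix column by column.
by_col <- matrix(cells, nrow = 2, ncol = 2,
                 dimnames = list(rnames, cnames))

# byrow = TRUE: values fill the matrix row by row.
by_row <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                 dimnames = list(rnames, cnames))

print(by_col)
print(by_row)

# Elements can be addressed by name once dimnames are set.
by_col["R2", "C1"]   # 26
by_row["R1", "C2"]   # 26
```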
2. Arrays
Create an array with the array function. The format is:
array(vector, dimensions, dimnames)
eg:
array(1:24, c(2,3,4), dimnames = list(dim1,dim2,dim3))
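The example above leaves dim1, dim2, and dim3 undefined. A runnable sketch, assuming simple label vectors for the three dimensions (the labels are invented here), could look like this:

```r
# Label vectors for each dimension (assumed names, one label per index).
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")

z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

dim(z)                  # 2 3 4
z["A2", "B3", "C4"]     # 24, the last element
z[1, 2, 3]              # 15, since values fill the first dimension fastest
```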
3. Data frames
Create a data frame with the data.frame function. The format is:
data.frame(col1, col2, col3, …)
eg1:
> num <- 12:14
> names <- c("A", "B", "C")
> age <- c(17, 19, 21)
> major <- c("Math", "English", "Chinese")
> mydata <- data.frame(names, age, major, row.names = num)
> mydata
names age major
12 A 17 Math
13 B 19 English
14 C 21 Chinese
eg2:
> mydata2 <- data.frame(names = c("A", "B", "C"),
+ age = c(17, 19, 21),
+ major = c("Math", "English", "Chinese"),
+ row.names = 12:14)
> mydata2
names age major
12 A 17 Math
13 B 19 English
14 C 21 Chinese
Extracting content from a data frame:
> mydata[c("age","names")]
age names
12 17 A
13 19 B
14 21 C
> mydata[1:2]
names age
12 A 17
13 B 19
14 C 21
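The extraction forms above can be compared side by side in a short sketch. It rebuilds the same small data frame and passes stringsAsFactors = FALSE explicitly (an assumption made here so the result does not depend on the R version).

```r
mydata <- data.frame(names = c("A", "B", "C"),
                     age   = c(17, 19, 21),
                     major = c("Math", "English", "Chinese"),
                     row.names = 12:14,
                     stringsAsFactors = FALSE)

cols  <- mydata[c("age", "names")]         # data frame with two columns, row names kept
vec   <- mydata[["age"]]                   # plain numeric vector, same as mydata$age
row13 <- mydata["13", ]                    # one row, selected by its row name
older <- mydata[mydata$age > 18, "names"]  # names of rows where age exceeds 18

print(cols)
print(vec)
print(row13)
print(older)
```

Selecting columns with single brackets keeps the data frame structure, while double brackets or $ return the column itself.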
> mydata$age
[1] 17 19 21
> attach(mydata) # add the data frame mydata to the search path
The following objects are masked _by_ .GlobalEnv:
age, major, names
> age
[1] 17 19 21
> names
[1] "A" "B" "C"
> detach(mydata) # remove the data frame mydata from the search path
> rm(list = c("mydata2","age", "names", "major"))
> age <- 12
> attach(mydata)
The following objects are masked _by_ .GlobalEnv:
age
> age
[1] 12
> names
[1] "A" "B" "C"
> detach(mydata)
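If an object with the same name as a column of the data frame already exists in the environment before attach adds the data frame to the search path, the existing object takes precedence. R reports this case with a "masked by" message.
The with function: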
> with(mydata, {
+ age
+ summary(names)
+ a <- summary(age)
+ a
+ b <<- summary(names)
+ })
> a
Error: object 'a' not found
> b
A B C
1 1 1
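The scoping behaviour in the transcript above can be reproduced with a smaller sketch. The names local_mean and global_range are hypothetical, chosen only for this illustration.

```r
mydata <- data.frame(age = c(17, 19, 21))

result <- with(mydata, {
  local_mean <- mean(age)        # ordinary assignment: exists only inside with()
  global_range <<- range(age)    # <<- assigns outside the with() block
  local_mean                     # the last expression is the value of with()
})

print(result)                    # 19
print(global_range)              # 17 21
print(exists("local_mean"))      # FALSE: the local variable is gone
```

Returning the value from with and assigning it, as done for result, is often easier to follow than relying on <<-.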
4. Factors:
Create factors with the factor function.
eg:
# study
> data <- c("Poor", "Improved", "Excellent", "Poor")
> status1 <- factor(data)
> status1
[1] Poor Improved Excellent Poor
Levels: Excellent Improved Poor
> status2 <- factor(data, order = TRUE)
> status2
[1] Poor Improved Excellent Poor
Levels: Excellent < Improved < Poor
> status3 <- factor(data, order = TRUE,
+ levels = c("Poor", "Improved", "Excellent"))
> status3
[1] Poor Improved Excellent Poor
Levels: Poor < Improved < Excellent
> status4 <- factor(c(3,2,1,3), order = TRUE,
+ levels = c(3,2,1),
+ labels = c("Poor", "Improved", "Excellent"))
> status4
[1] Poor Improved Excellent Poor
Levels: Poor < Improved < Excellent
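One reason to set the level order explicitly is that ordered factors support comparisons. The sketch below reuses the data from the examples above and spells the argument out as ordered = TRUE (the transcript's order = TRUE works through partial argument matching).

```r
data <- c("Poor", "Improved", "Excellent", "Poor")
status <- factor(data, ordered = TRUE,
                 levels = c("Poor", "Improved", "Excellent"))

codes  <- as.integer(status)   # underlying codes follow the level order: 1 2 3 1
better <- status > "Poor"      # FALSE TRUE TRUE FALSE
best   <- max(status)          # Excellent

print(codes)
print(better)
print(best)
print(table(status))           # counts are listed in level order
```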
# practice
> num <- 12:14
> names <- c("A", "B", "C")
> age <- c(17, 19, 21)
> major <- factor(c("Math", "English", "Math"))
> grade <- factor(c("First", "Second", "Third"), order = TRUE,
+ level = c("First", "Second", "Third"))
> mydata <- data.frame(names, age, major, grade, row.names = num)
> str(mydata)
'data.frame': 3 obs. of 4 variables:
$ names: Factor w/ 3 levels "A","B","C": 1 2 3
$ age : num 17 19 21
$ major: Factor w/ 2 levels "English","Math": 2 1 2
$ grade: Ord.factor w/ 3 levels "First"<"Second"<..: 1 2 3
> summary(mydata)
names age major grade
A:1 Min. :17 English:1 First :1
B:1 1st Qu.:18 Math :2 Second:1
C:1 Median :19 Third :1
Mean :19
3rd Qu.:20
Max. :21
5. Lists:
Create lists with the list function.
eg:
> a <- "My"
> b <- matrix(c(1,5,2,7), nrow = 2, dimnames = list(c("A","B")))
> c <- matrix(1:4, nrow = 2, dimnames = list(c("A","B"), c("C","D")))
> d <- factor(c("one", "two", "two"), order = TRUE,
+ levels = c("one", "two", "three"))
> e <- data.frame(w = c("I", "am", "XXX"), m = c(1,2,3))
> mylist <- list(title = a, numb = b, c, d, e)
> mylist
$`title`
[1] "My"
$numb
[,1] [,2]
A 1 2
B 5 7
[[3]]
C D
A 1 3
B 2 4
[[4]]
[1] one two two
Levels: one < two < three
[[5]]
w m
1 I 1
2 am 2
3 XXX 3
Extracting data from a list:
> mylist[4]
[[1]]
[1] one two two
Levels: one < two < three
> mylist[[4]]
[1] one two two
Levels: one < two < three
> mylist["numb"]
$`numb`
[,1] [,2]
A 1 2
B 5 7
> mylist[["numb"]]
[,1] [,2]
A 1 2
B 5 7
> class(mylist[4])
[1] "list"
> class(mylist[[4]])
[1] "ordered" "factor"
> mylist[4][2]
$<NA>
NULL
> mylist[[4]][c(1,3)]
[1] one two
Levels: one < two < three
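listname[n]: extracts the nth element as a sub-list; the returned object is a list.
listname[[n]]: returns the content of the nth element; the returned object has the type of that element itself.
In "mylist[4][2]", "mylist[4]" is a sub-list of mylist that holds a single element, so it has no second element and position 2 comes back as NULL under an NA name.
In "mylist[[4]][c(1,3)]", "mylist[[4]]" is the element's content, a factor with three elements, so its 1st and 3rd elements exist and are returned.
6. Entering data
Keyboard input: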
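The difference between the two bracket forms can be checked with a smaller list. This is a sketch with made-up contents, not the mylist from the transcript.

```r
mylist <- list(title = "My",
               scores = c(1, 5, 2, 7),
               c("one", "two", "two"))   # third element is unnamed

sub  <- mylist[3]      # single brackets: a list of length 1
item <- mylist[[3]]    # double brackets: the character vector itself

print(class(sub))      # "list"
print(class(item))     # "character"

# sub has only one element, so position 2 yields NULL ...
missing_second <- is.null(sub[2][[1]])
# ... whereas item has three elements, so positions 1 and 3 exist.
picked <- item[c(1, 3)]

print(missing_second)
print(picked)
```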
eg:
> mydata <- data.frame(age = numeric(0), gender = character(0))
> mydata <- edit(mydata)
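A data editor window pops up. The fix function can be used in the same way:
> mydata1 <- data.frame(age = numeric(0), gender = character(0))
> fix(mydata1)
numeric(0) creates an empty numeric vector.
The result of edit must be assigned back to the object, because edit works on a copy of the original object. If the result is not assigned, the data are lost when the data editor is closed.
The fix function edits the object in place.
Keyboard entry suits small datasets. To import data from other files into R, use the appropriate packages and functions (see R in Action, pp. 32-37).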
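As one example of importing from a file, the sketch below reads a comma-delimited text file with read.csv from base R. The file and its contents are invented for the illustration and written to a temporary path first, so the example is self-contained.

```r
# Write a small CSV file to a temporary location (made-up sample data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,gender",
             "1,25,m",
             "2,31,f",
             "3,28,f"),
           csv_path)

# header = TRUE takes column names from the first line;
# stringsAsFactors = FALSE keeps text columns as character vectors.
imported <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(imported)
print(imported)
```

Other file types, such as Excel workbooks, need their own packages, for instance the readxl package loaded earlier in these notes.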