R Data Analysis Study Notes
程序员文章站
2022-07-14 21:41:42
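Simulating exam scores
Use runif and rnorm to generate the scores.
rnorm(n, mean = 0, sd = 1)
n is the number of random values to generate (the length of the result), mean is the mean, and sd is the standard deviation.
mean and sd default to 0 and 1, so only n is required, but in practice all three are usually supplied.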
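As a quick illustration of the two generators, here is a minimal sketch. The seed and the variable names u, z and s are assumptions made for this example only; they are not part of the original notes.

```r
set.seed(42)                          # arbitrary seed so the draws repeat between runs

u <- runif(5, min = 80, max = 100)    # 5 uniform draws between 80 and 100
z <- rnorm(5)                         # mean and sd fall back to 0 and 1
s <- rnorm(5, mean = 80, sd = 7)      # 5 normal draws centred on 80

round(u)                              # round to whole marks, as the notes do
round(s)
```

Calling set.seed() first is optional, but it makes a simulated data set reproducible.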
> x1=round(runif(100,min=80,max=100))
> x1
[1] 95 83 95 87 97 89 87 85 87 91 94 82 98
[14] 93 97 84 83 94 100 96 88 94 98 96 83 88
[27] 91 84 91 86 98 95 86 99 97 86 91 96 100
[40] 88 99 95 87 93 95 81 86 84 82 88 89 96
[53] 81 88 97 82 86 87 93 89 80 86 88 86 96
[66] 97 100 90 100 82 88 100 82 87 93 85 91 81
[79] 87 91 93 96 96 98 98 92 99 92 92 86 91
[92] 92 80 97 91 86 97 81 83 96
> x2=round(rnorm(100,mean=80,sd=7))
> x2
[1] 77 79 72 74 75 80 92 83 82 77 74 78 87
[14] 83 80 87 88 74 75 70 81 75 81 74 83 100
[27] 78 78 75 78 74 78 72 86 81 74 70 79 80
[40] 66 89 72 85 88 79 86 84 77 86 81 87 73
[53] 86 88 86 84 78 88 83 79 83 66 67 79 82
[66] 83 92 76 84 78 83 77 80 79 73 85 88 83
[79] 81 74 72 77 69 85 85 82 67 76 87 78 75
[92] 80 88 68 96 76 89 72 78 73
>
> x3=round(rnorm(100,mean=80,sd=18))
> x3
[1] 49 74 83 97 70 72 80 74 82 94 76 129 122
[14] 65 83 72 78 91 71 43 106 62 93 71 109 76
[27] 102 107 75 101 87 86 76 98 86 73 79 88 56
[40] 77 67 102 82 96 95 63 82 112 80 52 27 84
[53] 91 73 44 70 70 60 91 86 112 101 71 102 94
[66] 84 41 90 79 86 96 81 70 85 87 62 87 49
[79] 113 65 113 75 25 53 54 88 64 96 73 53 94
[92] 97 83 55 38 110 92 85 91 80
> x3[which(x3>100)]=100
> x3
[1] 49 74 83 97 70 72 80 74 82 94 76 100 100
[14] 65 83 72 78 91 71 43 100 62 93 71 100 76
[27] 100 100 75 100 87 86 76 98 86 73 79 88 56
[40] 77 67 100 82 96 95 63 82 100 80 52 27 84
[53] 91 73 44 70 70 60 91 86 100 100 71 100 94
[66] 84 41 90 79 86 96 81 70 85 87 62 87 49
[79] 100 65 100 75 25 53 54 88 64 96 73 53 94
[92] 97 83 55 38 100 92 85 91 80
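The capping step above can also be written without which(). The sketch below uses pmin() and pmax() on a few values taken from the x3 output. The name x3_raw and the lower bound of 0 are assumptions for illustration.

```r
x3_raw <- c(49, 129, 122, 106, 25)     # a few values from the x3 output above

capped <- pmin(x3_raw, 100)            # element-wise minimum: nothing exceeds 100
capped

clamped <- pmax(pmin(x3_raw, 100), 0)  # also guard against negative scores
clamped

x3_copy <- x3_raw
x3_copy[x3_copy > 100] <- 100          # logical indexing, equivalent to the which() form
```

With sd = 18 a normal draw can fall well outside the 0 to 100 range, so clamping both ends is a cheap safeguard.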
> num=seq(10378001,10378100)
> num
[1] 10378001 10378002 10378003 10378004 10378005 10378006
[7] 10378007 10378008 10378009 10378010 10378011 10378012
[13] 10378013 10378014 10378015 10378016 10378017 10378018
[19] 10378019 10378020 10378021 10378022 10378023 10378024
[25] 10378025 10378026 10378027 10378028 10378029 10378030
[31] 10378031 10378032 10378033 10378034 10378035 10378036
[37] 10378037 10378038 10378039 103
>
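> # x apparently already holds num, x1, x2, x3 from an earlier run, so the new vectors are added as x1.1, x2.1, x3.1
> x=data.frame(x,x1,x2,x3)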
> x
       num x1 x2  x3 x1.1 x2.1 x3.1
1 10378001 98 85 117   95   77   49
2 10378002 86 84  70   83   79   74
3 10378003 92 77 105   95   72   83
4 10378004 92 78 118   87   74   97
5 10378005 89 80  64   97   75   70
6 10378006 82 92  91   89   80   72
7 10378007 91 92  77   87   92   80
8 10378008 92 93  69   85   83   74
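The .1 suffixes in the table come from binding vectors onto a data frame that already has columns with the same names. A minimal sketch of both cases follows, using only the first three values of each vector. The names scores and rebound are hypothetical.

```r
num <- seq(10378001, 10378003)          # three student numbers for brevity
x1  <- c(95, 83, 95)                    # first values of x1, x2, x3 printed earlier
x2  <- c(77, 79, 72)
x3  <- c(49, 74, 83)

scores <- data.frame(num, x1, x2, x3)   # one column per vector
names(scores)

rebound <- data.frame(scores, x1, x2, x3)
names(rebound)                          # repeated names are made unique with .1
```

Building the data frame once from the four vectors avoids the duplicated columns.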
> mean(x)
[1] NA
Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
> colMeans(x)
num x1 x2 x3
10378050.50 90.65 79.98 83.55
x1.1 x2.1 x3.1
90.56 79.75 78.73
> colMeans(x)[c("x1","x2","x3")]
x1 x2 x3
90.65 79.98 83.55
> apply(x,2,mean)
num x1 x2 x3
10378050.50 90.65 79.98 83.55
x1.1 x2.1 x3.1
90.56 79.75 78.73
>
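mean() has no method for data frames, which is why it returns NA with a warning, while colMeans() and apply() work column by column. The sketch below uses a small hypothetical data frame (scores and subjects are illustrative names) and drops the ID column before averaging, so the student number is not averaged along with the marks.

```r
scores <- data.frame(num = 1:3,
                     x1 = c(98, 86, 92),
                     x2 = c(85, 84, 77))

subjects <- scores[c("x1", "x2")]     # keep only the score columns

col_avg <- colMeans(subjects)         # column means in one call
col_avg

by_col <- sapply(subjects, mean)      # same result, computed one column at a time
by_col
```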
>
> apply(x,2,max)
num x1 x2 x3 x1.1 x2.1
10378100 100 97 132 100 100
x3.1
100
> apply(x,2,min)
num x1 x2 x3 x1.1 x2.1
10378001 80 65 36 80 66
x3.1
25
> apply(x,2,sum)
num x1 x2 x3 x1.1
1037805050 9065 7998 8355 9056
x2.1 x3.1
7975 7873
> apply(x,2,sum)[c("x1","x2","x3")]
x1 x2 x3
9065 7998 8355
>
> apply(x[c("x1","x2","x3")],1,sum)
[1] 300 240 274 288 233 265 260 254 228 262 239 206 247
[14] 234 260 266 254 278 245 282 282 259 274 277 231 216
[27] 211 261 301 278 275 247 248 287 249 245 281 265 250
[40] 234 275 259 232 281 252 257 226 249 288 264 247 262
[53] 255 254 265 229 272 226 223 246 299 217 259 250 240
[66] 251 279 236 263 273 277 255 287 248 244 249 228 223
[79] 284 273 278 252 268 229 257 233 189 253 233 246 253
[92] 230 244 252 258 207 238 269 256 290
>
> which.max(apply(x[c("x1","x2","x3")],1,sum))
[1] 29
> x$num[which.max(apply(x[c("x1","x2","x3")],1,sum))]
[1] 10378029
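The same lookup can be sketched with rowSums(), which is a shortcut for apply(..., 1, sum). The data frame below holds the first three rows of the table printed earlier. The names scores, total and best are assumptions for this example.

```r
scores <- data.frame(num = c(10378001, 10378002, 10378003),
                     x1 = c(98, 86, 92),
                     x2 = c(85, 84, 77),
                     x3 = c(117, 70, 105))

total <- rowSums(scores[c("x1", "x2", "x3")])  # per-student totals: 300 240 274
total

best <- which.max(total)                       # position of the highest total
scores$num[best]                               # student number of that row
```

Storing the totals in a variable avoids recomputing the apply() call inside every later expression.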
> hist(x$x1)
> plot(x1,x2)
> plot(x$x1,x$x2)
Contingency table analysis
table() builds the frequency (contingency) table; barplot() draws the bar chart.
> table(x$x1)
 80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98
  3   6   4   4   6   3   6   5   3   4   4   4   8   4   3   2   5   4   5
 99 100
 13   4
> barplot(table(x$x1))
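To see what table() returns before plotting it, here is a minimal sketch on a made-up vector. The grades values and the axis labels are assumptions, not data from the notes.

```r
grades <- c(80, 81, 81, 85, 85, 85)   # made-up scores

counts <- table(grades)               # one count per distinct value
counts

names(counts)                         # the distinct scores, as character labels
barplot(counts, xlab = "score", ylab = "number of students")
```

The names of the table become the bar labels, so no extra labelling is needed for a basic chart.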
Pie chart
pie() draws a pie chart.
> pie(table(x$x1))
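Box-and-whisker plot
- The bottom and top edges of the box are the 25% and 75% sample quantiles
- The line inside the box is the sample median
- The lines extending above and below the box are the whiskers; they end at the highest and lowest values that are not outliers
- Outliers are drawn as individual points beyond the whiskers
> boxplot(x$x1,x$x2,x$x3)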
Box plot with notches and colors
> boxplot(x[2:4],col=c("red","green","blue"),notch=TRUE)
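The numbers behind a box plot can be inspected with boxplot.stats(). By default R treats a point as an outlier when it lies more than 1.5 times the box height beyond the box. The vector v below is made up for illustration.

```r
v <- c(1, 2, 3, 4, 5, 100)   # made-up data with one extreme value

bs <- boxplot.stats(v)
bs$stats                     # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out                       # points drawn individually beyond the whiskers

boxplot(v)                   # the plot built from these numbers
```

Here the upper whisker stops at 5 and the value 100 is reported as an outlier.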
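Star plot
- Each observation is drawn as one figure
- Each spoke of a figure stands for one variable; string labels are printed beneath the figure
- The length of a spoke shows the magnitude of the value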
> stars(x[c("x1","x2","x3")])
> stars(x[c("x1","x2","x3")],full=T,draw.segments=T)
> stars(x[c("x1","x2","x3")],full=F,draw.segments=T)
>
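Chernoff faces
- The widths and heights of facial features encode the values
- People are highly sensitive to faces and remember them well
- Best suited to a small number of observations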
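The notes give no code for this plot. One possible sketch uses faces() from the add-on package aplpack. This is an assumption: the notes do not say which package was used, and aplpack has to be installed separately. The four rows are the first rows of the table printed earlier.

```r
# Hypothetical small data set: the first four rows of the score table
scores <- data.frame(x1 = c(98, 86, 92, 92),
                     x2 = c(85, 84, 77, 78),
                     x3 = c(117, 70, 105, 118))

if (requireNamespace("aplpack", quietly = TRUE)) {
  aplpack::faces(as.matrix(scores))   # one face per row
} else {
  message("aplpack is not installed; skipping the face plot")
}
```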
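Q-Q plot
- Can be used to judge whether data follow a normal distribution
- For normal data, the slope of the reference line is close to the standard deviation and its intercept is close to the mean
- The closer the points lie to the line, the closer the data are to a normal distribution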
> qqnorm(x1)
> qqline(x1)
> qqnorm(x3)
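The link between the line and the mean and standard deviation can be checked numerically. The sketch below fits a least-squares line to the Q-Q points. Note that this is a swapped-in technique: qqline() itself draws its line through the first and third quartiles. The seed, the sample size and the name y are assumptions.

```r
set.seed(7)                           # arbitrary seed for reproducibility
y <- rnorm(1000, mean = 80, sd = 7)   # simulated normal scores

qq  <- qqnorm(y, plot.it = FALSE)     # theoretical (x) and sample (y) quantiles
fit <- lm(qq$y ~ qq$x)                # straight line through the Q-Q points
coef(fit)                             # intercept near 80, slope near 7

qqnorm(y)
qqline(y)                             # quartile-based reference line
```

For a capped or uniform sample such as x3 or x1, the points bend away from the line at the ends, which is the visual sign of non-normality.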