R Descriptive Statistics
程序员文章站
2024-03-23 16:55:04
...
- Median and other quantiles:
quantile(iris$Sepal.Length)
0% 25% 50% 75% 100%
4.3 5.1 5.8 6.4 7.9
quantile(iris$Sepal.Length,seq(0,1,by=0.1))
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
4.30 4.80 5.00 5.27 5.60 5.80 6.10 6.30 6.52 6.90 7.90
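To make the output above concrete: R's default quantile definition (type = 7) interpolates linearly between order statistics at the 1-based fractional index h = (n - 1)p + 1. The helper `q7` below is hypothetical, written only for illustration, and assumes a numeric vector with no NAs.

```r
# Hypothetical helper: hand-rolled equivalent of quantile(x, p, type = 7)
q7 <- function(x, p) {
  x  <- sort(x)                       # order statistics
  n  <- length(x)
  h  <- (n - 1) * p + 1               # 1-based fractional position
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])  # linear interpolation between neighbours
}

x     <- iris$Sepal.Length
probs <- seq(0, 1, by = 0.1)
q7(x, probs)
all.equal(q7(x, probs), unname(quantile(x, probs)))
median(x)   # the 50% quantile
```

Other definitions are available through the `type` argument (1 to 9) of `quantile()`.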
- Distribution shape: skewness. A value > 0 means the distribution is skewed to the right (long right tail).
> library(fBasics)
Loading required package: timeDate
Loading required package: timeSeries
> skewness(iris$Sepal.Length)
[1] 0.3086407
attr(,"method")
[1] "moment"
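The "moment" value can be reproduced in base R. Assuming fBasics divides the third central moment by the cube of the sample standard deviation (the n - 1 version returned by `sd()`), the hand calculation below should match the 0.3086407 printed above; dividing by the population standard deviation instead gives the slightly larger g1 statistic. `skew_moment` and `skew_g1` are hypothetical helper names.

```r
# Hypothetical helper: m3 / s^3 with s = sample standard deviation
skew_moment <- function(x) {
  n  <- length(x)
  m3 <- sum((x - mean(x))^3) / n    # third central moment
  m3 / sd(x)^3                      # sd() uses the n - 1 denominator
}

# Hypothetical helper: g1 = m3 / m2^(3/2) with population moments
skew_g1 <- function(x) {
  n  <- length(x)
  m2 <- sum((x - mean(x))^2) / n    # population variance
  m3 <- sum((x - mean(x))^3) / n
  m3 / m2^1.5
}

x <- iris$Sepal.Length
skew_moment(x)
skew_g1(x)
mean(x) > median(x)   # right skew often (not always) pulls the mean above the median
```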
- Grouped summary statistics: the formula method of summary() from Hmisc (summary.formula); by default fun=mean
library(Hmisc)
mystats <- function(x) c(Median=median(x,na.rm=TRUE),IQR=IQR(x,na.rm=TRUE))
summary(mpg~cyl+hp,data=mtcars,fun=mystats,method='response')
mpg N= 32
+-------+---------+--+------+-----+
| | |N |Median|IQR |
+-------+---------+--+------+-----+
|cyl |4 |11|26.00 |7.600|
| |6 | 7|19.70 |2.350|
| |8 |14|15.20 |1.850|
+-------+---------+--+------+-----+
|hp |[ 52, 97)| 8|26.65 |6.900|
| |[ 97,150)| 9|21.00 |2.200|
| |[150,205)| 8|16.85 |3.400|
| |[205,335]| 7|14.30 |3.000|
+-------+---------+--+------+-----+
|Overall| |32|19.20 |7.375|
+-------+---------+--+------+-----+
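For a single grouping variable, the same Median/IQR figures can be obtained without Hmisc by splitting `mpg` on `cyl`. This sketch covers only the `cyl` block; the `hp` intervals above come from Hmisc's own grouping of `hp` and are not reproduced here.

```r
# Median and IQR of mpg within each cylinder group (base R only)
mystats <- function(x) c(Median = median(x, na.rm = TRUE), IQR = IQR(x, na.rm = TRUE))

by_cyl <- t(sapply(split(mtcars$mpg, mtcars$cyl), mystats))
by_cyl                  # rows "4", "6", "8"; columns Median, IQR
mystats(mtcars$mpg)     # the "Overall" row
```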
summary(mpg~cyl+hp,data=mtcars,fun=quantile,method='response')
mpg N= 32
+-------+---------+--+----+------+-----+------+----+
| | |N |0% |25% |50% |75% |100%|
+-------+---------+--+----+------+-----+------+----+
|cyl |4 |11|21.4|22.800|26.00|30.400|33.9|
| |6 | 7|17.8|18.650|19.70|21.000|21.4|
| |8 |14|10.4|14.400|15.20|16.250|19.2|
+-------+---------+--+----+------+-----+------+----+
|hp |[ 52, 97)| 8|22.8|24.000|26.65|30.900|33.9|
| |[ 97,150)| 9|17.8|19.200|21.00|21.400|30.4|
| |[150,205)| 8|15.2|15.425|16.85|18.825|19.7|
| |[205,335]| 7|10.4|11.850|14.30|14.850|15.8|
+-------+---------+--+----+------+-----+------+----+
|Overall| |32|10.4|15.425|19.20|22.800|33.9|
+-------+---------+--+----+------+-----+------+----+
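The per-group five-number summary can also be computed with base R's `aggregate()` formula interface. With `FUN = quantile` the result holds a matrix column, one row per `cyl` level and one column per quantile.

```r
# Quantiles of mpg within each cyl level; agg$mpg is a 3 x 5 matrix
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = quantile)
agg
quantile(mtcars$mpg)   # corresponds to the "Overall" row
```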
summary(cyl~mpg+hp,data=mtcars,method='reverse')
Descriptive Statistics by cyl
+---+--------------------+--------------------+--------------------+
| |4 |6 |8 |
| |(N=11) |(N=7) |(N=14) |
+---+--------------------+--------------------+--------------------+
|mpg| 22.80/26.00/30.40 | 18.65/19.70/21.00 | 14.40/15.20/16.25 |
+---+--------------------+--------------------+--------------------+
|hp | 65.50/ 91.00/ 96.00|110.00/110.00/123.00|176.25/192.50/241.25|
+---+--------------------+--------------------+--------------------+
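With method='reverse' the roles are swapped: the levels of `cyl` become the columns, and each continuous variable is shown as lower quartile/median/upper quartile within that group (the mpg cells match the 25%/50%/75% columns of the quantile table). A rough base R reproduction of those cells, without Hmisc's padding, is sketched below; `q_cell` is a hypothetical helper.

```r
# Hypothetical helper: format the 25th/50th/75th percentiles as "Q1/median/Q3"
q_cell <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75))
  paste(formatC(q, format = "f", digits = 2), collapse = "/")
}

mpg_cells <- sapply(split(mtcars$mpg, mtcars$cyl), q_cell)
hp_cells  <- sapply(split(mtcars$hp,  mtcars$cyl), q_cell)
rbind(mpg = mpg_cells, hp = hp_cells)
```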
summary(mpg~cyl+hp,data=mtcars,method='cross',fun=var)
var by cyl, hp
+---+
|N |
|mpg|
+---+
+---+---------+---------+---------+---------+---------+
|cyl|[ 52, 97)|[ 97,150)|[150,205)|[205,335]| ALL |
+---+---------+---------+---------+---------+---------+
|4 | 8 | 3 | 0 | 0 |11 |
| |18.494286|26.703333| | |20.338545|
+---+---------+---------+---------+---------+---------+
|6 | 0 | 6 | 1 | 0 | 7 |
| | | 2.535000| | | 2.112857|
+---+---------+---------+---------+---------+---------+
|8 | 0 | 0 | 7 | 7 |14 |
| | | | 2.764762| 4.804762| 6.553846|
+---+---------+---------+---------+---------+---------+
|ALL| 8 | 9 | 8 | 7 |32 |
| |18.494286|13.743611| 3.431429| 4.804762|36.324103|
+---+---------+---------+---------+---------+---------+
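Assuming the `hp` interval breaks printed above (52, 97, 150, 205, 335), the cross-tabulated counts and variances can be approximated in base R with `cut()` and `tapply()`. The breaks are copied from the Hmisc output rather than recomputed. Cells with fewer than two observations give `NA` (the variance is undefined there), which Hmisc prints as a blank.

```r
# hp intervals copied from the Hmisc table: [52,97) [97,150) [150,205) [205,335]
hp_bin <- cut(mtcars$hp, breaks = c(52, 97, 150, 205, 335),
              right = FALSE, include.lowest = TRUE)

counts    <- table(mtcars$cyl, hp_bin)                          # the N rows
variances <- tapply(mtcars$mpg, list(mtcars$cyl, hp_bin), var)  # the mpg rows
counts
round(variances, 6)
tapply(mtcars$mpg, mtcars$cyl, var)   # the ALL column
var(mtcars$mpg)                       # the ALL/ALL cell
```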
Getting an overall statistical summary: the describe() function (Hmisc)
describe(mtcars)
mtcars
11 Variables 32 Observations
----------------------------------------------------------------------------
mpg
n missing distinct Info Mean Gmd .05 .10
32 0 25 0.999 20.09 6.796 12.00 14.34
.25 .50 .75 .90 .95
15.43 19.20 22.80 30.09 31.30
lowest : 10.4 13.3 14.3 14.7 15.0, highest: 26.0 27.3 30.4 32.4 33.9
----------------------------------------------------------------------------
cyl
n missing distinct Info Mean Gmd
32 0 3 0.866 6.188 1.948
Value 4 6 8
Frequency 11 7 14
Proportion 0.344 0.219 0.438
----------------------------------------------------------------------------
disp
n missing distinct Info Mean Gmd .05 .10
32 0 27 0.999 230.7 142.5 77.35 80.61
.25 .50 .75 .90 .95
120.83 196.30 326.00 396.00 449.00
lowest : 71.1 75.7 78.7 79.0 95.1, highest: 360.0 400.0 440.0 460.0 472.0
----------------------------------------------------------------------------
hp
n missing distinct Info Mean Gmd .05 .10
32 0 22 0.997 146.7 77.04 63.65 66.00
.25 .50 .75 .90 .95
96.50 123.00 180.00 243.50 253.55
lowest : 52 62 65 66 91, highest: 215 230 245 264 335
----------------------------------------------------------------------------
drat
n missing distinct Info Mean Gmd .05 .10
32 0 22 0.997 3.597 0.6099 2.853 3.007
.25 .50 .75 .90 .95
3.080 3.695 3.920 4.209 4.314
lowest : 2.76 2.93 3.00 3.07 3.08, highest: 4.08 4.11 4.22 4.43 4.93
----------------------------------------------------------------------------
wt
n missing distinct Info Mean Gmd .05 .10
32 0 29 0.999 3.217 1.089 1.736 1.956
.25 .50 .75 .90 .95
2.581 3.325 3.610 4.048 5.293
lowest : 1.513 1.615 1.835 1.935 2.140, highest: 3.845 4.070 5.250 5.345 5.424
----------------------------------------------------------------------------
qsec
n missing distinct Info Mean Gmd .05 .10
32 0 30 1 17.85 2.009 15.05 15.53
.25 .50 .75 .90 .95
16.89 17.71 18.90 19.99 20.10
lowest : 14.50 14.60 15.41 15.50 15.84, highest: 19.90 20.00 20.01 20.22 22.90
----------------------------------------------------------------------------
vs
n missing distinct Info Sum Mean Gmd
32 0 2 0.739 14 0.4375 0.5081
----------------------------------------------------------------------------
am
n missing distinct Info Sum Mean Gmd
32 0 2 0.724 13 0.4062 0.498
----------------------------------------------------------------------------
gear
n missing distinct Info Mean Gmd
32 0 3 0.841 3.688 0.7863
Value 3 4 5
Frequency 15 12 5
Proportion 0.469 0.375 0.156
----------------------------------------------------------------------------
carb
n missing distinct Info Mean Gmd
32 0 6 0.929 2.812 1.718
Value 1 2 3 4 6 8
Frequency 7 10 3 10 1 1
Proportion 0.219 0.312 0.094 0.312 0.031 0.031
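Two columns in the describe() output are less familiar. Gmd is Gini's mean difference, the mean absolute difference over all pairs of distinct observations. Info measures how much information is lost to ties; the sketch below assumes it is 1 - sum(f^3 - f) / (n^3 - n), where f are the frequencies of the distinct values. Under these assumptions the helpers reproduce the cyl and vs figures printed above. `gmd` and `info` are hypothetical names, not Hmisc functions.

```r
# Hypothetical helper: Gini's mean difference over all ordered pairs i != j
gmd <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))   # diagonal terms are 0
}

# Hypothetical helper: tie-based information measure (1 = no ties)
info <- function(x) {
  n <- length(x)
  f <- as.vector(table(x))     # frequency of each distinct value
  1 - sum(f^3 - f) / (n^3 - n)
}

round(c(gmd = gmd(mtcars$cyl), info = info(mtcars$cyl)), 3)
round(c(gmd = gmd(mtcars$vs),  info = info(mtcars$vs)),  3)
```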
- Data visualization: caret::featurePlot() (the data must contain no missing values)
library(caret)
str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> featurePlot(iris[,1:4],iris[,5],'ellipse')
plot(iris)
plot(iris$Sepal.Length)
plot(iris$Species)
with(iris, {
  plot(Sepal.Length, Sepal.Width, pch = as.numeric(Species))
  legend('topright', legend = levels(Species), pch = 1:3, ncol = 3, cex = 0.8)
})
Factor (categorical) data can be plotted with mosaicplot().
table(iris$Species)
setosa versicolor virginica
50 50 50
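mosaicplot() is most informative for a cross-tabulation of two factors. iris has only one factor, so this sketch switches to mtcars and treats cyl and gear as categorical; each tile's area is proportional to its cell count.

```r
# Two-way frequency table of cylinders by number of forward gears
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
tab

# Mosaic plot: tile area is proportional to each cell count
mosaicplot(tab, main = "mtcars: cyl by gear", color = TRUE)
```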