Probability Theory and Mathematical Statistics, Homework 1: frequency distribution tables in Python
5.1
2:
Population: the smoking status of all adult men
Sample: all 5000 men surveyed by the 50 students
Population distribution: Bernoulli distribution
5:
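Population: all capacitors produced by the factory
Sample: the n units drawn
Sample distribution:
Assume the observations are i.i.d. and each follows an exponential distribution with rate $\lambda$. Because the distribution is continuous, the sample distribution is described by the joint density
$$f(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}\lambda e^{-\lambda x_i},\qquad x_i>0$$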
6:
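I think this conclusion is unreasonable. The population is all graduates, but the sample consists only of graduates who returned to campus. Graduates with low salaries who are doing less well are less willing to come back, so the sample is not random. The true average salary of graduates is likely below 50,000 US dollars.
Data such as salaries and ages are often skewed, so the sample mean can be a poor representative of the typical level.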
5.2
2:
3+4+8+3+2=20
The empirical distribution function must be right-continuous:
$$F_{20}(x)=\begin{cases}
0, & x<38\\
\frac{3}{20}, & 38\leq x<48\\
\frac{7}{20}, & 48\leq x<58\\
\frac{3}{4}, & 58\leq x<68\\
\frac{9}{10}, & 68\leq x<78\\
1, & x\geq 78
\end{cases}$$
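The jump heights of the empirical distribution function above can be recomputed from the counts. This is a minimal sketch, assuming the distinct values 38, 48, 58, 68, 78 occur 3, 4, 8, 3, 2 times (read off from the breakpoints and the sum above); the helper name `ecdf` is made up for illustration.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

values = [38, 48, 58, 68, 78]   # assumed distinct observed values
counts = [3, 4, 8, 3, 2]        # assumed multiplicities, total 20
n = sum(counts)

# Cumulative fractions: the value of F_n on each step
cum = [Fraction(c, n) for c in accumulate(counts)]

def ecdf(x):
    """Right-continuous empirical CDF: fraction of observations <= x."""
    k = bisect_right(values, x)   # number of distinct values <= x
    return cum[k - 1] if k else Fraction(0)

print([str(p) for p in cum])
```

Using `bisect_right` is what makes the function right-continuous: at a breakpoint such as x = 58 the jump is already included.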
3:
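```python
# Sort the data in ascending order
import numpy as np
import pandas as pd

t2 = [909, 1086, 1120, 999, 1320, 1091, 1071, 1081,
      1130, 1336, 967, 1572, 825, 914, 992, 1232, 950,
      775, 1203, 1025, 1096, 808, 1224, 1044, 871, 1164, 971, 950, 866, 738]
t2 = np.sort(t2)  # ascending order
# Sample size, sorted data, and range / 6 as a guide for the class width
print(t2.shape, t2, (np.max(t2) - np.min(t2)) / 6)
```

```
(30,) [ 738 775 808 825 866 871 909 914 950 950 967 971 992 999
1025 1044 1071 1081 1086 1091 1096 1120 1130 1164 1203 1224 1232 1320
1336 1572] 139.0
```

```python
# Frequency distribution table
# Use a class width of 140, starting at 737 so that the edges match the labels
bins = [737 + 140 * i for i in range(7)]
t22 = pd.cut(t2, bins, labels=["(737,877]", "(877,1017]", "(1017,1157]",
                               "(1157,1297]", "(1297,1437]", "(1437,1577]"])
t22 = t22.value_counts()
t22 = pd.DataFrame(t22)
t22['Class interval'] = t22.index
t22.columns = ['Count', 'Class interval']
t22.reset_index(drop=True, inplace=True)
t22['Midpoint'] = [807, 947, 1087, 1227, 1367, 1507]
t22['Frequency'] = t22['Count'] / 30
# Cumulative frequency: running sum of the frequencies
ljpl = [0]
for i in t22['Frequency']:
    ljpl.append(i + ljpl[-1])
t22['Cumulative frequency'] = ljpl[1:]
t22 = t22[['Class interval', 'Midpoint', 'Count', 'Frequency', 'Cumulative frequency']]
t22
```

| | Class interval | Midpoint | Count | Frequency | Cumulative frequency |
|---|---|---|---|---|---|
| 0 | (737,877] | 807 | 6 | 0.200000 | 0.200000 |
| 1 | (877,1017] | 947 | 8 | 0.266667 | 0.466667 |
| 2 | (1017,1157] | 1087 | 9 | 0.300000 | 0.766667 |
| 3 | (1157,1297] | 1227 | 4 | 0.133333 | 0.900000 |
| 4 | (1297,1437] | 1367 | 2 | 0.066667 | 0.966667 |
| 5 | (1437,1577] | 1507 | 1 | 0.033333 | 1.000000 |

```python
# Draw the histogram with the same class edges as the table
import matplotlib.pyplot as plt
# The font settings below are only needed when labels contain Chinese characters
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
plt.hist(t2, bins=bins)
plt.title('Histogram for Problem 3')
```

```
Text(0.5, 1.0, 'Histogram for Problem 3')
```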
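To make the right-closed grouping rule `(a, b]` used in the table explicit, the class counts can be reproduced with the standard library alone. This is only a sketch: the edges 737, 877, ..., 1577 are taken from the labels above, and the helper `class_index` is a made-up name, not part of pandas.

```python
from bisect import bisect_left
from collections import Counter

data = [909, 1086, 1120, 999, 1320, 1091, 1071, 1081,
        1130, 1336, 967, 1572, 825, 914, 992, 1232, 950,
        775, 1203, 1025, 1096, 808, 1224, 1044, 871, 1164, 971, 950, 866, 738]
edges = [737 + 140 * i for i in range(7)]   # 737, 877, ..., 1577

def class_index(x):
    """Index k of the right-closed class (edges[k], edges[k+1]] containing x."""
    return bisect_left(edges, x) - 1

counts = Counter(class_index(x) for x in data)
freq = [counts[k] for k in range(len(edges) - 1)]
print(freq)
```

A value that equals a right edge falls into the class that ends there, which matches the default `right=True` behaviour of `pd.cut`.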
5:
```python
t5 = [5954, 5022, 14667, 6582, 6870, 1840, 2662, 4508,
      1208, 3852, 618, 3008, 1268, 1978, 7963, 2048,
      3077, 993, 353, 14263, 1714, 11127, 6926, 2047,
      714, 5923, 6006, 14267, 1697, 13867, 4001, 2280,
      1223, 12579, 13588, 7315, 4538, 13304, 1615, 8612]
t5 = np.sort(t5)  # ascending order
print(t5.shape, t5)
```

```
(40,) [ 353 618 714 993 1208 1223 1268 1615 1697 1714 1840 1978
2047 2048 2280 2662 3008 3077 3852 4001 4508 4538 5022 5923
5954 6006 6582 6870 6926 7315 7963 8612 11127 12579 13304 13588
13867 14263 14267 14667]
```

```python
# Range divided by a class width of 1700: about 8.42, so 9 classes are needed
(14667 - 353) / 1700
```

```
8.42
```

```python
# Class edges: start just below the minimum (352) and step by 1700
ran = []
for i in range(10):
    ran.append(352 + i * 1700)
# Labels of the right-closed classes (a,b]
lable = []
for i in range(9):
    lable.append('(' + str(ran[i]) + ',' + str(ran[i + 1]) + ']')
lable
```

```
['(352,2052]',
'(2052,3752]',
'(3752,5452]',
'(5452,7152]',
'(7152,8852]',
'(8852,10552]',
'(10552,12252]',
'(12252,13952]',
'(13952,15652]']
```

```python
t55 = pd.cut(t5, ran, labels=lable)
t55 = t55.value_counts()
t55 = pd.DataFrame(t55)
t55['Class interval'] = t55.index
t55.columns = ['Count', 'Class interval']
t55.reset_index(drop=True, inplace=True)
# Class midpoints
zzz = []
for i in range(9):
    zzz.append(ran[i] + 1700 / 2)
t55['Midpoint'] = zzz
t55['Frequency'] = t55['Count'] / 40
# Cumulative frequency: running sum of the frequencies
ljpl = [0]
for i in t55['Frequency']:
    ljpl.append(i + ljpl[-1])
t55['Cumulative frequency'] = ljpl[1:]
t55 = t55[['Class interval', 'Midpoint', 'Count', 'Frequency', 'Cumulative frequency']]
t55
```

| | Class interval | Midpoint | Count | Frequency | Cumulative frequency |
|---|---|---|---|---|---|
| 0 | (352,2052] | 1202.0 | 14 | 0.350 | 0.350 |
| 1 | (2052,3752] | 2902.0 | 4 | 0.100 | 0.450 |
| 2 | (3752,5452] | 4602.0 | 5 | 0.125 | 0.575 |
| 3 | (5452,7152] | 6302.0 | 6 | 0.150 | 0.725 |
| 4 | (7152,8852] | 8002.0 | 3 | 0.075 | 0.800 |
| 5 | (8852,10552] | 9702.0 | 0 | 0.000 | 0.800 |
| 6 | (10552,12252] | 11402.0 | 1 | 0.025 | 0.825 |
| 7 | (12252,13952] | 13102.0 | 4 | 0.100 | 0.925 |
| 8 | (13952,15652] | 14802.0 | 3 | 0.075 | 1.000 |

```python
plt.hist(t5, bins=ran)
plt.title('Histogram for Problem 5')
```

```
Text(0.5, 1.0, 'Histogram for Problem 5')
```
5.3
3:
$$\bar{y}=3\bar{x}-4$$
$$s_y^2=\frac{1}{n-1}\sum_{i}(y_i-\bar{y})^2=\frac{1}{n-1}\sum_{i}\left(3x_i-4-(3\bar{x}-4)\right)^2=\frac{1}{n-1}\sum_{i}9(x_i-\bar{x})^2=9s_x^2$$
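The two identities above are easy to confirm numerically. A small sketch, assuming the transformation is $y_i=3x_i-4$ as in the derivation; the data values are arbitrary and chosen only for illustration.

```python
import math
import statistics

x = [2.0, 5.0, 7.0, 10.0]          # arbitrary illustrative data
y = [3 * v - 4 for v in x]         # linear transformation y = 3x - 4

mean_ok = math.isclose(statistics.mean(y), 3 * statistics.mean(x) - 4)
# statistics.variance uses the n-1 denominator, like s^2 in the text
var_ok = math.isclose(statistics.variance(y), 9 * statistics.variance(x))
print(mean_ok, var_ok)
```

The shift by 4 drops out of the variance, and the factor 3 enters squared.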
4:
Proof (mean): since $(n+1)\bar{x}_{n+1}=n\bar{x}_n+x_{n+1}$,
$$(n+1)\bar{x}_{n+1}-(n+1)\bar{x}_n=x_{n+1}-\bar{x}_n$$
Dividing both sides by $n+1$ gives $\bar{x}_{n+1}=\bar{x}_n+\frac{1}{n+1}(x_{n+1}-\bar{x}_n)$, which is the claim.
Proof (variance):
$$\begin{aligned}
ns_{n+1}^2-(n-1)s_n^2
&=\sum_{i=1}^{n+1}(x_i-\bar{x}_{n+1})^2-\sum_{i=1}^{n}(x_i-\bar{x}_n)^2\\
&=x_{n+1}^2-2\left(\sum_{i=1}^{n+1}x_i\bar{x}_{n+1}-\sum_{i=1}^{n}x_i\bar{x}_n\right)+\left((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\right)\\
&=x_{n+1}^2-2\left[x_{n+1}\bar{x}_{n+1}+\sum_{i=1}^{n}x_i(\bar{x}_{n+1}-\bar{x}_n)\right]+\left((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\right)\\
&=x_{n+1}^2-2\left[x_{n+1}\bar{x}_{n+1}+\frac{n}{n+1}(x_{n+1}-\bar{x}_n)\bar{x}_n\right]+\left((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\right)
\end{aligned}$$
Substituting the result of the first proof, $\bar{x}_{n+1}=\bar{x}_n+\frac{1}{n+1}(x_{n+1}-\bar{x}_n)$, gives
$$ns_{n+1}^2-(n-1)s_n^2=\frac{n}{n+1}(x_{n+1}-\bar{x}_n)^2$$
Dividing both sides by $n$ gives $s_{n+1}^2=\frac{n-1}{n}s_n^2+\frac{1}{n+1}(x_{n+1}-\bar{x}_n)^2$, as required.
Remark: this problem shows that the sample mean and sample variance can be updated step by step as new observations arrive, without revisiting the earlier data.
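The two recursions translate directly into a streaming update. The sketch below is an illustration only; the function name `update` and the test data are made up, and the variance recursion is started from $n=1$ with $s_1^2$ set to 0 (its coefficient $(n-1)/n$ is zero there anyway).

```python
import math
import statistics

def update(n, mean, var, x_new):
    """Given n, the mean and s^2 of the first n points, add x_new and
    return (n + 1, new mean, new s^2) using the recursions above."""
    delta = x_new - mean
    new_mean = mean + delta / (n + 1)
    new_var = (n - 1) / n * var + delta ** 2 / (n + 1)
    return n + 1, new_mean, new_var

xs = [909.0, 1086.0, 1120.0, 999.0, 1320.0]   # illustrative data
n, mean, var = 1, xs[0], 0.0                  # start from a single observation
for x in xs[1:]:
    n, mean, var = update(n, mean, var, x)

print(n, mean, var)
```

Only three numbers are kept between updates, so the data never has to be stored.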
5:
Proof (mean): write $x_i^{(1)}$ ($i=1,\dots,n$) for the observations in the sample of size $n$ and $x_j^{(2)}$ ($j=1,\dots,m$) for the observations in the sample of size $m$. Then
$$\bar{x}=\frac{1}{m+n}\left(\sum_{i=1}^{n}x_i^{(1)}+\sum_{j=1}^{m}x_j^{(2)}\right)=\frac{n\bar{x}_1+m\bar{x}_2}{m+n}$$
Proof (variance): with $\bar{x}_1, s_1^2$ the mean and variance of the sample $x^{(1)}$ of size $n$, and $\bar{x}_2, s_2^2$ those of the sample $x^{(2)}$ of size $m$,
$$\begin{aligned}
s^2&=\frac{\sum_{i=1}^{n}(x_i^{(1)}-\bar{x})^2+\sum_{j=1}^{m}(x_j^{(2)}-\bar{x})^2}{m+n-1}\\
&=\frac{\sum_{i=1}^{n}\left(x_i^{(1)}-\frac{n\bar{x}_1+m\bar{x}_2}{m+n}\right)^2+\sum_{j=1}^{m}\left(x_j^{(2)}-\frac{n\bar{x}_1+m\bar{x}_2}{m+n}\right)^2}{m+n-1}\\
&=\frac{\sum_{i=1}^{n}\left(x_i^{(1)}-\bar{x}_1+\frac{m(\bar{x}_1-\bar{x}_2)}{m+n}\right)^2}{m+n-1}+\frac{\sum_{j=1}^{m}\left(x_j^{(2)}-\bar{x}_2-\frac{n(\bar{x}_1-\bar{x}_2)}{m+n}\right)^2}{m+n-1}\\
&=\frac{(n-1)s_1^2+(m-1)s_2^2+\frac{mn(\bar{x}_1-\bar{x}_2)^2}{m+n}}{m+n-1}
\end{aligned}$$
The cross terms vanish because $\sum_i(x_i^{(1)}-\bar{x}_1)=0$ and $\sum_j(x_j^{(2)}-\bar{x}_2)=0$. This is the required result.
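The combination formulas can be checked against a direct computation on the merged data. A minimal sketch; the function name `pooled` and the two small samples are assumptions made for illustration.

```python
import math
import statistics

def pooled(n, mean1, var1, m, mean2, var2):
    """Mean and variance (n-1 denominator) of two merged samples,
    computed only from their sizes, means and variances."""
    mean = (n * mean1 + m * mean2) / (m + n)
    var = ((n - 1) * var1 + (m - 1) * var2
           + m * n * (mean1 - mean2) ** 2 / (m + n)) / (m + n - 1)
    return mean, var

a = [738.0, 775.0, 808.0, 825.0]   # illustrative sample of size n = 4
b = [866.0, 871.0, 909.0]          # illustrative sample of size m = 3

mean, var = pooled(len(a), statistics.mean(a), statistics.variance(a),
                   len(b), statistics.mean(b), statistics.variance(b))
print(mean, var)
```

The extra term $mn(\bar{x}_1-\bar{x}_2)^2/(m+n)$ accounts for the spread between the two sample means.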
8:
$$E(\bar{x})=E\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right)=0$$
$$Var(\bar{x})=\frac{1}{n^2}\sum_{i=1}^{n}Var(x_i)=\frac{1}{n}Var(x_i)$$
$$Var(x_i)=E(x_i^2)=\frac{1}{2}\int_{-1}^{1}x^2dx=\frac{1}{3}$$
$$Var(\bar{x})=\frac{1}{3n}$$
10:
$$\begin{aligned}
\sum_{i<j}(x_i-x_j)^2&=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left((x_i-\bar{x})+(\bar{x}-x_j)\right)^2\\
&=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[(x_i-\bar{x})^2+(x_j-\bar{x})^2-2(x_i-\bar{x})(x_j-\bar{x})\right]\\
&=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[(x_i-\bar{x})^2+(x_j-\bar{x})^2\right]\\
&=n(n-1)s^2
\end{aligned}$$
The cross term drops out because $\sum_{i}(x_i-\bar{x})=0$, and each of the two remaining double sums equals $n(n-1)s^2$.
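The identity $\sum_{i<j}(x_i-x_j)^2=n(n-1)s^2$ can be verified on any data set. The snippet below is a quick sketch with arbitrary values.

```python
import math
import statistics
from itertools import combinations

xs = [2.0, 5.0, 7.0, 10.0, 11.0]   # arbitrary illustrative data
n = len(xs)

# Left side: sum of squared differences over all unordered pairs i < j
lhs = sum((a - b) ** 2 for a, b in combinations(xs, 2))
# Right side: n(n-1) times the sample variance (n-1 denominator)
rhs = n * (n - 1) * statistics.variance(xs)
print(lhs, rhs)
```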
13:
By the reproductive property of the normal distribution,
$$\bar{x}_1\sim N\left(\mu,\frac{\sigma^2}{n}\right),\quad\bar{x}_2\sim N\left(\mu,\frac{\sigma^2}{n}\right)$$
Since the two sample means are independent,
$$\bar{\mu}=\bar{x}_1-\bar{x}_2,\quad\bar{\mu}\sim N\left(0,\frac{2\sigma^2}{n}\right)$$
Let $\Phi$ denote the distribution function of the standard normal distribution. Solving
$$P(|\bar{\mu}|>\sigma)\leq 0.01\iff 2\Phi\left(\frac{\sigma}{\sigma\sqrt{\frac{2}{n}}}\right)-1\geq 0.99\iff\sqrt{\frac{n}{2}}\geq 2.576$$
gives $n\geq 14$.
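The bound can be reproduced numerically with the standard library. A short sketch under the setup above (difference of two independent sample means, each from $n$ observations); the names `z`, `n_min` and `tail` are illustrative.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()            # standard normal, mean 0 and sd 1

# Need 2*Phi(sqrt(n/2)) - 1 >= 0.99, i.e. sqrt(n/2) >= z with Phi(z) = 0.995
z = std_normal.inv_cdf(0.995)
n_min = math.ceil(2 * z ** 2)

def tail(n):
    """P(|difference of means| > sigma) for sample size n."""
    return 2 * (1 - std_normal.cdf(math.sqrt(n / 2)))

print(n_min, tail(n_min - 1), tail(n_min))
```

Evaluating `tail` on both sides of `n_min` confirms that the smaller sample size still violates the 0.01 bound.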
24:
$$P(x_{(16)}>10)=1-P(x_{(16)}\leq 10)=1-[P(x\leq 10)]^{16}=0.937$$
$$P(x_{(1)}>5)=[1-P(x\leq 5)]^{16}=0.331$$
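The problem statement is not reproduced here, so the population has to be guessed: the two numerical answers are consistent with a normal population $N(8, 2^2)$ and $n=16$, and that is what the sketch below assumes. Treat the parameters as an assumption, not as part of the original solution.

```python
from statistics import NormalDist

pop = NormalDist(mu=8, sigma=2)   # ASSUMED population, inferred from the answers
n = 16                            # sample size

# Maximum exceeds 10 unless all n observations are <= 10
p_max_gt_10 = 1 - pop.cdf(10) ** n
# Minimum exceeds 5 only if all n observations are > 5
p_min_gt_5 = (1 - pop.cdf(5)) ** n

print(round(p_max_gt_10, 3), round(p_min_gt_5, 3))
```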
28:
(1)
Proof: let $\eta=F(x)\in[0,1]$, and write $p_\eta$ and $F_\eta$ for its density and distribution function. The density of the $i$-th order statistic $\eta_{(i)}$ is
$$p_i(t)=i\binom{n}{i}\,p_\eta(t)\,F_\eta(t)^{i-1}\,(1-F_\eta(t))^{n-i}$$
Since $F$ is continuous,
$$F_\eta(t)=P(\eta<t)=P(F(x)<t)=P(x<F^{-1}(t))=F(F^{-1}(t))=t,\quad 0\leq t\leq 1$$
and differentiating with respect to $t$ gives $p_\eta(t)=1$. Hence
$$p_i(t)=i\binom{n}{i}t^{i-1}(1-t)^{n-i},\quad 0<t<1$$
This is also the density of the $i$-th order statistic of $n$ i.i.d. $U[0,1]$ random variables.
(2)
Recall the Beta function
$$B(m,n)=\int_{0}^{1}x^{m-1}(1-x)^{n-1}dx=\frac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)}$$
Then
$$E(\eta_{(i)})=n\binom{n-1}{i-1}\int_0^1t^i(1-t)^{n-i}dt=i\frac{n!}{i!(n-i)!}\cdot\frac{i!(n-i)!}{(n+1)!}=\frac{i}{n+1}$$
$$\begin{aligned}
Var(\eta_{(i)})&=i\frac{n!}{i!(n-i)!}\int_{0}^{1}t^{i-1}(1-t)^{n-i}\left(t-\frac{i}{n+1}\right)^2dt\\
&=i\frac{n!}{i!(n-i)!}\left[\frac{(i+1)!(n-i)!}{(n+2)!}-\frac{2i}{n+1}\cdot\frac{i!(n-i)!}{(n+1)!}+\frac{i^2}{(n+1)^2}\cdot\frac{(i-1)!(n-i)!}{n!}\right]\\
&=\frac{i(n-i+1)}{(n+1)^2(n+2)}
\end{aligned}$$
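The closed forms for the mean and variance can be checked exactly with rational arithmetic, using $B(a,b)=\frac{(a-1)!(b-1)!}{(a+b-1)!}$ for positive integers. This is a sketch; the helper names `beta_int` and `order_stat_moments` are made up for illustration.

```python
from fractions import Fraction
from math import factorial

def beta_int(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def order_stat_moments(i, n):
    """Exact mean and variance of the i-th order statistic of n U[0,1] variables,
    whose density is t^(i-1) (1-t)^(n-i) / B(i, n-i+1)."""
    c = 1 / beta_int(i, n - i + 1)          # normalising constant, equals i*C(n,i)
    m1 = c * beta_int(i + 1, n - i + 1)     # E[eta]
    m2 = c * beta_int(i + 2, n - i + 1)     # E[eta^2]
    return m1, m2 - m1 ** 2

print(order_stat_moments(2, 5))
```

Because everything is a `Fraction`, the comparison with $\frac{i}{n+1}$ and $\frac{i(n-i+1)}{(n+1)^2(n+2)}$ is exact rather than approximate.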
(3)
The covariance matrix $A$ has $A(1,1)=Var(\eta_{(i)})$ and $A(2,2)=Var(\eta_{(j)})$, so it only remains to compute
$$A(1,2)=A(2,1)=cov(\eta_{(i)},\eta_{(j)})$$
First find the joint density of $\eta_{(i)}$ and $\eta_{(j)}$. Without loss of generality let $i<j$; then for $0<t_1<t_2<1$
$$p_{ij}(t_1,t_2)=\frac{n!}{(i-1)!(j-i-1)!(n-j)!}t_1^{i-1}(t_2-t_1)^{j-i-1}(1-t_2)^{n-j}$$
Writing $a_1=\frac{i}{n+1}$ and $a_2=\frac{j}{n+1}$,
$$\begin{aligned}
cov(\eta_{(i)},\eta_{(j)})&=E(\eta_{(i)}\eta_{(j)})-E(\eta_{(i)})E(\eta_{(j)})\\
&=E(\eta_{(i)})-E(\eta_{(i)}(1-\eta_{(j)}))-E(\eta_{(i)})E(\eta_{(j)})\\
&=\frac{i}{n+1}-\iint_{0<t_1<t_2<1}t_1(1-t_2)\,p_{ij}(t_1,t_2)\,dt_1dt_2-\frac{i}{n+1}\cdot\frac{j}{n+1}\\
&=\frac{i}{n+1}-\frac{i(n-j+1)}{(n+2)(n+1)}-\frac{ij}{(n+1)^2}\\
&=\frac{i(n+1-j)}{(n+2)(n+1)^2}\\
&=\frac{a_1(1-a_2)}{n+2}
\end{aligned}$$
For the integral above, define
$$I=\iint_{0<t_1<t_2<1}\frac{(n+2)!}{i!(j-i-1)!(n-j+1)!}t_1^{i}(t_2-t_1)^{j-i-1}(1-t_2)^{n-j+1}dt_1dt_2=1$$
so that
$$E(\eta_{(i)}(1-\eta_{(j)}))=\frac{i(n-j+1)}{(n+2)(n+1)}I$$
On evaluating $I$: the integrand is the joint density of the $(i+1)$-th and $(j+1)$-th order statistics of $n+2$ i.i.d. $U[0,1]$ random variables, so the normalization of a probability density gives $I=1$.
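A Monte Carlo run gives an independent check of the covariance formula $\frac{a_1(1-a_2)}{n+2}$. The sketch below is illustrative only: the function names, the seed, the number of replications and the choice $n=5$, $i=2$, $j=4$ are all assumptions.

```python
import random

def simulate_cov(n, i, j, reps=50_000, seed=0):
    """Estimate cov(eta_(i), eta_(j)) for n i.i.d. U[0,1] variables by simulation."""
    rng = random.Random(seed)
    si = sj = sij = 0.0
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        a, b = u[i - 1], u[j - 1]        # i-th and j-th order statistics
        si += a
        sj += b
        sij += a * b
    return sij / reps - (si / reps) * (sj / reps)

def formula_cov(n, i, j):
    """Closed form a1 * (1 - a2) / (n + 2), with a1 = i/(n+1), a2 = j/(n+1), i < j."""
    a1, a2 = i / (n + 1), j / (n + 1)
    return a1 * (1 - a2) / (n + 2)

print(simulate_cov(5, 2, 4), formula_cov(5, 2, 4))
```

The simulated value fluctuates with the seed, so agreement is only expected up to Monte Carlo error.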
Functions for building frequency distribution tables (by class width / by number of classes)
(1) By number of classes

```python
import numpy as np
import pandas as pd

def fredistable_zushu(t, n):  # t: array of observations, n: number of classes
    t = np.sort(t)
    mi = np.min(t)
    ma = np.max(t)
    ran = []
    # If the class width need not be an integer, use: cut = (ma - mi) / n
    cut = int((ma - mi) / n) + 1
    for i in range(n + 1):
        # Start one unit below the minimum so the minimum falls in the first (a,b] class.
        # To start exactly at the minimum use mi + i * cut and pass include_lowest=True to pd.cut.
        ran.append(mi - 1 + i * cut)
    lable = []
    for i in range(n):
        lable.append('(' + str(ran[i]) + ',' + str(ran[i + 1]) + ']')
    t1 = pd.cut(t, ran, labels=lable)
    t1 = t1.value_counts()
    t1 = pd.DataFrame(t1)
    t1['Class interval'] = t1.index
    t1.columns = ['Count', 'Class interval']
    t1.reset_index(drop=True, inplace=True)
    # Class midpoints
    zzz = []
    for i in range(n):
        zzz.append(ran[i] + float(cut) / 2)
    t1['Midpoint'] = zzz
    t1['Frequency'] = t1['Count'] / np.shape(t)[0]
    # Cumulative frequency: running sum of the frequencies
    ljpl = [0]
    for i in t1['Frequency']:
        ljpl.append(i + ljpl[-1])
    t1['Cumulative frequency'] = ljpl[1:]
    t1 = t1[['Class interval', 'Midpoint', 'Count', 'Frequency', 'Cumulative frequency']]
    return t1

t5 = [5954, 5022, 14667, 6582, 6870, 1840, 2662, 4508,
      1208, 3852, 618, 3008, 1268, 1978, 7963, 2048,
      3077, 993, 353, 14263, 1714, 11127, 6926, 2047,
      714, 5923, 6006, 14267, 1697, 13867, 4001, 2280,
      1223, 12579, 13588, 7315, 4538, 13304, 1615, 8612]
fredistable_zushu(t5, 9)
```

| | Class interval | Midpoint | Count | Frequency | Cumulative frequency |
|---|---|---|---|---|---|
| 0 | (352,1943] | 1147.5 | 11 | 0.275 | 0.275 |
| 1 | (1943,3534] | 2738.5 | 7 | 0.175 | 0.450 |
| 2 | (3534,5125] | 4329.5 | 5 | 0.125 | 0.575 |
| 3 | (5125,6716] | 5920.5 | 4 | 0.100 | 0.675 |
| 4 | (6716,8307] | 7511.5 | 4 | 0.100 | 0.775 |
| 5 | (8307,9898] | 9102.5 | 1 | 0.025 | 0.800 |
| 6 | (9898,11489] | 10693.5 | 1 | 0.025 | 0.825 |
| 7 | (11489,13080] | 12284.5 | 1 | 0.025 | 0.850 |
| 8 | (13080,14671] | 13875.5 | 6 | 0.150 | 1.000 |
(2) By class width

```python
def fredistable_fenge(t, cut):  # t: array of observations, cut: class width
    t = np.sort(t)
    mi = np.min(t)
    ma = np.max(t)
    ran = []
    # Number of classes needed to cover the range with the given width
    n = int((ma - mi) / cut) + 1
    for i in range(n + 1):
        # Start one unit below the minimum so the minimum falls in the first (a,b] class.
        # To start exactly at the minimum use mi + i * cut and pass include_lowest=True to pd.cut.
        ran.append(mi - 1 + i * cut)
    lable = []
    for i in range(n):
        lable.append('(' + str(ran[i]) + ',' + str(ran[i + 1]) + ']')
    t1 = pd.cut(t, ran, labels=lable)
    t1 = t1.value_counts()
    t1 = pd.DataFrame(t1)
    t1['Class interval'] = t1.index
    t1.columns = ['Count', 'Class interval']
    t1.reset_index(drop=True, inplace=True)
    # Class midpoints
    zzz = []
    for i in range(n):
        zzz.append(ran[i] + float(cut) / 2)
    t1['Midpoint'] = zzz
    t1['Frequency'] = t1['Count'] / np.shape(t)[0]
    # Cumulative frequency: running sum of the frequencies
    ljpl = [0]
    for i in t1['Frequency']:
        ljpl.append(i + ljpl[-1])
    t1['Cumulative frequency'] = ljpl[1:]
    t1 = t1[['Class interval', 'Midpoint', 'Count', 'Frequency', 'Cumulative frequency']]
    return t1

fredistable_fenge(t5, 1700)
```

| | Class interval | Midpoint | Count | Frequency | Cumulative frequency |
|---|---|---|---|---|---|
| 0 | (352,2052] | 1202.0 | 14 | 0.350 | 0.350 |
| 1 | (2052,3752] | 2902.0 | 4 | 0.100 | 0.450 |
| 2 | (3752,5452] | 4602.0 | 5 | 0.125 | 0.575 |
| 3 | (5452,7152] | 6302.0 | 6 | 0.150 | 0.725 |
| 4 | (7152,8852] | 8002.0 | 3 | 0.075 | 0.800 |
| 5 | (8852,10552] | 9702.0 | 0 | 0.000 | 0.800 |
| 6 | (10552,12252] | 11402.0 | 1 | 0.025 | 0.825 |
| 7 | (12252,13952] | 13102.0 | 4 | 0.100 | 0.925 |
| 8 | (13952,15652] | 14802.0 | 3 | 0.075 | 1.000 |