Introduction to Data Science, Week 4
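NumPy provides functions for simulating a binomial distribution:
import numpy as np
np.random.binomial(n, p)  # n is the number of trials in one experiment, p is the probability of success on each trial
np.random.binomial(n, p, size)  # size is the number of experiments to simulate
# For example, np.random.binomial(20, 0.5, 10000) simulates 10000 experiments of 20 coin flips each. The output is an array in which each element is the number of successes (heads) in one experiment.
x = np.random.binomial(20, 0.5, 10000)
print((x >= 15).mean())
This prints the fraction of the 10000 experiments that produced 15 or more heads.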
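As a sanity check on the simulation, the simulated fraction can be compared with the exact binomial tail probability. This is a minimal sketch, not part of the course notes: the seed and the variable names are assumptions, and `scipy.stats.binom.sf` is used for the exact value.

```python
import numpy as np
from scipy import stats

np.random.seed(0)  # assumed seed, only so the run is repeatable

n_flips, p_heads, n_experiments = 20, 0.5, 10000
x = np.random.binomial(n_flips, p_heads, n_experiments)

# Fraction of simulated experiments with 15 or more heads
simulated = (x >= 15).mean()

# sf(k) is P(X > k), so sf(14) is P(X >= 15)
exact = stats.binom.sf(14, n_flips, p_heads)

print("simulated:", simulated)
print("exact:    ", exact)
```

The two numbers should agree to roughly two decimal places; increasing `n_experiments` tightens the agreement.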
Q: What is the probability of a tornado on two consecutive days?
chance_of_tornado = 0.01
# One Bernoulli draw per day: 1 means a tornado occurred that day
tornado_events = np.random.binomial(1, chance_of_tornado, 1000000)
two_days_in_a_row = 0
# Start at 1 so that j-1 is valid; go through the last day so no pair is skipped
for j in range(1, len(tornado_events)):
    if tornado_events[j] == 1 and tornado_events[j-1] == 1:
        two_days_in_a_row += 1
print('{} tornadoes back to back in {} years'.format(two_days_in_a_row, 1000000/365))
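The same count can be obtained without an explicit loop by comparing the array with a copy of itself shifted by one day. The sketch below is an illustrative alternative, not the course's code. The seed and the `expected` calculation (which assumes days are independent) are additions.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only so the run is repeatable

chance_of_tornado = 0.01
n_days = 1000000
tornado_events = np.random.binomial(1, chance_of_tornado, n_days)

# tornado_events[1:] is "today", tornado_events[:-1] is "yesterday";
# the bitwise AND is 1 only when both days had a tornado
two_days_in_a_row = int(np.sum(tornado_events[1:] & tornado_events[:-1]))

# With independent days, each of the n_days - 1 adjacent pairs
# is a back-to-back event with probability p * p
expected = (n_days - 1) * chance_of_tornado ** 2

print('{} tornadoes back to back in {:.0f} years'.format(two_days_in_a_row, n_days / 365))
print('expected about {:.0f}'.format(expected))
```

The per-pair probability is 0.01 * 0.01 = 0.0001, so the simulated count should land near 100 over one million days.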
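from scipy import stats
np.std(distribution)  # standard deviation of distribution, an array of samples
stats.skew(distribution)  # skewness of the distribution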
chi_squared_df5 = np.random.chisquare(5, size=10000)
stats.skew(chi_squared_df5)
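To see how the skew value responds to the shape of the distribution, the chi-squared example can be repeated for several degrees of freedom. This is a hedged sketch: the choice of degrees of freedom and the seed are assumptions, and the comparison value uses the known skewness of a chi-squared distribution, sqrt(8 / df).

```python
import numpy as np
from scipy import stats

np.random.seed(0)  # assumed seed, only so the run is repeatable

skews = {}
for df in (2, 5, 50):  # illustrative choices of degrees of freedom
    sample = np.random.chisquare(df, size=10000)
    skews[df] = stats.skew(sample)
    theoretical = np.sqrt(8 / df)  # skewness of a chi-squared distribution
    print('df={:2d}  sample skew={:.3f}  theoretical={:.3f}'.format(df, skews[df], theoretical))
```

As the degrees of freedom grow, the skew shrinks toward 0 and the distribution looks more symmetric.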
Recommended book: Think Stats (O'Reilly); a PDF version is available at greenteapress.com/thinkstats2/index.html
hypothesis: a statement you can test
alternative hypothesis: there is a difference between groups
null hypothesis: there is no difference between A and B
critical value (alpha): a threshold for how much chance you are willing to accept of rejecting the null hypothesis when it is actually true
To check whether two distributions differ (specifically, whether their means differ), use a t-test; SciPy provides one:
from scipy import stats
stats.ttest_ind?  # IPython/Jupyter syntax: shows the docstring
stats.ttest_ind(early['assignment1_grade'], late['assignment1_grade'])  # just pass in the two samples
If the p-value from the t-test is greater than alpha, the null hypothesis cannot be rejected.
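The decision rule can be sketched end to end with synthetic data. Everything below is illustrative: the grade samples are generated rather than taken from the course's `early` and `late` data, and the `decision` helper, the seed, and alpha = 0.05 are assumptions.

```python
import numpy as np
from scipy import stats

np.random.seed(0)  # assumed seed, only so the run is repeatable
alpha = 0.05       # assumed critical value


def decision(p_value, alpha=0.05):
    """Hypothetical helper: apply the rule from the notes above."""
    return 'cannot reject null' if p_value > alpha else 'reject null'


# Synthetic grades (not the course data)
same_a = np.random.normal(loc=75, scale=10, size=500)
same_b = np.random.normal(loc=75, scale=10, size=500)     # same true mean as same_a
shifted_b = np.random.normal(loc=80, scale=10, size=500)  # true mean is 5 points higher

result_same = stats.ttest_ind(same_a, same_b)
result_shifted = stats.ttest_ind(same_a, shifted_b)

print('same means:   ', result_same.pvalue, decision(result_same.pvalue, alpha))
print('shifted means:', result_shifted.pvalue, decision(result_shifted.pvalue, alpha))
```

With a true 5-point difference and 500 samples per group, the p-value should be far below alpha. For the equal-mean pair, the test can still reject by chance about 5% of the time, which is what alpha controls.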