Introduction to Data Science, Week 4

程序员文章站 2024-01-04 20:44:34

...

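NumPy provides a function for simulating draws from a binomial distribution:

np.random.binomial(n, p)  # n is the number of trials in one experiment, p is the probability of success on each trial
np.random.binomial(n, p, size)  # size is the number of experiments to simulate
# For example, np.random.binomial(20, 0.5, 10000) simulates 10000 experiments of 20 coin flips each. It returns an array in which each entry is the number of successes (heads) in one experiment.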

import numpy as np

x = np.random.binomial(20, .5, 10000)

# fraction of the 10000 experiments with 15 or more heads out of 20
print((x>=15).mean())

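The printed value is the fraction of simulations in which at least 15 of the 20 flips were heads, roughly 0.02.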
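As a sanity check on the simulation above, the same tail probability can be computed exactly from the binomial formula. This is a minimal sketch; the fixed seed and the variable names are my own assumptions, not part of the course notes.

```python
import math
import numpy as np

n, p, threshold = 20, 0.5, 15

# Exact P(X >= 15) for X ~ Binomial(20, 0.5): sum the binomial pmf over k = 15..20.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(threshold, n + 1))

# Simulated estimate. The seed is an arbitrary assumption, used only for reproducibility.
rng = np.random.default_rng(0)
x = rng.binomial(n, p, 10000)
simulated = (x >= threshold).mean()

print(f"exact = {exact:.4f}, simulated = {simulated:.4f}")
```

With 10000 experiments the simulated value should land close to the exact one; increasing the number of experiments tightens the agreement.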

Q: Estimate how often tornadoes occur on two consecutive days.

chance_of_tornado = 0.01  # probability of a tornado on any given day
tornado_events = np.random.binomial(1, chance_of_tornado, 1000000)  # one 0/1 draw per day
two_days_in_a_row = 0
# start at 1 so that j-1 is a valid index, and run through the last day
for j in range(1, len(tornado_events)):
    if tornado_events[j]==1 and tornado_events[j-1]==1:
        two_days_in_a_row+=1
print('{} tornadoes back to back in {} years'.format(two_days_in_a_row, 1000000/365))
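The loop above can also be written without an explicit Python loop by comparing the array with a copy of itself shifted by one day. The following is a sketch under my own assumptions (a fixed seed, and the names `n_days` and `expected`, none of which appear in the notes). It also prints the count one would expect if days are independent, which is (number of adjacent day pairs) × p².

```python
import numpy as np

chance_of_tornado = 0.01
n_days = 1_000_000  # hypothetical name for the number of simulated days

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
tornado_events = rng.binomial(1, chance_of_tornado, n_days)

# tornado_events[1:] is "today", tornado_events[:-1] is "yesterday".
both_days = (tornado_events[1:] == 1) & (tornado_events[:-1] == 1)
two_days_in_a_row = int(both_days.sum())

# Under independence, each adjacent pair is a double tornado with probability p**2.
expected = (n_days - 1) * chance_of_tornado**2

print(f"{two_days_in_a_row} back-to-back tornadoes observed, "
      f"about {expected:.0f} expected, in {n_days / 365:.0f} years")
```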

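np.std(distribution) returns the standard deviation of the values in distribution.

stats.skew(distribution) (from scipy.stats) returns the skewness of a distribution.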
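The notes do not say how `distribution` was created, so the sketch below assumes a sample drawn from a normal distribution (the mean 0.75, the sample size, and the seed are illustrative choices). It computes the standard deviation by hand from its definition and compares it with `np.std`, then reports the skewness.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed is an assumption
# Assumed sample: 1000 draws from a normal distribution with mean 0.75.
distribution = rng.normal(loc=0.75, scale=1.0, size=1000)

# Standard deviation from the definition: root of the mean squared deviation.
manual_std = np.sqrt(np.mean((distribution - distribution.mean()) ** 2))

print(manual_std, np.std(distribution))  # the two values should agree
print(stats.skew(distribution))          # near 0 for a symmetric distribution
```

Note that `np.std` divides by N by default (population standard deviation); passing `ddof=1` gives the sample version.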

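from scipy import stats

# 10000 draws from a chi-squared distribution with 5 degrees of freedom
chi_squared_df5 = np.random.chisquare(5, size=10000)

stats.skew(chi_squared_df5)  # positive: the distribution is skewed to the right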
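To see how the skewness depends on the degrees of freedom, the sketch below repeats the experiment for a few values and compares each result with the theoretical skewness of a chi-squared distribution, sqrt(8 / df). The particular df values, the seed, and the `skews` dictionary are my own illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed is an assumption
skews = {}  # hypothetical container: df -> sample skewness

for df in (2, 5, 50):
    sample = rng.chisquare(df, size=10000)
    skews[df] = stats.skew(sample)
    theoretical = np.sqrt(8 / df)  # skewness of a chi-squared distribution
    print(f"df={df}: sample skew={skews[df]:.3f}, theoretical={theoretical:.3f}")
```

As the degrees of freedom grow, the skewness shrinks toward 0 and the distribution looks more symmetric.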

Recommended book: Think Stats (O'Reilly). A PDF version is available at greenteapress.com/thinkstats2/index.html

hypothesis: a statement you can test

alternative hypothesis: there is a difference between groups

null hypothesis: there is no difference between A and B

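critical value (alpha): the threshold for how much chance you are willing to accept, that is, how likely a result may be under the null hypothesis before you reject the null in favor of the alternative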

To test whether two distributions differ, use a t-test. SciPy provides one:

from scipy import stats

stats.ttest_ind?  # in IPython/Jupyter, this shows the documentation for ttest_ind

stats.ttest_ind(early['assignment1_grade'], late['assignment1_grade'])  # just pass in the two samples to compare

If the p-value returned by the t-test is greater than the critical value alpha, we cannot reject the null hypothesis.
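The decision rule can be sketched end to end as follows. The grade data here is synthetic and purely hypothetical (the real `early` and `late` DataFrames are not shown in the notes), and alpha = 0.05 is an assumed, conventional choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed is an assumption
# Hypothetical grades for two groups; these stand in for the real columns.
early = rng.normal(loc=75, scale=10, size=200)
late = rng.normal(loc=74, scale=10, size=200)

alpha = 0.05  # assumed critical value

result = stats.ttest_ind(early, late)
pvalue = result.pvalue

if pvalue > alpha:
    print(f"p = {pvalue:.3f} > {alpha}: cannot reject the null hypothesis")
else:
    print(f"p = {pvalue:.3f} <= {alpha}: reject the null hypothesis")
```

Failing to reject the null hypothesis does not show that the two groups are the same; it only means the data does not provide enough evidence of a difference at the chosen alpha.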
