[Jupyter] Exercises

程序员文章站 2024-03-05 14:52:26
...

Part 1

For each of the four datasets...

  • Compute the mean and variance of both x and y
  • Compute the correlation coefficient between x and y
  • Compute the linear regression line: $y = \beta_0 + \beta_1 x + \epsilon$ (hint: use statsmodels and look at the Statsmodels notebook)
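As a cross-check on what the three bullets ask for, the quantities can be computed directly from their definitions without any library. The sketch below is only an illustration: it assumes the four datasets are Anscombe's quartet and types the published values in by hand so that it runs on its own, and the names `QUARTET` and `summarize` are made up for this example.

```python
from math import sqrt

# Anscombe's quartet, typed in so the example is self-contained (assumption:
# the notebook's `anascombe` DataFrame holds these same published values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Mean, sample variance, Pearson r and least-squares line for one dataset."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)   # sum of squared x deviations
    syy = sum((yi - mean_y) ** 2 for yi in y)   # sum of squared y deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx                  # slope of the least-squares line
    beta0 = mean_y - beta1 * mean_x    # intercept
    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "var_x": sxx / (n - 1), "var_y": syy / (n - 1),  # ddof=1, like pandas .var()
        "corr": sxy / sqrt(sxx * syy),
        "beta0": beta0, "beta1": beta1,
    }


results = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}
for name, r in results.items():
    print(name, {k: round(v, 2) for k, v in r.items()})
```

All four datasets come out with nearly identical summary statistics and fitted lines, which is why the exercise goes on to plot them.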

OUTPUT

[Figure: notebook output for Part 1]

Code

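import numpy as np
import statsmodels.formula.api as smf

# `anascombe` is assumed to be the DataFrame loaded earlier in the notebook,
# with columns 'dataset', 'x' and 'y'.
# Note: everything below is computed over all rows pooled together.

# Mean and variance of x and y (pandas .var() is the sample variance, ddof=1)
print('The mean of x is : ', end="")
print(anascombe['x'].mean())
print('The mean of y is : ', end="")
print(anascombe['y'].mean())
print('The variance of x is : ', end="")
print(anascombe['x'].var())
print('The variance of y is : ', end="")
print(anascombe['y'].var())

# Pearson correlation: the off-diagonal entry of the 2x2 correlation matrix
print("The correlation coefficient between x and y: ", end="")
print(np.corrcoef(anascombe['x'], anascombe['y'])[0][1])

# Fit y ~ x by ordinary least squares on a random subset (about 70% of rows)
n = len(anascombe)
is_train = np.random.rand(n) < 0.7
train = anascombe[is_train].reset_index(drop=True)
test = anascombe[~is_train].reset_index(drop=True)
lin_model = smf.ols('y ~ x', train).fit()
lin_model.summary()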
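
The code above pools all rows, while the exercise asks for the statistics of each of the four datasets separately. One possible way to do that is to loop over a `groupby` and fit one model per group. This is a sketch, not the original author's solution: `summarize_by_dataset` and the small `demo` frame are hypothetical names and data introduced only to show the call, and the fit uses every row in a group instead of a random train/test split.

```python
import pandas as pd
import statsmodels.formula.api as smf


def summarize_by_dataset(df, group_col="dataset"):
    """Return one row of summary statistics and OLS coefficients per group."""
    rows = []
    for name, group in df.groupby(group_col):
        fit = smf.ols("y ~ x", data=group).fit()  # all rows of the group, no split
        rows.append({
            group_col: name,
            "mean_x": group["x"].mean(),
            "mean_y": group["y"].mean(),
            "var_x": group["x"].var(),            # sample variance (ddof=1)
            "var_y": group["y"].var(),
            "corr": group["x"].corr(group["y"]),  # Pearson by default
            "beta0": fit.params["Intercept"],
            "beta1": fit.params["x"],
        })
    return pd.DataFrame(rows).set_index(group_col)


# Hypothetical two-group frame, used only to show the call; in the notebook
# you would pass the real `anascombe` DataFrame instead.
demo = pd.DataFrame({
    "dataset": ["a"] * 4 + ["b"] * 4,
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [1.1, 1.9, 3.2, 3.8, 4.2, 2.9, 2.1, 0.8],
})
summary = summarize_by_dataset(demo)
print(summary.round(3))
```

In the notebook, `summarize_by_dataset(anascombe)` would give one row per dataset, which makes it easy to compare the four sets side by side.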


Part 2

Using Seaborn, visualize all four datasets.

OUTPUT


[Figure: Seaborn plot output for Part 2]

Code

import matplotlib.pyplot as plt
import seaborn as sns

# One scatter panel per value of the 'dataset' column
m = sns.FacetGrid(anascombe, col="dataset")
m.map(plt.scatter, "x", "y")