[Jupyter] Exercises
程序员文章站
2024-03-05 14:52:26
...
Part 1
For each of the four datasets...
- Compute the mean and variance of both x and y
- Compute the correlation coefficient between x and y
- Compute the linear regression line: $y = \beta_0 + \beta_1 x + \epsilon$ (hint: use statsmodels and look at the Statsmodels notebook)
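For reference, the quantities asked for above have standard closed forms for simple linear regression. The hats, the bars, the index i, the sample size n, and the symbol r are notation introduced here, not taken from the exercise text:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},
\qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here $\hat{\beta}_0$ and $\hat{\beta}_1$ are the least-squares estimates of the intercept and slope, and $r$ is the Pearson correlation coefficient.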
OUTPUT
Code
import numpy as np
import statsmodels.formula.api as smf

# Assumes `anascombe` is already loaded as a DataFrame with columns 'dataset', 'x', 'y'.
# Summary statistics, computed over all rows of the frame
print('The mean of x is:', anascombe['x'].mean())
print('The mean of y is:', anascombe['y'].mean())
print('The variance of x is:', anascombe['x'].var())
print('The variance of y is:', anascombe['y'].var())
print('The correlation coefficient between x and y:',
      np.corrcoef(anascombe['x'], anascombe['y'])[0, 1])

# Random ~70/30 train/test split, then an OLS fit of y on x using the training rows
n = len(anascombe)
is_train = np.random.rand(n) < 0.7
train = anascombe[is_train].reset_index(drop=True)
test = anascombe[~is_train].reset_index(drop=True)
lin_model = smf.ols('y ~ x', train).fit()
lin_model.summary()
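The exercise asks for these statistics per dataset, whereas the code above works on the whole frame. As a cross-check, here is a minimal sketch that computes the same quantities for each dataset separately, using only the standard library instead of pandas and statsmodels. The numbers are the standard published values of Anscombe's quartet, typed in by hand. The names `QUARTET`, `summarize`, and `SUMMARY` and the labels I to IV are illustrative assumptions, not part of the original notebook.

```python
from statistics import mean, variance

# Anscombe's quartet (standard published values).
# Datasets I-III share the same x values; the labels I-IV are chosen here.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, Pearson r, and the least-squares line."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variance (n - 1), like pandas .var()
    # Sample covariance with the same n - 1 denominator
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / vx                   # least-squares slope
    intercept = my - slope * mx        # least-squares intercept
    r = cov / (vx * vy) ** 0.5         # Pearson correlation coefficient
    return {"mean_x": mx, "mean_y": my, "var_x": vx, "var_y": vy,
            "r": r, "intercept": intercept, "slope": slope}


SUMMARY = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, stats in SUMMARY.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four datasets come out nearly identical on every one of these summaries, which is why the plots in Part 2 are needed to tell them apart.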
Part 2
Using Seaborn, visualize all four datasets.
OUTPUT
Code
import matplotlib.pyplot as plt
import seaborn as sns

# One scatter panel per dataset, laid out in columns
m = sns.FacetGrid(anascombe, col="dataset")
m.map(plt.scatter, "x", "y")