15-R-DataCamp-Introduction-to-Data-in-R
程序员文章站
2024-01-30 23:31:04
1. Languages of Data
1.1 Welcome to the course (video)
1.2 Loading data into R
Instruction 1:
# Load data (email50 comes from the openintro package)
library(openintro)
data(email50)
# View the structure of the data
str(email50)
1.3 Types of variables (video)
1.4 Identify variable types
Instruction:
# Glimpse email50 (glimpse() and the pipe come from dplyr)
library(dplyr)
glimpse(email50)
1.5 Categorical data in R: factors (video)
1.6 Filtering based on a factor
Instruction:
# Subset of emails with big numbers: email50_big
email50_big <- email50 %>%
  filter(number == "big")
# Glimpse the subset
glimpse(email50_big)
1.7 Complete filtering based on a factor
Instruction:
# Subset of emails with big numbers: email50_big
email50_big <- email50 %>%
  filter(number == "big")
# Table of the number variable
table(email50_big$number)
# Drop levels
email50_big$number_dropped <- droplevels(email50_big$number)
# Table of the number_dropped variable
table(email50_big$number_dropped)
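Why the extra droplevels() step is needed can be seen on a small factor. The following is a minimal sketch in base R, assuming a made-up vector named number (hypothetical values, not the real email50 data):

```r
# Hypothetical toy factor standing in for email50$number
number <- factor(c("none", "small", "big", "big", "small", "big"),
                 levels = c("none", "small", "big"))

# Keep only the "big" observations
number_big <- number[number == "big"]

# The unused levels are still attached to the factor,
# so table() reports them with a count of 0
table(number_big)

# droplevels() removes levels that no longer occur in the data
number_big_dropped <- droplevels(number_big)
table(number_big_dropped)
levels(number_big_dropped)
```

Filtering removes rows but not level labels, which is why the first table still lists "none" and "small".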
1.8 Discretize a variable (video)
1.9 Discretize a different variable
Instruction:
# Calculate median number of characters: med_num_char
med_num_char <- median(email50$num_char)
# Create num_char_cat variable in email50
email50_fortified <- email50 %>%
  mutate(num_char_cat = ifelse(num_char < med_num_char,
                               "below median", "at or above median"))
# Count emails in each category
email50_fortified %>%
  count(num_char_cat)
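The same median split can be checked by hand on a few values. A minimal sketch in base R, assuming a made-up num_char vector (hypothetical values chosen so that two observations sit exactly on the median):

```r
# Hypothetical character counts; two values equal the median
num_char <- c(2, 5, 7, 7, 11, 20)

# Median of an even-length vector is the mean of the two middle values
med_num_char <- median(num_char)

# Strict "<" sends values equal to the median to "at or above median"
num_char_cat <- ifelse(num_char < med_num_char,
                       "below median", "at or above median")

table(num_char_cat)
```

Because the comparison is strict, ties at the median land in the "at or above median" group, so the two groups are not guaranteed to be the same size.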
1.10 Combining levels of a different factor
Instruction:
# Create number_yn column in email50
email50_fortified <- email50 %>%
  mutate(
    number_yn = case_when(
      # if number is "none", make number_yn "no"
      number == "none" ~ "no",
      # if number is not "none", make number_yn "yes"
      number != "none" ~ "yes"
    )
  )
# Visualize the distribution of number_yn (requires ggplot2)
library(ggplot2)
ggplot(email50_fortified, aes(x = number_yn)) +
  geom_bar()
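Collapsing levels can also be done on the factor itself rather than through case_when(). A minimal sketch in base R, assuming a made-up number factor (hypothetical values); this is an alternative technique, not the course's solution:

```r
# Hypothetical toy factor standing in for email50$number
number <- factor(c("none", "small", "big", "none", "small"),
                 levels = c("none", "small", "big"))

# Assigning a named list to levels() merges old levels into new ones:
# "none" becomes "no"; "small" and "big" both become "yes"
number_yn <- number
levels(number_yn) <- list(no = "none", yes = c("small", "big"))

table(number_yn)
```

Unlike the case_when() version, which produces a character column, this keeps number_yn as a factor with an explicit level order.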
1.11 Visualizing numerical data (video)
1.12 Visualizing numerical and categorical data
Instruction:
# Load ggplot2
library(ggplot2)
# Scatterplot of exclaim_mess vs. num_char
ggplot(email50, aes(x = num_char, y = exclaim_mess, color = factor(spam))) +
  geom_point()
2. Study Types and Cautionary Tales
2.1 Observational studies and experiments (video)
2.2 Identify type of study: Reading speed and font
2.3 Identify type of study: Countries
Instruction:
# Load data (gapminder comes from the gapminder package)
library(gapminder)
data(gapminder)
# Glimpse data
glimpse(gapminder)
# Identify type of study: observational or experimental
type_of_study <- "observational"
2.4 Random sampling and random assignment (video)
2.5 Random sampling or random assignment?
2.6 Identify the scope of inference of study
2.7 Simpson's paradox (video)
2.8 Number of males and females admitted
Instruction:
# Load packages
library(dplyr)
# Count number of male and female applicants admitted
ucb_admission_counts <- ucb_admit %>%
  count(Gender, Admit)
# See the result
ucb_admission_counts
2.9 Proportion of males admitted overall
Instruction:
ucb_admission_counts %>%
  # Group by gender
  group_by(Gender) %>%
  # Create new variable
  mutate(prop = n / sum(n)) %>%
  # Filter for admitted
  filter(Admit == "Admitted")
2.10 Proportion of males admitted for each department
Instruction 1:
ucb_admission_counts <- ucb_admit %>%
  # Counts by department, then gender, then admission status
  count(Dept, Gender, Admit)
# See the result
ucb_admission_counts
Instruction 2:
ucb_admission_counts %>%
  # Group by department, then gender
  group_by(Dept, Gender) %>%
  # Create new variable
  mutate(prop = n / sum(n)) %>%
  # Filter for male and admitted
  filter(Gender == "Male", Admit == "Admitted")
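The reversal that exercises 2.8 to 2.10 build toward can be reproduced without the course's ucb_admit data frame. A minimal sketch in base R, assuming that the built-in UCBAdmissions table (datasets package) is a reasonable stand-in; ucb_admit itself is not shown in these notes, so its relation to UCBAdmissions is an assumption:

```r
# UCBAdmissions is a 3-way contingency table: Admit x Gender x Dept

# Pool the departments: sum counts over Dept, leaving Admit x Gender
pooled <- apply(UCBAdmissions, c(1, 2), sum)

# Overall admission proportion within each gender (columns sum to 1)
overall <- prop.table(pooled, margin = 2)
overall["Admitted", ]

# Admission proportion within each Gender x Dept combination
by_dept <- prop.table(UCBAdmissions, margin = c(2, 3))
round(by_dept["Admitted", , ], 2)

# Number of departments where the female rate exceeds the male rate
sum(by_dept["Admitted", "Female", ] > by_dept["Admitted", "Male", ])
```

Pooled over departments, males are admitted at a higher rate, yet within most individual departments the female rate is the higher one. The direction of the comparison flips once Dept is taken into account, which is Simpson's paradox.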
2.11 Admission rates for males across departments
2.12 Recap: Simpson’s paradox (video)
3. Sampling Strategies and Experimental Design
3.1 Sampling strategies (video)
3.2 Sampling strategies, determine which
3.3 Sampling strategies, choose worst
3.4 Sampling in R (video)
3.5 Simple random sample in R
Instruction:
# Simple random sample: states_srs
states_srs <- us_regions %>%
  sample_n(size = 8)
# Count states by region
states_srs %>%
  count(region)
3.6 Stratified sample in R
Instruction:
# Stratified sample
states_str <- us_regions %>%
  group_by(region) %>%
  sample_n(2)
# Count states by region
states_str %>%
  count(region)
3.7 Compare SRS vs stratified sample
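The two sampling schemes from 3.5 and 3.6 can be compared side by side. A minimal sketch in base R, assuming the built-in state.name and state.region vectors as a stand-in for the course's us_regions data (an assumption: the built-in region labels may differ from those in us_regions), with an arbitrary seed:

```r
set.seed(42)  # arbitrary seed so the draws are reproducible

# Stand-in for us_regions: one row per state with its region
states <- data.frame(state = state.name, region = state.region)

# Simple random sample: 8 states drawn without regard to region
states_srs <- states[sample(nrow(states), size = 8), ]
table(states_srs$region)

# Stratified sample: 2 states drawn at random within each region
rows_by_region <- split(seq_len(nrow(states)), states$region)
picked <- unlist(lapply(rows_by_region, function(idx) sample(idx, size = 2)))
states_str <- states[picked, ]
table(states_str$region)
```

Both samples contain 8 states, but only the stratified one guarantees that every region contributes exactly 2. The simple random sample may over- or under-represent a region by chance.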
3.8 Principles of experimental design (video)
3.9 Identifying components of a study
3.10 Experimental design terminology
3.11 Connect blocking and stratifying
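Blocking is to random assignment what stratifying is to random sampling: the randomization happens separately inside each group. A minimal sketch in base R, assuming a made-up subjects data frame with a hypothetical block variable (all names and values here are illustrative only):

```r
set.seed(1)  # arbitrary seed so the assignment is reproducible

# Hypothetical study: 12 subjects in 3 blocks of 4
subjects <- data.frame(id = 1:12,
                       block = rep(c("A", "B", "C"), each = 4))

# Randomly assign treatment/control separately within each block
subjects$group <- NA_character_
for (b in unique(subjects$block)) {
  in_block <- subjects$block == b
  labels <- rep(c("treatment", "control"), length.out = sum(in_block))
  subjects$group[in_block] <- sample(labels)  # shuffle labels in this block
}

# Every block ends up split evenly between the two groups
table(subjects$block, subjects$group)
```

The structure mirrors the stratified sample in 3.6: split by a grouping variable first, then randomize within each piece.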
4. Case study
4.1 Beauty in the classroom (video)
4.2 Inspect the data
Instruction:
# Inspect evals
glimpse(evals)
4.3 Identify type of study
4.4 Sampling/experimental attributes
4.5 Variables in the data (video)
4.6 Identify variable types
Instruction:
# Inspect variable types
glimpse(evals)
# Categorical (factor) variables, with the non-factor variables already removed
cat_vars <- c("rank", "ethnicity", "gender", "language", "cls_level",
              "cls_profs", "cls_credits", "pic_outfit", "pic_color")
4.7 Recode a variable
Instruction:
# Recode cls_students as cls_type
evals_fortified <- evals %>%
  mutate(
    cls_type = case_when(
      cls_students <= 18 ~ "small",
      cls_students <= 59 ~ "midsize",
      cls_students >= 60 ~ "large"
    )
  )
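The cut points in the recode are easiest to verify at the boundaries. A minimal sketch in base R using cut() instead of case_when(), assuming a made-up cls_students vector (hypothetical values picked to hit each boundary):

```r
# Hypothetical class sizes, including the boundary values 18, 19, 59, 60
cls_students <- c(10, 18, 19, 59, 60, 120)

# cut() uses right-closed intervals by default:
# (-Inf, 18] -> small, (18, 59] -> midsize, (59, Inf) -> large
cls_type <- cut(cls_students,
                breaks = c(-Inf, 18, 59, Inf),
                labels = c("small", "midsize", "large"))

data.frame(cls_students, cls_type)
```

For whole-number class sizes this matches the case_when() rules above. One difference: a non-integer value strictly between 59 and 60 would be labeled "large" by cut(), whereas it matches none of the three case_when() conditions and would come out as NA.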
4.8 Create a scatterplot
Instruction:
# Scatterplot of score vs. bty_avg
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point()
4.9 Create a scatterplot, with an added layer
Instruction:
# Scatterplot of score vs. bty_avg colored by cls_type
# (cls_type was added to evals_fortified in 4.7, so plot that data frame)
ggplot(evals_fortified, aes(x = bty_avg, y = score, color = cls_type)) +
  geom_point()