
15-R-DataCamp-Introduction-to-Data-in-R

Posted on 程序员文章站, 2024-01-30 23:31:04

1. Languages of Data

1.1 Welcome to the course (video)
1.2 Loading data into R

Instruction 1:

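# Load packages (assumption: outside DataCamp, email50 comes from openintro
# and glimpse()/%>% come from dplyr)
library(openintro)
library(dplyr)

# Load data
data(email50)

# View the structure of the data
str(email50)

1.3 Types of variables (video)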
1.4 Identify variable types

Instruction:

# Glimpse email50
glimpse(email50)

1.5 Categorical data in R: factors (video)
1.6 Filtering based on a factor

Instruction:

# Subset of emails with big numbers: email50_big
email50_big <- email50 %>%
  filter(number == "big")

# Glimpse the subset
glimpse(email50_big)

1.7 Complete filtering based on a factor

Instruction:

# Subset of emails with big numbers: email50_big
email50_big <- email50 %>%
  filter(number == "big")

# Table of the number variable
table(email50_big$number)

# Drop levels
email50_big$number_dropped <- droplevels(email50_big$number)

# Table of the number_dropped variable
table(email50_big$number_dropped)
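
The effect of droplevels() is easier to see on a small example. The sketch below uses a made-up factor (not the email50 data) and base R only: filtering keeps the unused levels attached to the factor, and droplevels() removes them.

```r
# Toy factor with three levels (hypothetical values, not the email50 data)
number <- factor(c("none", "small", "big", "small", "big"),
                 levels = c("none", "small", "big"))

# Keep only the "big" entries; the unused levels are still attached
number_big <- number[number == "big"]
table(number_big)

# droplevels() removes the levels that no longer occur in the data
number_big_dropped <- droplevels(number_big)
table(number_big_dropped)
levels(number_big_dropped)
```

The first table still lists "none" and "small" with zero counts; the second lists only "big".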

1.8 Discretize a variable (video)
1.9 Discretize a different variable

Instruction:

# Calculate median number of characters: med_num_char
med_num_char <- median(email50$num_char)

# Create num_char_cat variable in email50
email50_fortified <- email50 %>%
  mutate(num_char_cat = ifelse(num_char < med_num_char, "below median", "at or above median"))
  
# Count emails in each category
email50_fortified %>%
  count(num_char_cat)
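
As a rough illustration of how the median split treats ties, here is the same ifelse() rule applied to a made-up vector (hypothetical values, base R only). Values equal to the median fall into "at or above median", so the two groups need not be the same size.

```r
# Made-up character counts (hypothetical, not taken from email50)
num_char <- c(2, 5, 7, 7, 10, 31)

# Median of the six values
med <- median(num_char)

# Same rule as above: strictly below the median vs. everything else
num_char_cat <- ifelse(num_char < med, "below median", "at or above median")

# Count observations in each category
table(num_char_cat)
```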

1.10 Combining levels of a different factor

Instruction:

# Create number_yn column in email50
email50_fortified <- email50 %>%
  mutate(
    number_yn = case_when(
      # if number is "none", make number_yn "no"
      number == "none" ~ "no", 
      # if number is not "none", make number_yn "yes"
      number != "none" ~ "yes"  
    )
  )
  
# Visualize the distribution of number_yn (ggplot2 must be loaded first)
library(ggplot2)
ggplot(email50_fortified, aes(x = number_yn)) +
  geom_bar()

1.11 Visualizing numerical data (video)
1.12 Visualizing numerical and categorical data

Instruction:

# Load ggplot2
library(ggplot2)

# Scatterplot of exclaim_mess vs. num_char
ggplot(email50, aes(x = num_char, y = exclaim_mess, color = factor(spam))) +
  geom_point()

2. Study Types and Cautionary Tales

2.1 Observational studies and experiments (video)
2.2 Identify type of study: Reading speed and font
2.3 Identify type of study: Countries

Instruction:

# Load data (assumption: outside DataCamp, the data come from the gapminder package)
library(gapminder)
data(gapminder)

# Glimpse data
glimpse(gapminder)

# Identify type of study: observational or experimental
type_of_study <- "observational"

2.4 Random sampling and random assignment (video)
2.5 Random sampling or random assignment?
2.6 Identify the scope of inference of study
2.7 Simpson’s paradox (video)
2.8 Number of males and females admitted

Instruction:

# Load packages
library(dplyr)

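# Count number of male and female applicants admitted
# (stored as ucb_admission_counts because the next exercise uses that name)
ucb_admission_counts <- ucb_admit %>%
  count(Gender, Admit)

# See the result
ucb_admission_counts

2.9 Proportion of males admitted overall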

Instruction:

ucb_admission_counts %>%
  # Group by gender
  group_by(Gender) %>%
  # Create new variable
  mutate(prop = n / sum(n)) %>%
  # Filter for admitted
  filter(Admit == "Admitted")

2.10 Proportion of males admitted for each department

Instruction 1:

ucb_admission_counts <- ucb_admit %>%
  # Counts by department, then gender, then admission status
  count(Dept, Gender, Admit)

# See the result
ucb_admission_counts

Instruction 2:

ucb_admission_counts  %>%
  # Group by department, then gender
  group_by(Dept, Gender) %>%
  # Create new variable
  mutate(prop = n / sum(n)) %>%
  # Filter for male and admitted
  filter(Gender == "Male", Admit == "Admitted")
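
The overall and per-department comparison can also be sketched without the course data. The example below assumes that R's built-in UCBAdmissions table (a 3-way table of Admit by Gender by Dept) holds the same counts as ucb_admit; that correspondence is an assumption, not something these notes state. It uses base R only.

```r
# Overall admission rates by gender: sum over departments, then
# take proportions within each gender (column-wise)
overall <- prop.table(apply(UCBAdmissions, c(1, 2), sum), margin = 2)
overall["Admitted", ]

# Admission rates within each gender-by-department combination
by_dept <- prop.table(UCBAdmissions, margin = c(2, 3))
round(by_dept["Admitted", , ], 2)
```

Comparing the two outputs shows the pattern behind Simpson's paradox: the overall rate is higher for males, while in some departments (department A, for example) the rate is higher for females.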

2.11 Admission rates for males across departments
2.12 Recap: Simpson’s paradox (video)

3. Sampling Strategies and Experimental Design

3.1 Sampling strategies (video)
3.2 Sampling strategies, determine which
3.3 Sampling strategies, choose worst
3.4 Sampling in R (video)
3.5 Simple random sample in R

Instruction:

# Simple random sample: states_srs
states_srs <- us_regions %>%
  sample_n(size = 8)

# Count states by region
states_srs %>%
  count(region)

3.6 Stratified sample in R

Instruction:

# Stratified sample
states_str <- us_regions %>%
  group_by(region) %>%
  sample_n(2)

# Count states by region
states_str %>%
  count(region)
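
For readers without access to us_regions, the two sampling schemes can be approximated with R's built-in state.name and state.region vectors. This is a base-R sketch under that assumption (the region labels may differ from the course data), and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Stand-in for us_regions, built from datasets shipped with R
states <- data.frame(state = state.name, region = state.region)

# Simple random sample: 8 states, regions represented by chance
srs <- states[sample(nrow(states), size = 8), ]
table(srs$region)

# Stratified sample: exactly 2 states drawn from each region
strat <- do.call(rbind, lapply(split(states, states$region), function(d) {
  d[sample(nrow(d), size = 2), ]
}))
table(strat$region)
```

The stratified sample always has two states per region, while the simple random sample may over- or under-represent a region.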

3.7 Compare SRS vs stratified sample
3.8 Principles of experimental design (video)
3.9 Identifying components of a study
3.10 Experimental design terminology
3.11 Connect blocking and stratifying

4. Case study

4.1 Beauty in the classroom (video)
4.2 Inspect the data

Instruction:

# Inspect evals
glimpse(evals)

4.3 Identify type of study
4.4 Sampling/experimental attributes
4.5 Variables in the data (video)
4.6 Identify variable types

Instruction:

# Inspect variable types
glimpse(evals)

# Remove non-factor variables from the vector below
cat_vars <- c("rank", "ethnicity", "gender", "language", "cls_level",
              "cls_profs", "cls_credits", "pic_outfit", "pic_color")

4.7 Recode a variable

Instruction:

# Recode cls_students as cls_type
evals_fortified <- evals %>%
  mutate(
    cls_type = case_when(
      cls_students <= 18 ~ "small",
      cls_students <= 59 ~ "midsize",
      cls_students >= 60 ~ "large"
    )
  )
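
To check where the class-size boundaries fall, the same recoding can be sketched with base R's cut() on a made-up vector (hypothetical values, not the evals data). This is an alternative to case_when(), not the course's solution. With the default right = TRUE, the intervals are (-Inf, 18], (18, 59] and (59, Inf).

```r
# Made-up class sizes around the cut points (hypothetical values)
cls_students <- c(10, 18, 19, 59, 60, 120)

# Right-closed intervals: <= 18 small, 19 to 59 midsize, >= 60 large
cls_type <- cut(cls_students,
                breaks = c(-Inf, 18, 59, Inf),
                labels = c("small", "midsize", "large"))

data.frame(cls_students, cls_type)
```

Unlike case_when(), cut() returns a factor whose levels are ordered small, midsize, large.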

4.8 Create a scatterplot

Instruction:

# Scatterplot of score vs. bty_avg
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point()

4.9 Create a scatterplot, with an added layer

Instruction:

# Scatterplot of score vs. bty_avg colored by cls_type
# (cls_type was added to evals_fortified in 4.7, so plot that data frame)
ggplot(evals_fortified, aes(x = bty_avg, y = score, color = cls_type)) +
  geom_point()

4.10 Congratulations