Text-Mining-DataCamp-Analyzing Social Media Data in R
2024-01-30 20:17:40
1. Understanding Twitter Data
1.1 Analyzing twitter data (video)
1.2 Power of twitter data
Instruction:
# Load rtweet (assumes Twitter API credentials are already set up, see 1.5)
library(rtweet)
# Extract live tweets for a 120-second window
tweets120s <- stream_tweets("", timeout = 120)
# View dimensions of the data frame with live tweets
dim(tweets120s)
1.3 Pros and cons of twitter data
1.4 Extracting twitter data (video)
1.5 Prerequisites to set up the R environment
1.6 Search and extract tweets
Instruction:
# Extract tweets on "#Emmyawards" and include retweets
twts_emmy <- search_tweets("#Emmyawards",
                           n = 2000,
                           include_rts = TRUE,
                           lang = "en")
# View output for the first 5 columns and 10 rows
head(twts_emmy[,1:5], 10)
1.7 Search and extract timelines
Instruction:
# Extract tweets posted by the user @Cristiano
get_cris <- get_timeline("@Cristiano", n = 3200)
# View output for the first 5 columns and 10 rows
head(get_cris[,1:5], 10)
1.8 Components of twitter data (video)
1.9 User interest and tweet counts
Instruction:
# Create a table of users and tweet counts for the topic
sc_name <- table(tweets_ai$screen_name)
# Sort the table in descending order of tweet counts
sc_name_sort <- sort(sc_name, decreasing = TRUE)
# View sorted table for top 10 users
head(sc_name_sort, 10)
1.10 Compare follower count
Instruction:
# Extract user data for the Twitter accounts of 4 news sites
# (lookup_users() takes the screen names as a single character vector)
users <- lookup_users(c("nytimes", "CNN", "FoxNews", "NBCNews"))
# Create a data frame of screen names and follower counts
user_df <- users[,c("screen_name","followers_count")]
# Display and compare the follower counts for the 4 news sites
user_df
1.11 Retweet counts
Instruction 1:
# Create a data frame of tweet text and retweet count
rtwt <- tweets_ai[,c("text", "retweet_count")]
head(rtwt)
# Sort data frame in descending order of retweet counts
# (arrange() and desc() come from dplyr)
library(dplyr)
rtwt_sort <- arrange(rtwt, desc(retweet_count))
Instruction 2:
# Create a data frame of tweet text and retweet count
rtwt <- tweets_ai[,c("text", "retweet_count")]
head(rtwt)
# Sort data frame based on descending order of retweet counts
rtwt_sort <- arrange(rtwt, desc(retweet_count))
# Exclude rows with duplicate text from the sorted data frame
# (unique() on a data frame ignores `by`, so filter on the text column directly)
rtwt_unique <- rtwt_sort[!duplicated(rtwt_sort$text), ]
# Print top 6 unique posts retweeted most number of times
rownames(rtwt_unique) <- NULL
head(rtwt_unique)
2. Analyzing Twitter Data
2.1 Filtering tweets (video)
2.2 Filtering for original tweets
Instruction:
# Extract 100 original tweets on "Superbowl"
tweets_org <- search_tweets("Superbowl -filter:retweets -filter:quote -filter:replies", n = 100)
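# count() here is plyr::count(), which accepts a plain vector
# (dplyr::count() expects a data frame and would fail on these calls)
# Check for presence of replies
plyr::count(tweets_org$reply_to_screen_name)
# Check for presence of quotes
plyr::count(tweets_org$is_quote)
# Check for presence of retweets
plyr::count(tweets_org$is_retweet)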
2.3 Filtering on tweet language
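The exercise code for this step is not included in these notes. As a rough stand-in, here is a minimal sketch of language filtering on a small made-up data frame. The `lang` column name is assumed to match what rtweet returns, and all values are invented. With live data, the `lang` argument of search_tweets() shown in 1.6 does the filtering at query time instead.

```r
# Toy stand-in for a tweets data frame (all values are invented)
tweets <- data.frame(
  text = c("Hello world", "Bom dia", "Good night", "Hola a todos"),
  lang = c("en", "pt", "en", "es"),
  stringsAsFactors = FALSE
)

# Keep only the tweets tagged as English
tweets_en <- tweets[tweets$lang == "en", ]

# Count tweets per language, most frequent first
lang_counts <- sort(table(tweets$lang), decreasing = TRUE)

print(tweets_en)
print(lang_counts)
```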
2.4 Filter based on tweet popularity
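The exercise code is missing here as well. The idea of a popularity filter can be sketched on toy data: keep only tweets whose like and retweet counts both reach a minimum. The column names `favorite_count` and `retweet_count` are assumed to follow rtweet's output, and the thresholds and values below are hypothetical.

```r
# Toy tweets with like and retweet counts (all values are invented)
tweets <- data.frame(
  text = c("a", "b", "c", "d"),
  favorite_count = c(250, 40, 120, 5),
  retweet_count  = c(180, 10, 150, 300),
  stringsAsFactors = FALSE
)

# Hypothetical thresholds for calling a tweet "popular"
min_faves    <- 100
min_retweets <- 100

# Keep tweets that meet BOTH thresholds
popular <- tweets[tweets$favorite_count >= min_faves &
                  tweets$retweet_count >= min_retweets, ]

# Most retweeted first
popular <- popular[order(-popular$retweet_count), ]
print(popular)
```

Tweet "d" is dropped even though it has the most retweets, because it fails the like threshold.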
2.5 Twitter user analysis
2.6 Extract user information
2.7 Explore users based on the golden ratio
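No exercise code survives for this step. Assuming the "golden ratio" here means followers divided by accounts followed (an assumption, since the notes do not define it), a minimal sketch on invented user data could look like this. `followers_count` appears in 1.10; `friends_count` is assumed to be the matching rtweet column for accounts followed.

```r
# Toy user data (screen names and counts are invented)
users <- data.frame(
  screen_name     = c("brand_a", "newbie_b", "bot_c"),
  followers_count = c(50000, 120, 30),
  friends_count   = c(500, 240, 3000),
  stringsAsFactors = FALSE
)

# Followers per account followed; a user with friends_count == 0 would give Inf
users$follow_ratio <- users$followers_count / users$friends_count

# Highest ratio first
users_sorted <- users[order(-users$follow_ratio), ]
print(users_sorted)
```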
2.8 Subscribers to twitter lists
2.9 Twitter trends
2.10 Available trends
2.11 Trends by country name
2.12 Trends by city and most tweeted trends
2.13 Plotting twitter data over time
2.14 Visualizing frequency of tweets
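The exercise code is not reproduced here. The underlying step, counting tweets per time interval, can be sketched in base R rather than with any rtweet helper. The `created_at` timestamps below are invented, and hourly bins are an arbitrary choice.

```r
# Invented tweet timestamps, parsed as UTC
created_at <- as.POSIXct(
  c("2019-09-01 10:05:00", "2019-09-01 10:40:00",
    "2019-09-01 11:15:00", "2019-09-01 13:59:00"),
  tz = "UTC"
)

# Truncate each timestamp to the start of its hour
hour_bin <- format(created_at, "%Y-%m-%d %H:00", tz = "UTC")

# Count tweets per hour and tidy the result into a data frame
freq <- as.data.frame(table(hour_bin), stringsAsFactors = FALSE)
names(freq) <- c("hour", "n")
print(freq)
```

Hours with no tweets (12:00 in this toy data) do not appear in the result, so a plot over time would need those gaps filled with zeros first.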
2.15 Create time series objects
2.16 Compare tweet frequencies for two brands
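The exercise code is missing for this step too. One way to line up tweet frequencies for two brands is to merge their per-interval counts on the time column. The sketch below uses invented counts, and `brand_a` and `brand_b` are hypothetical names.

```r
# Invented per-hour tweet counts for two hypothetical brands
brand_a <- data.frame(time = c("10:00", "11:00", "12:00"),
                      n = c(5, 8, 2), stringsAsFactors = FALSE)
brand_b <- data.frame(time = c("10:00", "12:00", "13:00"),
                      n = c(3, 7, 4), stringsAsFactors = FALSE)

# Full outer join on time, so hours present for only one brand are kept
both <- merge(brand_a, brand_b, by = "time", all = TRUE,
              suffixes = c("_a", "_b"))

# An hour missing for a brand means zero tweets in that hour
both$n_a[is.na(both$n_a)] <- 0
both$n_b[is.na(both$n_b)] <- 0
print(both)
```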
3. Visualize Tweet Texts
3.1 Processing twitter text
3.2 Remove URLs and characters other than letters
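These notes give no code for this step. A minimal base R sketch of the two cleaning passes named in the heading, run on invented tweet text, might be the following. The regular expressions are one plausible choice, not necessarily the course's.

```r
# Invented raw tweet text
tweets <- c("Check this out https://t.co/abc123 #AI!!",
            "AI & ML in 2019: 100% hype?")

# 1. Drop URLs (anything starting with http:// or https:// up to the next space)
no_url <- gsub("https?://\\S+", "", tweets, perl = TRUE)

# 2. Replace every character that is not a letter with a space
letters_only <- gsub("[^A-Za-z]", " ", no_url)

# 3. Collapse runs of whitespace and trim the ends
clean <- trimws(gsub("\\s+", " ", letters_only, perl = TRUE))
print(clean)
```

Removing URLs first matters: if the non-letter pass ran first, fragments such as "https t co abc" would be left behind as words.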
3.3 Build a corpus and convert to lowercase
3.4 Remove stop words and additional spaces
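No code is recorded for 3.3 and 3.4. The headings suggest a corpus-based workflow; the sketch below skips the corpus object and shows the same transformations (lowercasing, stop word removal, whitespace cleanup) in base R. The stop word list is a tiny hypothetical one, and the text is invented.

```r
# Invented tweet text with mixed case and extra spaces
texts <- c("The future of AI is here", "AI and the   cloud")

# Hypothetical mini stop word list (a real list would be far longer)
stop_words <- c("the", "of", "is", "and")

# Lowercase, then split on any run of whitespace
tokens <- strsplit(tolower(texts), "\\s+", perl = TRUE)

# Drop stop words and any empty tokens
kept <- lapply(tokens, function(w) w[!(w %in% stop_words) & nzchar(w)])

# Re-join with single spaces, which also removes the additional spaces
clean <- vapply(kept, paste, character(1), collapse = " ")
print(clean)
```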
3.5 Visualize popular terms
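The notes stop at the heading here. Before anything can be plotted, the popular terms have to be counted; a base R sketch of that counting step on invented, already-cleaned text follows.

```r
# Invented, already-cleaned tweet text
texts <- c("ai cloud data", "ai data", "ai robots")

# Split into words and count how often each occurs
words <- unlist(strsplit(texts, " ", fixed = TRUE))
term_freq <- sort(table(words), decreasing = TRUE)

# Keep the two most frequent terms
top_terms <- head(term_freq, 2)
print(top_terms)
```

A table like `top_terms` can be passed straight to barplot() for a quick bar chart.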
3.6 Removing custom stop words
3.7 Visualize popular terms with bar plots
3.8 Word clouds for visualization
3.9 Topic modeling of tweets
3.10 The LDA algorithm
3.11 Create a document term matrix
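No code is given for this step. A document term matrix has one row per document, one column per term, and term counts in the cells. A text mining package would normally build it; the sketch below builds a small one by hand in base R on invented documents, purely to show the structure.

```r
# Invented, already-cleaned documents (names serve as document ids)
docs <- c(d1 = "ai cloud ai", d2 = "cloud data", d3 = "ai data data")

# Tokenize each document
tokens <- strsplit(docs, " ", fixed = TRUE)

# Repeat each document id once per token so ids and tokens line up
doc_id <- rep(names(docs), lengths(tokens))

# Cross-tabulate documents against terms
dtm <- table(doc = doc_id, term = unlist(tokens, use.names = FALSE))
print(dtm)
```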
3.12 Create a topic model
3.13 Twitter sentiment analysis
3.14 Extract sentiment scores
3.15 Perform sentiment analysis
4. Network Analysis and Putting Twitter Data on the Map
4.1 Twitter network analysis
4.2 Preparing data for a retweet network
4.3 Create a retweet network
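The notes have no code for 4.2 and 4.3. A retweet network starts from an edge list with one row per retweet, pointing from the retweeting user to the original poster. The sketch below prepares such an edge list from invented data. The column name `retweet_screen_name` is assumed to follow rtweet's output, and the edge direction is a convention chosen here.

```r
# Invented tweets: who posted, and whom they retweeted (NA = not a retweet)
rtwt <- data.frame(
  screen_name         = c("ann", "bob", "cat", "bob", "dan"),
  retweet_screen_name = c("zed", "zed", "ann", NA, "ann"),
  stringsAsFactors = FALSE
)

# Keep only actual retweets and rename columns to edge-list form
edges <- rtwt[!is.na(rtwt$retweet_screen_name), ]
names(edges) <- c("from", "to")

# The network's nodes are all users appearing on either side of an edge
nodes <- sort(unique(c(edges$from, edges$to)))

print(edges)
print(nodes)
```

An edge list in this shape is what igraph's graph_from_data_frame(edges, directed = TRUE) accepts if a graph object is needed.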
4.4 Network centrality measures
4.5 Calculate out-degree scores
4.6 Compute the in-degree scores
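No code is recorded for 4.5 and 4.6. For a directed edge list, out-degree is the number of edges leaving a node and in-degree the number arriving. Assuming edges run from retweeter to original poster (an assumption of this sketch), both can be counted in base R on invented data without a graph library.

```r
# Invented retweet edge list: `from` retweeted `to`
edges <- data.frame(
  from = c("ann", "bob", "cat", "dan", "ann"),
  to   = c("zed", "zed", "ann", "zed", "bob"),
  stringsAsFactors = FALSE
)

# Out-degree: how many retweets each user made
out_degree <- sort(table(edges$from), decreasing = TRUE)

# In-degree: how many times each user was retweeted
in_degree <- sort(table(edges$to), decreasing = TRUE)

print(out_degree)
print(in_degree)
```

Users with a degree of zero are absent from these tables; "zed" never retweets, so it has no out-degree entry.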
4.7 Calculate the betweenness scores
4.8 Visualizing twitter networks
4.9 Create a network plot with attributes
4.10 Network plot based on centrality measure
4.11 Follower count to enhance the network plot
4.12 Putting twitter data on the map
4.13 Extract geolocation coordinates
4.14 Twitter data on the map
4.15 Course wrap-up