
Text-Mining-DataCamp-Analyzing Social Media Data in R

程序员文章站 2024-01-30 20:17:40
...


1. Understanding Twitter Data

1.1 Analyzing twitter data (video)
1.2 Power of twitter data

Instruction:

# Stream live tweets for a 120-second window (an empty query returns a small random sample of public tweets)
tweets120s <- stream_tweets("", timeout = 120)

# View dimensions of the data frame with live tweets
dim(tweets120s)
1.3 Pros and cons of twitter data
1.4 Extracting twitter data (video)
1.5 Prerequisites to set up the R environment
1.6 Search and extract tweets

Instruction:

# Extract tweets on "#Emmyawards" and include retweets
twts_emmy <- search_tweets("#Emmyawards", 
                 n = 2000, 
                 include_rts = TRUE, 
                 lang = "en")

# View output for the first 5 columns and 10 rows
head(twts_emmy[,1:5], 10)
1.7 Search and extract timelines

Instruction:

# Extract tweets posted by the user @Cristiano
get_cris <- get_timeline("@Cristiano", n = 3200)

# View output for the first 5 columns and 10 rows
head(get_cris[,1:5], 10)
1.8 Components of twitter data (video)
1.9 User interest and tweet counts

Instruction:

# Create a table of users and tweet counts for the topic
sc_name <- table(tweets_ai$screen_name)

# Sort the table in descending order of tweet counts
sc_name_sort <- sort(sc_name, decreasing = TRUE)

# View sorted table for top 10 users
head(sc_name_sort, 10)
1.10 Compare follower count

Instruction:

# Extract user data for the Twitter accounts of 4 news sites
# (lookup_users() takes the screen names as a single character vector)
users <- lookup_users(c("nytimes", "CNN", "FoxNews", "NBCNews"))

# Create a data frame of screen names and follower counts
user_df <- users[,c("screen_name","followers_count")]

# Display and compare the follower counts for the 4 news sites
user_df
1.11 Retweet counts

Instruction 1:

# Create a data frame of tweet text and retweet count
rtwt <- tweets_ai[,c("text", "retweet_count")]
head(rtwt)

# Sort data frame based on descending order of retweet counts
rtwt_sort <- arrange(rtwt, desc(retweet_count))

Instruction 2:

# Create a data frame of tweet text and retweet count
rtwt <- tweets_ai[,c("text", "retweet_count")]
head(rtwt)

# Sort data frame based on descending order of retweet counts
rtwt_sort <- arrange(rtwt, desc(retweet_count))

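# Exclude rows with duplicate text, keeping the first (most-retweeted) copy
# (base unique() has no `by` argument, so it would compare whole rows instead)
rtwt_unique <- rtwt_sort[!duplicated(rtwt_sort$text), ]

# Reset row names, then print the top 6 unique posts retweeted the most times
rownames(rtwt_unique) <- NULL
head(rtwt_unique)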

2. Analyzing Twitter Data

2.1 Filtering tweets (video)
2.2 Filtering for original tweets

Instruction:

# Extract 100 original tweets on "Superbowl"
tweets_org <- search_tweets("Superbowl -filter:retweets -filter:quote -filter:replies", n = 100)

# Check for presence of replies (NA means the tweet is not a reply)
table(tweets_org$reply_to_screen_name, useNA = "ifany")

# Check for presence of quotes
table(tweets_org$is_quote)

# Check for presence of retweets
table(tweets_org$is_retweet)
2.3 Filtering on tweet language

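rtweet's search_tweets() accepts a `lang` argument, as in the #Emmyawards search above, and each returned tweet carries a `lang` column. The following is a minimal sketch of filtering tweets that have already been collected by language. It assumes a small hypothetical data frame, `tweets_demo`, that stands in for a search result and contains only the columns used here.

```r
# Hypothetical stand-in for a search_tweets() result (only the columns used here)
tweets_demo <- data.frame(
  text = c("Hola a todos", "Hello world", "Buenos dias", "Bonjour"),
  lang = c("es", "en", "es", "fr"),
  stringsAsFactors = FALSE
)

# Keep only Spanish-language tweets
tweets_es <- tweets_demo[tweets_demo$lang == "es", ]

# Inspect the text and language of the filtered tweets
head(tweets_es$text)
table(tweets_es$lang)
```

For a live query, the same filter can be pushed to the API by passing `lang = "es"` to search_tweets().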
2.4 Filter based on tweet popularity

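Popularity is usually judged by retweet and favorite counts. Twitter's search syntax also accepted operators such as `min_retweets:` and `min_faves:` inside the query string. The sketch below instead filters collected tweets locally. It assumes a hypothetical data frame with `retweet_count` and `favorite_count` columns and uses illustrative thresholds of 100.

```r
# Hypothetical stand-in for collected tweets with engagement counts
tweets_demo <- data.frame(
  text = c("post A", "post B", "post C", "post D"),
  retweet_count = c(150, 20, 300, 99),
  favorite_count = c(400, 500, 250, 120),
  stringsAsFactors = FALSE
)

# Illustrative popularity thresholds (assumed values)
min_rt <- 100
min_fav <- 100

# Keep tweets that clear both thresholds
tweets_pop <- tweets_demo[tweets_demo$retweet_count >= min_rt &
                            tweets_demo$favorite_count >= min_fav, ]

# Compare the engagement counts of the popular tweets
tweets_pop[, c("text", "retweet_count", "favorite_count")]
```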
2.5 Twitter user analysis

2.6 Extract user information

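Tweet data repeats the author's profile fields on every tweet. rtweet provides users_data() for pulling those user-level fields out. The sketch below shows the underlying idea in base R: select the user columns and keep one row per screen name. The data frame and its values are hypothetical.

```r
# Hypothetical tweet-level data: one row per tweet, user fields repeated
tweets_demo <- data.frame(
  screen_name = c("alice", "bob", "alice", "carol"),
  followers_count = c(5000, 120, 5000, 800),
  friends_count = c(200, 300, 200, 800),
  stringsAsFactors = FALSE
)

# Select the user-level columns
user_cols <- tweets_demo[, c("screen_name", "followers_count", "friends_count")]

# Keep a single row per user (the first occurrence of each screen name)
user_df <- user_cols[!duplicated(user_cols$screen_name), ]
rownames(user_df) <- NULL
user_df
```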
2.7 Explore users based on the golden ratio

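The "golden ratio" here compares an account's followers to the accounts it follows (its friends). A ratio well above 1 suggests an account that others seek out. The following is a minimal sketch on hypothetical user data. One design choice is that zero friends is treated as one, which avoids division by zero.

```r
# Hypothetical user-level data (one row per account)
user_df <- data.frame(
  screen_name = c("alice", "bob", "carol"),
  followers_count = c(5000, 120, 800),
  friends_count = c(200, 300, 0),
  stringsAsFactors = FALSE
)

# Followers-to-friends ratio; pmax() treats zero friends as one
user_df$ratio <- user_df$followers_count / pmax(user_df$friends_count, 1)

# Rank accounts from highest to lowest ratio
ratio_sort <- user_df[order(user_df$ratio, decreasing = TRUE), ]
ratio_sort
```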
2.8 Subscribers to twitter lists

2.9 Twitter trends

2.10 Available trends

2.11 Trends by country name

2.12 Trends by city and most tweeted trends

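To find the most tweeted trends in a city, the trends data can be aggregated by topic and then sorted by volume. This sketch assumes a hypothetical data frame with `trend` and `tweet_volume` columns, similar to what rtweet's get_trends() returns. It also assumes that `tweet_volume` may be missing for some trends.

```r
# Hypothetical trends data for one city; tweet_volume can be NA
gt_city <- data.frame(
  trend = c("#TopicA", "#TopicB", "#TopicC", "#TopicA"),
  tweet_volume = c(52000, NA, 18000, 48000),
  stringsAsFactors = FALSE
)

# Drop trends with no reported volume
gt_known <- gt_city[!is.na(gt_city$tweet_volume), ]

# Average volume per trend (a trend can appear more than once)
trend_df <- aggregate(tweet_volume ~ trend, data = gt_known, FUN = mean)

# Sort from most to least tweeted
trend_df <- trend_df[order(trend_df$tweet_volume, decreasing = TRUE), ]
head(trend_df, 10)
```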
2.13 Plotting twitter data over time

2.14 Visualizing frequency of tweets

2.15 Create time series objects

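A tweet time series is a count of tweets per time interval. rtweet provides ts_data() for this aggregation. The base-R sketch below makes the binning explicit. It assumes hypothetical UTC timestamps standing in for a `created_at` column and uses hourly bins. Hours with no tweets are kept with a count of 0.

```r
# Hypothetical tweet timestamps (rtweet keeps these in a created_at column)
created_at <- as.POSIXct(c(
  "2024-01-30 09:05:00", "2024-01-30 09:40:00",
  "2024-01-30 10:15:00", "2024-01-30 12:01:00"
), tz = "UTC")

# Round each timestamp down to the start of its hour
hour_start <- as.POSIXct(trunc(created_at, units = "hours"))

# Every hour from the first to the last, so quiet hours appear with 0 tweets
all_hours <- seq(min(hour_start), max(hour_start), by = "hour")

# Count the tweets falling in each hour
brand_ts <- data.frame(
  time = all_hours,
  n = sapply(seq_along(all_hours), function(i) sum(hour_start == all_hours[i]))
)
brand_ts
```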
2.16 Compare tweet frequencies for two brands

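To compare two brands, their hourly count series can be aligned on time with a full outer join and then reshaped to long format for plotting. Both series below are hypothetical. The sketch also assumes that an hour missing from one series means that brand had zero tweets in that hour.

```r
# Hypothetical hourly tweet counts for two brands; brand B has no row for 11:00
hours <- as.POSIXct(c("2024-01-30 09:00:00", "2024-01-30 10:00:00",
                      "2024-01-30 11:00:00"), tz = "UTC")
brand_a_ts <- data.frame(time = hours, brand_a_n = c(5L, 3L, 4L))
brand_b_ts <- data.frame(time = hours[1:2], brand_b_n = c(2L, 6L))

# Full outer join on time keeps every hour present in either series
merged_df <- merge(brand_a_ts, brand_b_ts, by = "time", all = TRUE)

# Assumption: an hour missing from one series means zero tweets for that brand
merged_df[is.na(merged_df)] <- 0L

# Long format (time, brand, n) suits plotting both series on one chart
long_df <- data.frame(
  time = rep(merged_df$time, 2),
  brand = rep(c("brand_a", "brand_b"), each = nrow(merged_df)),
  n = c(merged_df$brand_a_n, merged_df$brand_b_n)
)
long_df
```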

3. Visualize Tweet Texts

3.1 Processing twitter text
3.2 Remove URLs and characters other than letters
3.3 Build a corpus and convert to lowercase
3.4 Remove stop words and additional spaces
3.5 Visualize popular terms
3.6 Removing custom stop words
3.7 Visualize popular terms with bar plots
3.8 Word clouds for visualization
3.9 Topic modeling of tweets
3.10 The LDA algorithm
3.11 Create a document term matrix
3.12 Create a topic model
3.13 Twitter sentiment analysis
3.14 Extract sentiment scores
3.15 Perform sentiment analysis
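The cleaning steps listed in this chapter are:

- remove URLs and non-letter characters
- convert to lowercase
- remove stop words and extra spaces
- count popular terms

The base-R sketch below covers these steps. It is a stand-in for the corpus-based workflow in the outline, not the course's code. The tweets and the short stop-word list are hypothetical.

```r
# Hypothetical tweet texts
tweets_txt <- c(
  "Loving the new #AI tools! https://example.com/a",
  "AI is the future of work http://example.com/b",
  "The AI hype is real"
)

# Remove URLs: "http" or "https", "://", then any run of non-space characters
txt <- gsub("https?://\\S+", "", tweets_txt, perl = TRUE)

# Replace every character other than a letter with a space, then lowercase
txt <- tolower(gsub("[^A-Za-z]", " ", txt))

# A tiny illustrative stop-word list (assumed; real lists are much longer)
stop_words <- c("the", "is", "of", "new")

# Split on runs of whitespace, then drop empty strings and stop words
words <- unlist(strsplit(txt, "\\s+"))
words <- words[nchar(words) > 0 & !(words %in% stop_words)]

# Term frequencies, most frequent first
term_freq <- sort(table(words), decreasing = TRUE)
head(term_freq, 5)
```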

4. Network Analysis and Putting Twitter Data on the Map

4.1 Twitter network analysis
4.2 Preparing data for a retweet network
4.3 Create a retweet network
4.4 Network centrality measures
4.5 Calculate out-degree scores
4.6 Compute the in-degree scores
4.7 Calculate the betweenness scores
4.8 Visualizing twitter networks
4.9 Create a network plot with attributes
4.10 Network plot based on centrality measure
4.11 Follower count to enhance the network plot
4.12 Putting twitter data on the map
4.13 Extract geolocation coordinates
4.14 Twitter data on the map
4.15 Course wrap-up
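In a retweet network, each edge is assumed here to point from the retweeting user to the author of the original tweet. Under that assumption, out-degree counts the retweets a user made and in-degree counts how often a user was retweeted. The base-R sketch below computes both from a hypothetical edge list. A graph library such as igraph provides degree() and betweenness() for these and related centrality scores; betweenness is not covered here.

```r
# Hypothetical retweet edge list: each row is one retweet,
# from the retweeting user to the author of the original tweet
rt_edges <- data.frame(
  retweeter = c("ann", "ben", "ann", "cat", "ben"),
  author    = c("dan", "dan", "eve", "dan", "ann"),
  stringsAsFactors = FALSE
)

# All users appearing at either end of an edge
users <- sort(unique(c(rt_edges$retweeter, rt_edges$author)))

# Out-degree: how many retweets each user made
out_deg <- table(factor(rt_edges$retweeter, levels = users))

# In-degree: how many times each user's tweets were retweeted
in_deg <- table(factor(rt_edges$author, levels = users))

degree_df <- data.frame(
  user = users,
  out_degree = as.integer(out_deg),
  in_degree = as.integer(in_deg),
  stringsAsFactors = FALSE
)

# Most-retweeted users first
degree_df[order(degree_df$in_degree, decreasing = TRUE), ]
```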
Related tags: DataCamp-R-Text-Mining