Scraping Twitter data with twint
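I recently needed Twitter data to validate an experiment. My first idea was to get a key through Twitter's official developer program and pull the data directly. The link is: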
twitter myapp
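The problem is that I applied with two accounts and both were rejected. Reportedly, applications tied to a +86 phone number are very likely to be turned down. I... well. If you still want to try applying, see this link:
Zhihu question. Read through the comments under it and pay attention to how people word their applications; it might work.
Later I found this open-source project, twint, which can scrape the data directly: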
https://github.com/twintproject/twint
Installation follows the official instructions:
git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt
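I did this on Ubuntu 20.04. One thing to watch out for: the Python version must be newer than 3.5. The official requirement is 3.6, and 3.8 works fine for me. Ubuntu 20.04 ships with Python 3.8, while Ubuntu 16.04 ships with 3.5. I fell into this trap and lost a whole evening to it.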
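Checking the interpreter version before installing would have saved that evening. Here is a minimal sketch of such a check. The function name `python_is_supported` and the constant `MIN_PYTHON` are made up for illustration and are not part of twint. The script deliberately avoids f-strings so that it still runs on an old 3.5 interpreter and can report the problem.

```python
import sys

# Minimum version taken from the note above (twint officially asks for 3.6).
MIN_PYTHON = (3, 6)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`.

    Hypothetical helper for illustration; only major and minor are compared.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    required = ".".join(str(part) for part in MIN_PYTHON)
    if python_is_supported():
        print("Python {} is new enough for twint".format(current))
    else:
        print("Python {} is too old; twint needs {} or newer".format(current, required))
```

Run it with the same `python3` that `pip3` installs into, since a machine can have several interpreters side by side.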
After that, just use the twint command directly. Here are the official usage examples:
twint -u username - Scrape all the Tweets of a user (doesn’t include retweets but includes replies).
twint -u username -s pineapple - Scrape all Tweets from the user’s timeline containing pineapple.
twint -s pineapple - Collect every Tweet containing pineapple from everyone’s Tweets.
twint -u username --year 2014 - Collect Tweets that were tweeted before 2014.
twint -u username --since "2015-12-20 20:30:15" - Collect Tweets that were tweeted since 2015-12-20 20:30:15.
twint -u username --since 2015-12-20 - Collect Tweets that were tweeted since 2015-12-20 00:00:00.
twint -u username -o file.txt - Scrape Tweets and save to file.txt.
twint -u username -o file.csv --csv - Scrape Tweets and save as a csv file.
twint -u username --email --phone - Show Tweets that might have phone numbers or email addresses.
twint -s "Donald Trump" --verified - Display Tweets by verified users that Tweeted about Donald Trump.
twint -g="48.880048,2.385939,1km" -o file.csv --csv - Scrape Tweets from a radius of 1km around a place in Paris and export them to a csv file.
twint -u username -es localhost:9200 - Output Tweets to Elasticsearch.
twint -u username -o file.json --json - Scrape Tweets and save as a json file.
twint -u username --database tweets.db - Save Tweets to a SQLite database.
twint -u username --followers - Scrape a Twitter user’s followers.
twint -u username --following - Scrape who a Twitter user follows.
twint -u username --favorites - Collect all the Tweets a user has favorited (gathers ~3200 Tweets).
twint -u username --following --user-full - Collect full user information for the accounts a person follows.
twint -u username --timeline - Use an effective method to gather Tweets from a user’s profile (Gathers ~3200 Tweets, including retweets & replies).
twint -u username --retweets - Use a quick method to gather the last 900 Tweets (that includes retweets) from a user’s profile.
twint -u username --resume resume_file.txt - Resume a search starting from the last saved scroll-id.
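When scraping many users or keywords, typing these commands by hand gets tedious. The following is a minimal sketch of a small Python wrapper that assembles a twint command line from parameters. It uses only flags that appear in the list above (`-u`, `-s`, `--since`, `-o`, `--csv`, `--json`). The helper name `build_twint_command` and its parameters are my own invention, not part of twint, and the rule that a format flag requires `-o` is an assumption copied from how the examples above are written.

```python
import shlex
from datetime import datetime


def build_twint_command(username=None, search=None, since=None, output=None, fmt=None):
    """Build the argument list for a twint command line without running it.

    Hypothetical helper for illustration; it only knows the flags listed above.
    """
    if username is None and search is None:
        raise ValueError("give a username, a search term, or both")
    if fmt not in (None, "csv", "json"):
        raise ValueError("fmt must be None, 'csv' or 'json'")
    if fmt is not None and output is None:
        # Assumption: mirror the examples above, where --csv/--json go with -o.
        raise ValueError("a format flag needs an output file")

    args = ["twint"]
    if username is not None:
        args += ["-u", username]
    if search is not None:
        args += ["-s", search]
    if since is not None:
        # Accept a datetime and render it in the timestamp form shown above.
        if isinstance(since, datetime):
            since = since.strftime("%Y-%m-%d %H:%M:%S")
        args += ["--since", since]
    if output is not None:
        args += ["-o", output]
    if fmt is not None:
        args.append("--" + fmt)
    return args


if __name__ == "__main__":
    cmd = build_twint_command(
        username="username",
        search="pineapple",
        since=datetime(2015, 12, 20, 20, 30, 15),
        output="file.csv",
        fmt="csv",
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually run it (requires twint on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems: a search phrase such as "Donald Trump" stays a single argument without any manual quoting.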
I'll translate these tomorrow; I'm off work for now.