How to design the push and pull models of a News Feed service for a Twitter/Facebook/Instagram/RSS system?
First, we need to know the key points of a news feed system:
follow and unfollow
the news feed is different for every user.
There are two models for building a news feed: pull (active) and push (passive).
Pull model:
When a user requests his news feed, the system fetches the latest 100 tweets from each user he follows and merges them into a single feed of the 100 most recent tweets (ordered by timestamp).
algorithm: getNewsFeed + merge K sorted arrays.
getNewsFeed(request)
• followings = DB.getFollowings(user=request.user) // get the users this user follows
• news_feed = empty
• for follow in followings: // iterate over the followings
• tweets = DB.getTweets(follow.to_user, 100) // get the latest 100 tweets of each one
• news_feed.merge(tweets) // collect them
• sort(news_feed) // merge the K sorted lists by timestamp
• return news_feed
time complexity analysis: with N followings, each read of the news feed costs N DB reads plus the time to merge N sorted lists (the K-way merge above, with K = N).
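The pull steps above can be sketched in Python as follows. This is only a minimal illustration: the `FOLLOWINGS` and `TWEETS` dictionaries are hypothetical in-memory stand-ins for the DB calls in the pseudocode, and each timeline is assumed to be stored newest first so that a K-way merge can be used instead of a full sort.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the DB tables in the pseudocode.
FOLLOWINGS = {"alice": ["bob", "carol"]}
# Assumption: each timeline is kept sorted newest first as (timestamp, text).
TWEETS = {
    "bob": [(5, "b2"), (1, "b1")],
    "carol": [(4, "c2"), (3, "c1")],
}


def get_tweets(user, limit):
    """Stand-in for DB.getTweets: the latest `limit` tweets of one user."""
    return TWEETS.get(user, [])[:limit]


def get_news_feed(user, limit=100):
    """Pull model: one read per following, then a K-way merge by timestamp."""
    timelines = [get_tweets(f, limit) for f in FOLLOWINGS.get(user, [])]
    # reverse=True tells heapq.merge the inputs are sorted largest first.
    merged = heapq.merge(*timelines, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))
```

Calling `get_news_feed("alice", limit=2)` reads both timelines and keeps only the two newest tweets. All of the work happens at read time, which is why the cost grows with the number of followings.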
the principle of Pull:
Push Model:
A push happens when a new tweet is posted.
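algorithm:
• postTweet(request, tweet_info):
• tweet = DB.insertTweet(request.user, tweet_info)
• AsyncService.fanoutTweet(request.user, tweet) // executed asynchronously
• return success
• AsyncService::fanoutTweet(user, tweet)
• followers = DB.getFollowers(user)
• for follower in followers:
• DB.insertNewsFeed(tweet, follower)
time complexity analysis: with N followers, posting a tweet costs N DB writes in the fanout, which runs asynchronously so the poster does not wait for it; reading the news feed is then a single DB read of the precomputed feed.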
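The push flow above can be sketched in Python like this. It is a simplified illustration, not a production design: the `FOLLOWERS`, `TWEETS` and `NEWS_FEED` structures are hypothetical in-memory stand-ins for the DB tables, and a plain `deque` drained by `run_fanout_worker` stands in for the asynchronous service.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins for the DB tables in the pseudocode.
FOLLOWERS = {"bob": ["alice", "dave"]}
TWEETS = []                     # the tweets table
NEWS_FEED = defaultdict(list)   # follower -> precomputed feed, newest first
FANOUT_QUEUE = deque()          # stand-in for the async task queue


def post_tweet(user, text, timestamp):
    """Store the tweet and enqueue the fanout; returns without waiting for it."""
    tweet = (timestamp, user, text)
    TWEETS.append(tweet)
    FANOUT_QUEUE.append((user, tweet))
    return tweet


def run_fanout_worker():
    """Drain the queue: one feed write per follower (N writes per tweet)."""
    while FANOUT_QUEUE:
        user, tweet = FANOUT_QUEUE.popleft()
        for follower in FOLLOWERS.get(user, []):
            # Assumes tweets are processed in posting order, so the head is newest.
            NEWS_FEED[follower].insert(0, tweet)


def get_news_feed(user, limit=100):
    """Push model read: a single lookup of the precomputed feed."""
    return NEWS_FEED[user][:limit]
```

Until `run_fanout_worker()` has run, followers do not see the new tweet. That delay is the price of making reads cheap, and it grows with the number of followers.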
the principle of push:
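How do we choose between pull and push?
Following the principle that whatever exists has its reasons, each model has real-world applications (the active model, pull, is the more common one):
Facebook: pull
Instagram: push+pull
Twitter: pull
That said, how do we actually choose? Either model can work, but you need a solution for each model's weaknesses. This is handled in Scale, the last step of the 4S approach to system design.
For details, see this blogger's post "System Design Divide and Conquer, use Twitter Design as an example".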
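Special cases:
1. Celebrity users. With the push model, the whole fanout process may take several hours. Since push does not work, should we just switch to pull? It is not that simple; you cannot swap models on a whim.
The right approach: first try to optimize with minimal changes to the existing model, for example by adding a few machines dedicated to push tasks. Alternatively, evaluate long-term growth to judge whether switching the whole model is worthwhile.
A better answer: for ordinary users we only push. For celebrity users we do not proactively push to all followers; instead, when a follower needs his feed, we fetch from the celebrity user's timeline and merge the result into the news feed.
2. A celebrity user whose follower count fluctuates, for example by shrinking.
Still push.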
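The "better answer" for celebrity users can be sketched as a hybrid of the two models. This is a minimal illustration under stated assumptions: `CELEBRITY_THRESHOLD` and `is_celebrity` are hypothetical (the text does not say how a celebrity is identified, and a real cutoff would be far larger), the dictionaries stand in for DB tables, and the fanout is done inline for brevity rather than asynchronously.

```python
import heapq
from itertools import islice

# Hypothetical cutoff on follower count, kept tiny for the example.
CELEBRITY_THRESHOLD = 2

# Hypothetical in-memory stand-ins for the DB tables.
FOLLOWERS = {"star": ["alice", "bob", "carol"], "bob": ["alice"]}
FOLLOWINGS = {"alice": ["star", "bob"]}
TIMELINES = {}   # author -> own tweets, newest first
NEWS_FEED = {}   # reader -> pushed tweets, newest first


def is_celebrity(user):
    return len(FOLLOWERS.get(user, [])) > CELEBRITY_THRESHOLD


def post_tweet(user, text, timestamp):
    """Always write to the author's timeline; fan out only for ordinary users."""
    tweet = (timestamp, user, text)
    TIMELINES.setdefault(user, []).insert(0, tweet)
    if not is_celebrity(user):
        for follower in FOLLOWERS.get(user, []):
            NEWS_FEED.setdefault(follower, []).insert(0, tweet)


def get_news_feed(user, limit=100):
    """Merge the pushed feed with the timelines of followed celebrities."""
    sources = [NEWS_FEED.get(user, [])[:limit]]
    for followee in FOLLOWINGS.get(user, []):
        if is_celebrity(followee):
            sources.append(TIMELINES.get(followee, [])[:limit])
    # Every source is sorted newest first, so a K-way merge is enough.
    merged = heapq.merge(*sources, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))
```

Here a post by `star` costs one write no matter how many followers there are, while each reader pays one extra timeline read per followed celebrity.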
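To summarize, here is when pull and push are commonly used:
push: limited resources, low real-time requirements, users post infrequently, bidirectional friendships (i.e. no celebrities, as in WeChat Moments)
pull: ample resources, high real-time requirements, users post heavily, the celebrity problem is present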