Scaling Big Data Mining Infrastructure at Twitter

程序员文章站 2022-06-11 15:34:18

...

I almost always enjoy the lessons-learned-style presentations from Twitter’s people. The slides below, by Jimmy Lin and Dmitriy Ryaboy, were presented at Hadoop Summit. Besides the technical and practical details, there are two thing

DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”

and then the reality check:

Your boss says something vague
You think very hard on how to move the needle
Where’s the data?
What’s in this dataset?
What’s all the f#$#$ crap in the data?
Clean the data
Run some off-the-shelf data mining algorithm
…
Productionize, act on the insight
Rinse, repeat

Enjoy!

Scaling Big Data Mining Infrastructure Twitter Experience

Original title and link: Scaling Big Data Mining Infrastructure at Twitter (NoSQL database©myNoSQL)

