欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Hive实战系列之用户DAU相关统计

程序员文章站 2022-06-05 09:22:24
...

DAU(Daily Active User)是衡量一一款App的关键指标之一,结合用户留存情况综合评判,App中用户在哪些阶段流失。在数据分析中也是关键的一个环节之一。下面就聊一聊怎么去统计这些指标。

经过数据清洗转换的数据格式如下:

game_name mid_id date_first date_last login_day_count login_count
game1 m1 2021-11-04 2021-11-04 1 5
game2 m2 2021-11-02 2021-11-04 3 9

一,统计当日各个游戏新增用户数量

数据倾斜情况描述:假设新上架一款游戏,数据量很小,但是有多款游戏数据量却很大,这是如果根据game_name分组就会产生数据倾斜。解决方案添加参数如下:

set hive.map.aggr = true;//是否在 Map 端进行聚合,默认为True
set hive.groupby.mapaggr.checkinterval = 100000;// 在 Map 端进行聚合操作的条目数目
set hive.groupby.skewindata = true;//有数据倾斜的时候进行负载均衡(默认是 false)
select
'2021-11-04',
game_name,
count(*)
from dwt_uv_topic
where login_date_first='2021-11-04'
group by game_name;

二,统计各个游戏内日、周、月的活跃用户数,并标记当前日期是否是周末,是否是月末

dt game_name day_count wk_count mn_count is_wk is_month
2021-11-04 game1 123 456 456 N N
主要知识点日期函数:
计算下周一日期:   select next_day("2021-11-04",'MO');              2021-11-08
计算当前日期的周一:select date_add(next_day('2021-11-04','MO'),-7); 2021-11-01
计算当前日期的周末:select date_add(next_day('2021-11-04','MO'),-1); 2021-11-07
计算当前日期的月末:select last_day('2021-11-04'); 2021-11-30
月份统计:date_format('2021-11-05','yyyy-MM')
select
'2021-11-04' dt,
daycount.game_name,
daycount.cnt,
wkcount.cnt,
mncount.cnt,
if(date_add(next_day('2021-11-04','MO'),-1)='2021-11-04','Y','N') ,
if(last_day('2021-11-04')='2021-11-04','Y','N')
from
(
select
'2021-11-04' dt,
game_name,
count(*) cnt
from dwt_uv_topic
where login_date_last='2021-11-04'
group by game_name
)daycount join
(
select
'2021-11-04' dt,
game_name,
count (*) cnt
from dwt_uv_topic
where login_date_last>=date_add(next_day('2021-11-04','MO'),-7)
and login_date_last<= date_add(next_day('$do_date','MO'),-1)
group by game_name;
) wkcount on daycount.dt=wkcount.dt and daycount.game_name=wkcount.game_name
join
(
select
'2021-11-04' dt,
game_name,
count(*) cnt
from dwt_uv_topic_new
where date_format(login_date_last,'yyyy-MM')=date_format('2021-11-04','yyyy-MM')
group by game_name
)mncount on daycount.dt=mncount.dt and daycount.game_name=mncount.game_name;

三,统计用户留存

select
'2021-11-04',
game_name,
date_add('2021-11-04',-1),
1,
sum(if(login_date_first=date_add('2021-11-04',-1) and login_date_last='2021-11-04',1,0)),
sum(if(login_date_first=date_add('2021-11-04',-1),1,0)),
sum(if(login_date_first=date_add('2021-11-04',-1) and login_date_last='2021-11-04',1,0))/sum(if(login_date_first=date_add('2021-11-04',-1),1,0))*100
from dwt_uv_topic
group by game_name

union all

select
'2021-11-04',
game_name,
date_add('2021-11-04',-2),
2,
sum(if(login_date_first=date_add('2021-11-04',-2) and login_date_last='2021-11-04',1,0)),
sum(if(login_date_first=date_add('2021-11-04',-2),1,0)),
sum(if(login_date_first=date_add('2021-11-04',-2) and login_date_last='2021-11-04',1,0))/sum(if(login_date_first=date_add('2021-11-04',-2),1,0))*100
from dwt_uv_topic
group by game_name

union all

select
'2021-11-04',
game_name,
date_add('2021-11-04',-3),
3,
sum(if(login_date_first=date_add('2021-11-04',-3) and login_date_last='2021-11-04',1,0)),
sum(if(login_date_first=date_add('2021-11-04',-3),1,0)),
sum(if(login_date_first=date_add('2021-11-04',-3) and login_date_last='2021-11-04',1,0))/sum(if(login_date_first=date_add('2021-11-04',-3),1,0))*100
from dwt_uv_topic
group by game_name;