Hive实战系列之用户DAU相关统计
程序员文章站
2022-06-05 09:22:24
...
DAU(Daily Active User)是衡量一一款App的关键指标之一,结合用户留存情况综合评判,App中用户在哪些阶段流失。在数据分析中也是关键的一个环节之一。下面就聊一聊怎么去统计这些指标。
经过数据清洗转换的数据格式如下:
game_name | mid_id | date_first | date_last | login_day_count | login_count |
game1 | m1 | 2021-11-04 | 2021-11-04 | 1 | 5 |
game2 | m2 | 2021-11-02 | 2021-11-04 | 3 | 9 |
一,统计当日各个游戏新增用户数量
数据倾斜情况描述:假设新上架一款游戏,数据量很小,但是有多款游戏数据量却很大,这是如果根据game_name分组就会产生数据倾斜。解决方案添加参数如下:
set hive.map.aggr = true;//是否在 Map 端进行聚合,默认为True set hive.groupby.mapaggr.checkinterval = 100000;// 在 Map 端进行聚合操作的条目数目 set hive.groupby.skewindata = true;//有数据倾斜的时候进行负载均衡(默认是 false)
select
'2021-11-04',
game_name,
count(*)
from dwt_uv_topic
where login_date_first='2021-11-04'
group by game_name;
二,统计各个游戏内日、周、月的活跃用户数,并标记当前日期是否是周末,是否是月末
dt | game_name | day_count | wk_count | mn_count | is_wk | is_month |
2021-11-04 | game1 | 123 | 456 | 456 | N | N |
主要知识点日期函数: 计算下周一日期: select next_day("2021-11-04",'MO'); 2021-11-08 计算当前日期的周一:select date_add(next_day('2021-11-04','MO'),-7); 2021-11-01 计算当前日期的周末:select date_add(next_day('2021-11-04','MO'),-1); 2021-11-07 计算当前日期的月末:select last_day('2021-11-04'); 2021-11-30 月份统计:date_format('2021-11-05','yyyy-MM')
select
'2021-11-04' dt,
daycount.game_name,
daycount.cnt,
wkcount.cnt,
mncount.cnt,
if(date_add(next_day('2021-11-04','MO'),-1)='2021-11-04','Y','N') ,
if(last_day('2021-11-04')='2021-11-04','Y','N')
from
(
select
'2021-11-04' dt,
game_name,
count(*) cnt
from dwt_uv_topic
where login_date_last='2021-11-04'
group by game_name
)daycount join
(
select
'2021-11-04' dt,
game_name,
count (*) cnt
from dwt_uv_topic
where login_date_last>=date_add(next_day('2021-11-04','MO'),-7)
and login_date_last<= date_add(next_day('$do_date','MO'),-1)
group by game_name;
) wkcount on daycount.dt=wkcount.dt and daycount.game_name=wkcount.game_name
join
(
select
'2021-11-04' dt,
game_name,
count(*) cnt
from dwt_uv_topic_new
where date_format(login_date_last,'yyyy-MM')=date_format('2021-11-04','yyyy-MM')
group by game_name
)mncount on daycount.dt=mncount.dt and daycount.game_name=mncount.game_name;
三,统计用户留存
select
'2021-11-04',
game_name,
date_add('2021-11-04',-1),
1,
sum(if(login_date_first=date_add('2021-11-04',-1) and login_date_last='2021-11-04',1,0)),
sum(if(login_date_first=date_add('2021-11-04',-1),1,0)),
sum(if(login_date_first=date_add('2021-11-04',-1) and login_date_last='2021-11-04',1,0))/sum(if(login_date_first=date_add('2021-11-04',-1),1,0))*100
from dwt_uv_topic
group by game_name
union all
select
'2021-11-04',
game_name,
date_add('2021-11-04',-2),
2,
sum(if(login_date_first=date_add('2021-11-04',-2) and login_date_last='2021-11-04',1,0)),
sum(if(login_date_first=date_add('2021-11-04',-2),1,0)),
sum(if(login_date_first=date_add('2021-11-04',-2) and login_date_last='2021-11-04',1,0))/sum(if(login_date_first=date_add('2021-11-04',-2),1,0))*100
from dwt_uv_topic
group by game_name
union all
select
'2021-11-04',
game_name,
date_add('2021-11-04',-3),
3,
sum(if(login_date_first=date_add('2021-11-04',-3) and login_date_last='2021-11-04',1,0)),
sum(if(login_date_first=date_add('2021-11-04',-3),1,0)),
sum(if(login_date_first=date_add('2021-11-04',-3) and login_date_last='2021-11-04',1,0))/sum(if(login_date_first=date_add('2021-11-04',-3),1,0))*100
from dwt_uv_topic
group by game_name;
上一篇: php之aop实践
下一篇: Oracle 中 decode 函数用法
推荐阅读