欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

hive json数组解析explode,json_tuple

程序员文章站 2024-03-12 17:26:14
...

每天一点点,记录工作中实操可行
hive json数组解析

hive中有字段A长这个样子,想把其中的name值全部解析出来

[{"itemRateId":"73288842","name":"东北有机大米饭","rating":4,"ratingContent":""},{"itemRateId":"73288850","name":"需要餐具","rating":4,"ratingContent":""},{"itemRateId":"73288834","name":"辣跳冒牛百叶","rating":4,"ratingContent":""},{"itemRateId":"73288826","name":"金汤酸菜鱼套餐(鱼+配菜+小菜+米饭)-现炸酥肉(+10元)","rating":4,"ratingContent":""}]

1: get_json_object 每次只能解析一个字段,但我不知道每个字段A到底有多少个name的键值对,不可能挨个解析,所以。。。。。

select  get_json_object(被解析字段A,"$.[0].name")  column_name 
from table_name

结果,只把第一个name值解析出来了

劲爽金汤酸菜鱼小份(米饭另点)-大份

2:先通过explode把原数据变成n行数据,然后再用get_json_object去解析

select  ss.col
from 
(select 
split(regexp_replace(regexp_extract(被解析字段A,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|') as str
from table_name  ) pp
lateral view explode(pp.str) ss as col 

输出结果

{"itemRateId":"81980274","name":"劲爽金汤酸菜鱼小份(米饭另点)-大份","rating":2,"ratingContent":""}
{"itemRateId":"81980290","name":"需要餐具","rating":2,"ratingContent":""}
{"itemRateId":"81980266","name":"崇明有机大米饭","rating":2,"ratingContent":""}
{"itemRateId":"81980282","name":"下单前+..领大额红包","rating":2,"ratingContent":""}
{"itemRateId":"81980258","name":"肉麻青花椒鱼小份(米饭另点)-大份","rating":2,"ratingContent":""}

3:用json_tuple 分别解析每行数据

select  rr.itemRateId,rr.name,rr.rating
from (
select 
split(regexp_replace(regexp_extract(被解析字段A,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|') as str
from table_name ) pp
lateral view explode(pp.str) ss as col 
lateral view json_tuple(ss.col,'itemRateId','name','rating') rr as itemRateId,name,rating

输出结果,已经将该字段解析成了5行

itemRateId          name                                                rating
81980274	劲爽金汤酸菜鱼小份(米饭另点)-大份	2
81980290	需要餐具	2
81980266	崇明有机大米饭	2
81980282	下单前+..领大额红包	2
81980258	肉麻青花椒鱼小份(米饭另点)-大份	2

4:如果把原来一行的数据,解析成多行,数据量就会变得很大,这个时候,可以再把解析出来的多行字段,合并成一个字段

select 字段1,字段2,
concat_ws(',', collect_set(rr.name)) as z_name
from table_name
group by 字段1,字段2

输出结果:
字段1,字段2,劲爽金汤酸菜鱼小份(米饭另点)-大份,需要餐具,崇明有机大米饭,下单前+…领大额红包,肉麻青花椒鱼小份(米饭另点)

这样,就把原来多数组的json字段,解析,且合并成一个字段啦
有点麻烦,但可以解决部分工作