sql 遇到多个重复列名报错:Ambiguous column reference ***
程序员文章站
2022-07-15 12:23:22
...
1.执行下段sql报错,原因是a表和b表都有user_log_acct,item_third_cate_cd,它不会自动合并一列的!!只会有两列都叫这个,故c表里有两列user_log_acct,没办法和d表匹配上!
SELECT
*
FROM
(
SELECT
*
from
(
select
*
from
表1
where
dt = '2018-05-30'
and tp = '13252'
and pv_15 > 1
)a
LEFT JOIN
(
select
*
from
表2
where
dt = '2018-05-30'
and item_third_cate_cd = '13252'
and order_7 > 0
)b
ON
(
a.user_log_acct = b.user_log_acct
and a.item_third_cate_cd = b.item_third_cate_cd
)
WHERE
b.user_log_acct is NULL
)c
LEFT JOIN
(
select
user_log_acct,
item_third_cate_cd,
1 as label
from
表3
where
dt = '2018-06-04'
and item_third_cate_cd = '13252'
and order_5 > 0
)d
ON
(
c.user_log_acct = d.user_log_acct
and c.cateId = d.item_third_cate_cd
)
where
d.user_log_acct is not NULL
and d.item_third_cate_cd is not null
2.后来尝试在a表取别名,发现也不行,因为这意味有三列user_log_acct,只不过有一列叫pin,其他两列还是重名的!这个问题就不太好解决,除非你把2000多个特征(除了user_log_acct)都列出来
(
select
a.user_log_acct as pin,a.*,b.*
from
表1
where
dt = '2018-05-30'
and tp = '13252'
and pv_15 > 1
)a
3.最后解决:在python中写spark sql,在a和b关联后,用drop函数去除b表中重复列