sql 遇到多个重复列名报错：Ambiguous column reference ***

程序员文章站 2022-07-15 12:23:22

...

1.执行下段sql报错，原因是a表和b表都有user_log_acct，item_third_cate_cd，它不会自动合并一列的！！只会有两列都叫这个，故c表里有两列user_log_acct，没办法和d表匹配上！

SELECT
    *
FROM
    (
        SELECT
            *
        
        from
            (
                select
                    *
                from
                    表1
                where
                    dt = '2018-05-30'
                    and tp = '13252'
                    and pv_15 > 1
            )a
        LEFT  JOIN
            (
                select
                    *
                from
                    表2
                where
                    dt = '2018-05-30'
                    and item_third_cate_cd = '13252'
                    and order_7 > 0
            )b
        ON
            (
                a.user_log_acct =  b.user_log_acct
                and a.item_third_cate_cd =  b.item_third_cate_cd
               
            )
        WHERE
            b.user_log_acct is NULL
    )c
LEFT  JOIN
    (
        select
            user_log_acct,
            item_third_cate_cd,
            1 as label
        from
            表3
        where
            dt = '2018-06-04'
            and item_third_cate_cd = '13252'
            and order_5 > 0
    )d
ON
    (
        c.user_log_acct =  d.user_log_acct
        and c.cateId =  d.item_third_cate_cd
    )
where
   d.user_log_acct is not NULL
and d.item_third_cate_cd is not null

2.后来尝试在a表取别名，发现也不行，因为这意味有三列user_log_acct，只不过有一列叫pin，其他两列还是重名的！这个问题就不太好解决，除非你把2000多个特征（除了user_log_acct)都列出来

(
                select
                    a.user_log_acct as pin,a.*,b.*
                from
                    表1
                where
                    dt = '2018-05-30'
                    and tp = '13252'
                    and pv_15 > 1
            )a

3.最后解决：在python中写spark sql，在a和b关联后，用drop函数去除b表中重复列

上一篇： LeetCode 11题盛最多水的容器

下一篇： Effective C++ 条款02 尽量以const,enum,inline替换#define