Left Join

程序员文章站 2023-12-28 11:31:58
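A developer had a statement that ran for more than two hours without returning a result, and asked me why it was taking so long.
The statement had the following form: select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id; This is a classic misunderstanding. The intent was to filter table a first and then perform the left join, but that is not what this statement does. Let's look at what a left join really is.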
[gpadmin@mdw ~]$ psql bigdatagp  
  
psql (8.2.15)  
  
Type "help" for help.  
  
  
  
bigdatagp=# drop table tgt1;  
  
DROP TABLE  
  
bigdatagp=# drop table tgt2;  
  
DROP TABLE  
  
bigdatagp=# create table tgt1(id int, name varchar(20));  
  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
  
CREATE TABLE  
  
bigdatagp=# create table tgt2(id int, name varchar(20));   
  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
  
CREATE TABLE  
  
bigdatagp=# insert into tgt1 select generate_series(1,3),('a','b');  
  
ERROR:  column "name" is of type character varying but expression is of type record  
  
HINT:  You will need to rewrite or cast the expression.  
  
bigdatagp=# insert into tgt1 select generate_series(1,5),generate_series(1,5)||'a';  
  
INSERT 0 5  
  
bigdatagp=# insert into tgt2 select generate_series(1,2),generate_series(1,2)||'a';      
  
INSERT 0 2  
  
bigdatagp=# select * from tgt1;  
  
 id | name   
  
----+------  
  
  2 | 2a  
  
  4 | 4a  
  
  1 | 1a  
  
  3 | 3a  
  
  5 | 5a  
  
(5 rows)  
  
  
  
bigdatagp=# select * from tgt1 order by id;  
  
 id | name   
  
----+------  
  
  1 | 1a  
  
  2 | 2a  
  
  3 | 3a  
  
  4 | 4a  
  
  5 | 5a  
  
(5 rows)  
  
  
  
bigdatagp=# select * from tgt2 order by id;   
  
 id | name   
  
----+------  
  
  1 | 1a  
  
  2 | 2a  
  
(2 rows)  
  
  
  
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id;  
  
 id | name | id | name   
  
----+------+----+------  
  
  3 | 3a   |    |   
  
  5 | 5a   |    |   
  
  1 | 1a   |  1 | 1a  
  
  2 | 2a   |  2 | 2a  
  
  4 | 4a   |    |   
  
(5 rows)  
  
  
  
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id order by a.id;  
  
 id | name | id | name   
  
----+------+----+------  
  
  1 | 1a   |  1 | 1a  
  
  2 | 2a   |  2 | 2a  
  
  3 | 3a   |    |   
  
  4 | 4a   |    |   
  
  5 | 5a   |    |   
  
(5 rows)  
  
  
  
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where id>=3 order by a.id;  
  
ERROR:  column reference "id" is ambiguous  
  
LINE 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id>=3 orde...  
  
                                                             ^  
  
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;  
  
 id | name | id | name   
  
----+------+----+------  
  
  3 | 3a   |    |   
  
  4 | 4a   |    |   
  
  5 | 5a   |    |   
  
(3 rows)  
  
  
  
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;          
  
 id | name | id | name   
  
----+------+----+------  
  
  1 | 1a   |    |   
  
  2 | 2a   |    |   
  
  3 | 3a   |    |   
  
  4 | 4a   |    |   
  
  5 | 5a   |    |   
  
(5 rows)  
  
  
  
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;   
  
 id | name | id | name   
  
----+------+----+------  
  
(0 rows)  
  
  
  
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;       
  
 id | name | id | name   
  
----+------+----+------  
  
  1 | 1a   |    |   
  
  2 | 2a   |    |   
  
  3 | 3a   |    |   
  
  4 | 4a   |    |   
  
  5 | 5a   |    |   
  
(5 rows)  
  
  
  
bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;  
  
                                                                    QUERY PLAN                                                                       
  
---------------------------------------------------------------------------------------------------------------------------------------------------  
  
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.18..7.19 rows=1 width=14)  
  
   Merge Key: "?column5?"  
  
   Rows out:  3 rows at destination with 21 ms to end, start offset by 559 ms.  
  
   ->  Sort  (cost=7.18..7.19 rows=1 width=14)  
  
         Sort Key: a.id  
  
         Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 5.452 ms to first row, 5.454 ms to end, start offset by 564 ms.  
  
         Executor memory:  63K bytes avg, 74K bytes max (seg2).  
  
         Work_mem used:  63K bytes avg, 74K bytes max (seg2). Workfile: (0 spilling, 0 reused)  
  
         ->  Hash Left Join  (cost=2.04..7.15 rows=1 width=14)  
  
               Hash Cond: a.id = b.id  
  
               Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 4.190 ms to first row, 4.598 ms to end, start offset by 565 ms.  
  
               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)  
  
                     Filter: id >= 3  
  
                     Rows out:  Avg 1.0 rows x 3 workers.  Max 1 rows (seg52) with 0.156 ms to first row, 0.158 ms to end, start offset by 565 ms.  
  
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)  
  
                     Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.  
  
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)  
  
                           Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.  
  
 Slice statistics:  
  
   (slice0)    Executor memory: 332K bytes.  
  
   (slice1)    Executor memory: 446K bytes avg x 64 workers, 4329K bytes max (seg52).  Work_mem: 74K bytes max.  
  
 Statement statistics:  
  
   Memory used: 128000K bytes  
  
 Total runtime: 580.630 ms  
  
(24 rows)  
  
  
  
bigdatagp=# explain analyze  select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;   
  
                                                                       QUERY PLAN                                                                          
  
---------------------------------------------------------------------------------------------------------------------------------------------------------  
  
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)  
  
   Merge Key: "?column5?"  
  
   Rows out:  5 rows at destination with 24 ms to end, start offset by 701 ms.  
  
   ->  Sort  (cost=7.23..7.24 rows=1 width=14)  
  
         Sort Key: a.id  
  
         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 6.292 ms to first row, 6.294 ms to end, start offset by 715 ms.  
  
         Executor memory:  70K bytes avg, 74K bytes max (seg0).  
  
         Work_mem used:  70K bytes avg, 74K bytes max (seg0). Workfile: (0 spilling, 0 reused)  
  
         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)  
  
               Hash Cond: a.id = b.id  
  
               Join Filter: a.id >= 3  
  
               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 4.422 ms to first row, 5.055 ms to end, start offset by 717 ms.  
  
               Executor memory:  1K bytes avg, 1K bytes max (seg42).  
  
               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)  
  
               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.  
  
               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)  
  
                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.179 ms to first row, 0.180 ms to end, start offset by 717 ms.  
  
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)  
  
                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.194 ms to end, start offset by 721 ms.  
  
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)  
  
                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.143 ms to first row, 0.145 ms to end, start offset by 721 ms.  
  
 Slice statistics:  
  
   (slice0)    Executor memory: 332K bytes.  
  
   (slice1)    Executor memory: 581K bytes avg x 64 workers, 4353K bytes max (seg42).  Work_mem: 74K bytes max.  
  
 Statement statistics:  
  
   Memory used: 128000K bytes  
  
 Total runtime: 725.316 ms  
  
(27 rows)  
  
  
  
bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;    
  
                                                  QUERY PLAN                                                    
  
--------------------------------------------------------------------------------------------------------------  
  
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.17..7.18 rows=1 width=14)  
  
   Merge Key: "?column5?"  
  
   Rows out:  (No row requested) 0 rows at destination with 6.536 ms to end, start offset by 1.097 ms.  
  
   ->  Sort  (cost=7.17..7.18 rows=1 width=14)  
  
         Sort Key: a.id  
  
         Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.  
  
         Executor memory:  33K bytes avg, 33K bytes max (seg0).  
  
         Work_mem used:  33K bytes avg, 33K bytes max (seg0). Workfile: (0 spilling, 0 reused)  
  
         ->  Hash Left Join  (cost=2.04..7.15 rows=1 width=14)  
  
               Hash Cond: a.id = b.id  
  
               Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.  
  
               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)  
  
                     Filter: id >= 6  
  
                     Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.  
  
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)  
  
                     Rows in:  (No row requested) 0 rows (seg0) with 0 ms to end.  
  
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)  
  
                           Rows out:  (No row requested) 0 rows (seg0) with 0 ms to end.  
  
 Slice statistics:  
  
   (slice0)    Executor memory: 332K bytes.  
  
   (slice1)    Executor memory: 225K bytes avg x 64 workers, 225K bytes max (seg0).  Work_mem: 33K bytes max.  
  
 Statement statistics:  
  
   Memory used: 128000K bytes  
  
 Total runtime: 8.615 ms  
  
(24 rows)  
  
  
  
bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;          
  
                                                                       QUERY PLAN                                                                         
  
--------------------------------------------------------------------------------------------------------------------------------------------------------  
  
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)  
  
   Merge Key: "?column5?"  
  
   Rows out:  5 rows at destination with 115 ms to end, start offset by 1.195 ms.  
  
   ->  Sort  (cost=7.23..7.24 rows=1 width=14)  
  
         Sort Key: a.id  
  
         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 6.979 ms to first row, 6.980 ms to end, start offset by 12 ms.  
  
         Executor memory:  72K bytes avg, 74K bytes max (seg0).  
  
         Work_mem used:  72K bytes avg, 74K bytes max (seg0). Workfile: (0 spilling, 0 reused)  
  
         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)  
  
               Hash Cond: a.id = b.id  
  
               Join Filter: a.id >= 6  
  
               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 5.570 ms to first row, 6.157 ms to end, start offset by 12 ms.  
  
               Executor memory:  1K bytes avg, 1K bytes max (seg42).  
  
               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)  
  
               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.  
  
               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)  
  
                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.050 ms to first row, 0.051 ms to end, start offset by 12 ms.  
  
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)  
  
                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.153 ms to end, start offset by 18 ms.  
  
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)  
  
                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.133 ms to first row, 0.135 ms to end, start offset by 18 ms.  
  
 Slice statistics:  
  
   (slice0)    Executor memory: 332K bytes.  
  
   (slice1)    Executor memory: 583K bytes avg x 64 workers, 4353K bytes max (seg42).  Work_mem: 74K bytes max.  
  
 Statement statistics:  
  
   Memory used: 128000K bytes  
  
 Total runtime: 116.997 ms  
  
(27 rows)  
  
  
  
bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where id=6 order by a.id;  
  
ERROR:  column reference "id" is ambiguous  
  
LINE 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id=6 order...  
  
                                                             ^  
  
bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id=6 order by a.id;  
  
                                             QUERY PLAN                                                
  
-----------------------------------------------------------------------------------------------------  
  
 Gather Motion 1:1  (slice1; segments: 1)  (cost=7.17..7.18 rows=4 width=14)  
  
   Merge Key: "?column5?"  
  
   Rows out:  (No row requested) 0 rows at destination with 3.212 ms to end, start offset by 339 ms.  
  
   ->  Sort  (cost=7.17..7.18 rows=1 width=14)  
  
         Sort Key: a.id  
  
         Rows out:  (No row requested) 0 rows with 0 ms to end.  
  
         Executor memory:  58K bytes.  
  
         Work_mem used:  58K bytes. Workfile: (0 spilling, 0 reused)  
  
         ->  Hash Left Join  (cost=2.04..7.14 rows=1 width=14)  
  
               Hash Cond: a.id = b.id  
  
               Rows out:  (No row requested) 0 rows with 0 ms to end.  
  
               ->  Seq Scan on tgt1 a  (cost=0.00..5.06 rows=1 width=7)  
  
                     Filter: id = 6  
  
                     Rows out:  (No row requested) 0 rows with 0 ms to end.  
  
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)  
  
                     Rows in:  (No row requested) 0 rows with 0 ms to end.  
  
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)  
  
                           Filter: id = 6  
  
                           Rows out:  (No row requested) 0 rows with 0 ms to end.  
  
 Slice statistics:  
  
   (slice0)    Executor memory: 252K bytes.  
  
   (slice1)    Executor memory: 251K bytes (seg3).  Work_mem: 58K bytes max.  
  
 Statement statistics:  
  
   Memory used: 128000K bytes  
  
 Total runtime: 342.067 ms  
  
(25 rows)  
  
  
  
bigdatagp=#  explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id=6 order by a.id;        
  
                                                                       QUERY PLAN                                                                         
  
--------------------------------------------------------------------------------------------------------------------------------------------------------  
  
 Gather Motion 64:1  (slice1; segments: 64)  (cost=7.23..7.24 rows=1 width=14)  
  
   Merge Key: "?column5?"  
  
   Rows out:  5 rows at destination with 435 ms to end, start offset by 1.130 ms.  
  
   ->  Sort  (cost=7.23..7.24 rows=1 width=14)  
  
         Sort Key: a.id  
  
         Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 5.156 ms to first row, 5.158 ms to end, start offset by 7.597 ms.  
  
         Executor memory:  58K bytes avg, 58K bytes max (seg0).  
  
         Work_mem used:  58K bytes avg, 58K bytes max (seg0). Workfile: (0 spilling, 0 reused)  
  
         ->  Hash Left Join  (cost=2.04..7.17 rows=1 width=14)  
  
               Hash Cond: a.id = b.id  
  
               Join Filter: a.id = 6  
  
               Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 4.155 ms to first row, 4.813 ms to end, start offset by 7.930 ms.  
  
               Executor memory:  1K bytes avg, 1K bytes max (seg42).  
  
               Work_mem used:  1K bytes avg, 1K bytes max (seg42). Workfile: (0 spilling, 0 reused)  
  
               (seg42)  Hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.  
  
               ->  Seq Scan on tgt1 a  (cost=0.00..5.05 rows=1 width=7)  
  
                     Rows out:  Avg 1.0 rows x 5 workers.  Max 1 rows (seg42) with 0.126 ms to first row, 0.127 ms to end, start offset by 7.941 ms.  
  
               ->  Hash  (cost=2.02..2.02 rows=1 width=7)  
  
                     Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.103 ms to end, start offset by 12 ms.  
  
                     ->  Seq Scan on tgt2 b  (cost=0.00..2.02 rows=1 width=7)  
  
                           Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg42) with 0.074 ms to first row, 0.076 ms to end, start offset by 12 ms.  
  
 Slice statistics:  
  
   (slice0)    Executor memory: 332K bytes.  
  
   (slice1)    Executor memory: 569K bytes avg x 64 workers, 4337K bytes max (seg42).  Work_mem: 58K bytes max.  
  
 Statement statistics:  
  
   Memory used: 128000K bytes  
  
 Total runtime: 436.384 ms  
  
(27 rows)
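The plans above show where the difference comes from: with the condition in WHERE, it appears as "Filter: id >= 6" on the scan of tgt1, whereas with the condition in ON it appears as "Join Filter: a.id >= 6" on the Hash Left Join, and the scan of tgt1 still returns all 5 rows. The following is a minimal sketch of that behavior in plain Python, not Greenplum's actual implementation. The helper left_join and the tuple layout are hypothetical, introduced only for illustration; the data mirrors tgt1 and tgt2 from the session.

```python
def left_join(left_rows, right_rows, on):
    """Simplified model of LEFT JOIN: every left row is emitted at least once."""
    result = []
    for a in left_rows:
        # ON only decides which right rows pair with this left row.
        matches = [b for b in right_rows if on(a, b)]
        if matches:
            result.extend((a, b) for b in matches)
        else:
            # No right row satisfied ON: keep the left row, pad with None (NULL).
            result.append((a, None))
    return result


# Same contents as tgt1 (ids 1..5) and tgt2 (ids 1..2) in the session.
tgt1 = [(i, f"{i}a") for i in range(1, 6)]
tgt2 = [(i, f"{i}a") for i in range(1, 3)]

# Condition inside ON: it is false for every pair, so nothing matches,
# yet all rows of tgt1 are still returned.
on_result = left_join(tgt1, tgt2, lambda a, b: a[0] == b[0] and a[0] >= 6)

# Condition in WHERE: applied to the joined rows, so it removes rows of tgt1.
joined = left_join(tgt1, tgt2, lambda a, b: a[0] == b[0])
where_result = [row for row in joined if row[0][0] >= 6]

print(len(on_result), len(where_result))
```

On a large table a, this is consistent with the original symptom: the ON version still has to return every row of a, so nothing is actually filtered.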
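Therefore, to filter table a, put the condition in the WHERE clause; to filter table b, put the condition in a subquery on b. In a left join, the ON clause only controls which rows of b are matched to each row of a, that is, what is displayed in b's columns. It never removes rows of a: rows of a without a match are still returned, with NULLs in b's columns.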
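As a hedged illustration of the two recommended forms, the sketch below runs them against an in-memory SQLite database using Python's standard sqlite3 module. SQLite stands in for Greenplum here only so the example is self-contained; the threshold id >= 2 and the helper run are assumptions chosen for illustration and do not appear in the original session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt1 (id INTEGER, name TEXT);
    CREATE TABLE tgt2 (id INTEGER, name TEXT);
    INSERT INTO tgt1 VALUES (1,'1a'),(2,'2a'),(3,'3a'),(4,'4a'),(5,'5a');
    INSERT INTO tgt2 VALUES (1,'1a'),(2,'2a');
""")


def run(sql):
    """Hypothetical helper: execute a query and return all rows as tuples."""
    return conn.execute(sql).fetchall()


# Filter table a: the condition goes in WHERE, so rows of a are removed.
filtered_a = run("""
    SELECT * FROM tgt1 a
    LEFT JOIN tgt2 b ON a.id = b.id
    WHERE a.id >= 2
    ORDER BY a.id
""")

# Filter table b: the condition goes in a subquery on b, so all rows of a
# are kept and only the matching side shrinks.
filtered_b = run("""
    SELECT * FROM tgt1 a
    LEFT JOIN (SELECT * FROM tgt2 WHERE id >= 2) b ON a.id = b.id
    ORDER BY a.id
""")

for row in filtered_a:
    print("filter a:", row)
for row in filtered_b:
    print("filter b:", row)
```

The first query returns only the rows of tgt1 with id >= 2. The second returns all five rows of tgt1, with b's columns filled in only for id 2.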