mysql笔记 - SELECT 语句
SELECT
[ALL | DISTINCT | DISTINCTROW ] *
[HIGH_PRIORITY]
[STRAIGHT_JOIN]
[SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
[SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
select_expr [, select_expr ...]
[FROM table_references
[WHERE where_condition]
[GROUP BY {col_name | expr | position}
[ASC | DESC], ... [WITH ROLLUP]]
[HAVING where_condition]
[ORDER BY {col_name | expr | position}
[ASC | DESC], ...]
[LIMIT {[offset,] row_count | row_count OFFSET offset}]
[PROCEDURE procedure_name(argument_list)]
[INTO OUTFILE 'file_name'
[CHARACTER SET charset_name]
export_options
| INTO DUMPFILE 'file_name'
| INTO var_name [, var_name]]
[FOR UPDATE | LOCK IN SHARE MODE]]
select * from employees; -- 取出所有数据
desc employees; -- 查看表信息
通常会要求开发人员不要使用select * 即使是select所有字段推荐写上所有的列名,虽然很麻烦但是规范上推荐这样做
一个重要的原因是alter table会对表结构进行修改的很多操作,alter table后select * from取出的字段可能比一开始涉及的要多,可能会遇到各种各样的问题。另外select alter table可以加列,可以加在某个字段中间的位置,那么程序那边取得的数据并不是一开始想要的数据,容易出错。所以建议select的时候建议把所有的列都带上
LIMIT 限制取出来的数据量
不加LIMIT限制会一次取出所有的数据,数据量大的时候可能是灾难性的事情,通常都会加一个LIMIT,图形化工具一般默认会加上一个LIMIT
select * from employees limit 0,3; -- 从第0条开始取出3条数据
select * from employees limit 1,3; -- 从第1条开始取出3条数据
LIMIT常用于分页,但是LIMIT offset数字越大性能越差,为什么呢?
select * from employees limit 30; -- 取30条,扫30条,查询快(0.00 sec)
select * from employees limit 1000000,30; -- 同样取30条,扫1000030行记录,查询变很慢(14.55 sec)
好的做法
select emp_no,birth_date,first_name,last_name,gender from employees limit 10;
+--------+------------+------------+-----------+--------+
| emp_no | birth_date | first_name | last_name | gender |
+--------+------------+------------+-----------+--------+
| 10001 | 1953-09-02 | Georgi | Facello | M |
| 10002 | 1964-06-02 | Bezalel | Simmel | F |
| 10003 | 1959-12-03 | Parto | Bamford | M |
| 10004 | 1954-05-01 | Chirstian | Koblick | M |
| 10005 | 1955-01-21 | Kyoichi | Maliniak | M |
| 10006 | 1953-04-20 | Anneke | Preusig | F |
| 10007 | 1957-05-23 | Tzvetan | Zielinski | F |
| 10008 | 1958-02-19 | Saniya | Kalloufi | M |
| 10009 | 1952-04-19 | Sumant | Peac | F |
| 10010 | 1963-06-01 | Duangkaew | Piveteau | F |
+--------+------------+------------+-----------+--------+
select emp_no,birth_date,first_name,last_name,gender from employees where emp_no > 10010 limit 10;
+--------+------------+------------+-------------+--------+
| emp_no | birth_date | first_name | last_name | gender |
+--------+------------+------------+-------------+--------+
| 10011 | 1953-11-07 | Mary | Sluis | F |
| 10012 | 1960-10-04 | Patricio | Bridgland | M |
| 10013 | 1963-06-07 | Eberhardt | Terkki | M |
| 10014 | 1956-02-12 | Berni | Genin | M |
| 10015 | 1959-08-19 | Guoxiang | Nooteboom | M |
| 10016 | 1961-05-02 | Kazuhito | Cappelletti | M |
| 10017 | 1958-07-06 | Cristinel | Bouloucos | F |
| 10018 | 1954-06-19 | Kazuhide | Peha | F |
| 10019 | 1953-01-23 | Lillian | Haddadi | M |
| 10020 | 1952-12-24 | Mayuko | Warwick | M |
+--------+------------+------------+-------------+--------+
从之前取得数据的最大值开始取数据,这样的一个好处是不管你的值多大,因为是通过大于等于来定位的,所以只是取了10条数据不会再去扫这么多的数据
下面的语句表示从mysql中随机取出3条数据,select语句不带order by表示随机取3条数据
select emp_no,first_name,last_name from employees limit 3;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 10001 | Georgi | Facello |
| 10002 | Bezalel | Simmel |
| 10003 | Parto | Bamford |
+--------+------------+-----------+
数据量小的时候不容易看出来,数据量大的时候容易发现这个问题,或者alter table一下后再试
alter table employees add index idx_name(first_name,last_name);
select emp_no,first_name,last_name from employees limit 3;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 69256 | Aamer | Anger |
| 486584 | Aamer | Armand |
| 237165 | Aamer | Azevdeo |
+--------+------------+-----------+
order by 表示根据哪个字段进行排序,可以按照物理数据顺序取出数据
select emp_no,first_name,last_name from employees order by emp_no limit 3;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 10001 | Georgi | Facello |
| 10002 | Bezalel | Simmel |
| 10003 | Parto | Bamford |
+--------+------------+-----------+
查询分区表
select * from titles partition (p19) limit 1;
+--------+--------------+------------+------------+
| emp_no | title | from_date | to_date |
+--------+--------------+------------+------------+
| 10052 | Senior Staff | 2002-01-31 | 9999-01-01 |
+--------+--------------+------------+------------+
ORDER BY 对取出来的数据做排序
order by根据某个字段进行分页
select emp_no,first_name,last_name from employees order by last_name limit 10;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 11761 | Bartek | Aamodt |
| 15427 | Aluzio | Aamodt |
| 18182 | Dekang | Aamodt |
| 16572 | Matt | Aamodt |
| 12791 | Mokhtar | Aamodt |
| 12516 | Sreenivas | Aamodt |
| 12982 | Sachem | Aamodt |
| 17400 | Basim | Aamodt |
| 19898 | Vidar | Aamodt |
| 17885 | Takanari | Aamodt |
+--------+------------+-----------+
WHERE 过滤数据
select emp_no,first_name,last_name from employees where emp_no = 10001;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 10001 | Georgi | Facello |
+--------+------------+-----------+
select emp_no,first_name,last_name from employees where emp_no >= 10001 limit 10;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 10001 | Georgi | Facello |
| 10002 | Bezalel | Simmel |
| 10003 | Parto | Bamford |
| 10004 | Chirstian | Koblick |
| 10005 | Kyoichi | Maliniak |
| 10006 | Anneke | Preusig |
| 10007 | Tzvetan | Zielinski |
| 10008 | Saniya | Kalloufi |
| 10009 | Sumant | Peac |
| 10010 | Duangkaew | Piveteau |
+--------+------------+-----------+
select emp_no,first_name,last_name from employees where emp_no >= 20001 and first_name like 'Am%' limit 10;
+--------+------------+---------------+
| emp_no | first_name | last_name |
+--------+------------+---------------+
| 100860 | Amabile | Aamodt |
| 285669 | Amabile | Akiyama |
| 276002 | Amabile | Albarhamtoshy |
| 454340 | Amabile | Alencar |
| 86625 | Amabile | Anger |
| 416143 | Amabile | Antonisse |
| 491486 | Amabile | Antonisse |
| 451988 | Amabile | Apsitis |
| 409363 | Amabile | Atchley |
| 208844 | Amabile | Baar |
+--------+------------+---------------+
select emp_no,first_name,last_name from employees where emp_no >= 20001 and first_name like '%Am%' limit 10;
+--------+------------+--------------+
| emp_no | first_name | last_name |
+--------+------------+--------------+
| 20044 | Kiam | Gist |
| 20062 | Uinam | Heuser |
| 20114 | Ramalingam | Zyda |
| 20118 | Mohammed | Schneeberger |
| 20142 | Arumugam | Emmart |
| 20159 | Isamu | Valiente |
| 20167 | Stamatina | Kobara |
| 20217 | Tamiya | Ruemmler |
| 20265 | Amalendu | Willoner |
| 20285 | Pramod | Escriba |
+--------+------------+--------------+
select emp_no,first_name,last_name from employees where emp_no >= 20001 or first_name like '%Am%' limit 10;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 69256 | Aamer | Anger |
| 486584 | Aamer | Armand |
| 237165 | Aamer | Azevdeo |
| 413688 | Aamer | Azuma |
| 281363 | Aamer | Baak |
| 242368 | Aamer | Baaleh |
| 206549 | Aamer | Baar |
| 259089 | Aamer | Baba |
| 60922 | Aamer | Bahl |
| 283280 | Aamer | Bahl |
+--------+------------+-----------+
where 1 = 1表示没有任何条件全部成立,因为不知道第一个where应该怎么写,如果提前把where 1 = 1写好,这样程序那边拼接后面的条件就比较方便
select emp_no,first_name,last_name from employees where 1 = 1 limit 10;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 69256 | Aamer | Anger |
| 486584 | Aamer | Armand |
| 237165 | Aamer | Azevdeo |
| 413688 | Aamer | Azuma |
| 281363 | Aamer | Baak |
| 242368 | Aamer | Baaleh |
| 206549 | Aamer | Baar |
| 259089 | Aamer | Baba |
| 60922 | Aamer | Bahl |
| 283280 | Aamer | Bahl |
+--------+------------+-----------+
比如
select emp_no,first_name,last_name from employees where 1 = 1 and emp_no = 20000;
+--------+------------+-----------+
| emp_no | first_name | last_name |
+--------+------------+-----------+
| 20000 | Jenwei | Matzke |
+--------+------------+-----------+
GROUP BY
分组就是根据某个字段当中相同的值来进行分组,并且分组完之后是要加一个聚集(统计)函数
use test;
create table t (a int auto_increment primary key, b int, c int);
insert into t select NULL,1,1;
insert into t select NULL,1,7;
insert into t select NULL,1,8;
insert into t select NULL,2,8;
insert into t select NULL,2,10;
insert into t select NULL,3,8;
select * from t;
+---+------+------+
| a | b | c |
+---+------+------+
| 1 | 1 | 1 |
| 2 | 1 | 7 |
| 3 | 1 | 8 |
| 4 | 2 | 8 |
| 5 | 2 | 10 |
| 6 | 3 | 8 |
+---+------+------+
select b,sum(b),avg(b) from t group by b;
+------+--------+--------+
| b | sum(b) | avg(b) |
+------+--------+--------+
| 1 | 3 | 1.0000 |
| 2 | 4 | 2.0000 |
| 3 | 3 | 3.0000 |
+------+--------+--------+
HAVING
HAVING表示对group by中的聚合函数进行过滤
select b,sum(b),avg(b) from t where avb(b)>2 group by b; -- 这样是会报错的
ERROR 1305 (42000): FUNCTION test.avb does not exist
分组中需要对分组的条件进行过滤需要使用HAVING
select b,sum(b),avg(b) from t group by b having avg(b) > 2;
+------+--------+--------+
| b | sum(b) | avg(b) |
+------+--------+--------+
| 3 | 3 | 3.0000 |
+------+--------+--------+
不是说使用了group by就代表不能使用where了,这样也是可以使用where进行过滤的,只不过where过滤的是非聚合的结果(对查询的记录进行过滤),而having是用来过滤聚合的结果
select b,sum(b),avg(b) from t where b < 3 group by b;
+------+--------+--------+
| b | sum(b) | avg(b) |
+------+--------+--------+
| 1 | 3 | 1.0000 |
| 2 | 4 | 2.0000 |
+------+--------+--------+
双重过滤
select b,sum(b),avg(b) from t where b < 3 group by b having avg(b) < 2;
+------+--------+--------+
| b | sum(b) | avg(b) |
+------+--------+--------+
| 1 | 3 | 1.0000 |
+------+--------+--------+
所以where是一开始选数据的时候就开始过滤了,having是在聚集结果出来之后才开始过滤
having也可以过滤指定的一列而不是聚集函数,但是这样写没有什么意义,使用where可以获得更好的性能,所以having一般都跟聚集函数搭配使用
select b,sum(b),avg(b) from t where b < 3 group by b having b < 2;
+------+--------+--------+
| b | sum(b) | avg(b) |
+------+--------+--------+
| 1 | 3 | 1.0000 |
+------+--------+--------+
select字段中没有选择的字段,having中使用会报错
select b,sum(b),avg(b) from t where b < 3 group by b having a < 2;
ERROR 1054 (42S22): Unknown column 'a' in 'having clause'
这样的语句没有意义,因为取出来的a字段数据不一定是146,mysql会随机取a字段的数据,而且这种写法在orcal中好像不支持
select a,b,sum(b) from t group by b;
+---+------+--------+
| a | b | sum(b) |
+---+------+--------+
| 1 | 1 | 3 |
| 4 | 2 | 4 |
| 6 | 3 | 3 |
+---+------+--------+
JOIN
基本的多表关联查询
SELECT
concat(first_name,' ',last_name), dept_name -- concat连接了2个字段的字符串
FROM
employees e, -- e是别名,下面3个雷同
dept_emp de,
departments d
WHERE
e.emp_no = de.emp_no -- 关联条件
AND de.dept_no = d.dept_no -- 再一次关联条件
LIMIT 3;
+----------------------------------+------------------+
| concat(first_name,' ',last_name) | dept_name |
+----------------------------------+------------------+
| Mary Sluis | Customer Service |
| Huan Lortz | Customer Service |
| Basil Tramer | Customer Service |
+----------------------------------+------------------+
新建一些例子
create table a (a int);
create table b (a int);
insert into a select 1;
insert into a select 2;
insert into a select 3;
insert into b select 1;
insert into b select 2;
select * from a;select * from b;
+------+
| a |
+------+
| 1 |
| 2 |
| 3 |
+------+
3 rows in set (0.00 sec)
+------+
| a |
+------+
| 1 |
| 2 |
+------+
2 rows in set (0.03 sec)
最简单的两表关联,并不是2张表关联就会做笛卡尔积,如果关联的时候没有写where关联条件就会产生笛卡尔积
select * from a,b where a.a = b.a;
+------+------+
| a | a |
+------+------+
| 1 | 1 |
| 2 | 2 |
+------+------+
select * from a,b; -- 这样会产生笛卡尔积
+------+------+
| a | a |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
+------+------+
三张表只有1个过滤条件会出现图示效果,会产生笛卡尔积的效果,出现了4条记录
create table c (a int);
insert into c select 2;
insert into c select 3;
select * from a,b,c where a.a = b.a; -- 这样会产生笛卡尔积
+------+------+------+
| a | a | a |
+------+------+------+
| 1 | 1 | 2 |
| 2 | 2 | 2 |
| 1 | 1 | 3 |
| 2 | 2 | 3 |
+------+------+------+
还可以这样写,关联条件可以非等值
select * from a,b where a.a < b.a;
+------+------+
| a | a |
+------+------+
| 1 | 2 |
+------+------+
修改一下字段名先。。。
alter table a change a x int;
alter table b change a y int;
alter table c change a z int;
INNER JOIN
这3个写法没有区别
select * from a inner join b on a.x = b.y;
select * from a join b on a.x = b.y;
select * from a,b where a.x = b.y;
-- 都返回
+------+------+
| x | y |
+------+------+
| 1 | 1 |
| 2 | 2 |
+------+------+
以下语句的关联,通常来说会选择关联过滤度高的条件进行关联,这里b.y > 1的关联度会高一些。
select * from a,b where a.x = b.y and b.y > 1;
+------+------+
| x | y |
+------+------+
| 2 | 2 |
+------+------+
假设a表100w记录b表100w记录,其中的x,y一一对应,优化机通畅都会先从b>100 and b<200条件先进行过滤,这样就只需要100w和100条记录进行join就可以了,如果先从a.x = b.y进行过滤,那就是100w和100w的记录进行join,所以一般优化机是会把b>100 and b<200的过滤条件设置得更高
INNER JOIN的时候过滤条件即可写在on里面也可以写在where条件里面,对结果来说是没有区别的
select * from a inner join b on (a.x = b.y and b.y > 1);
select * from a inner join b on a.x = b.y where b.y > 1;
都返回
+------+------+
| x | y |
+------+------+
| 2 | 2 |
+------+------+
OUTER JOIN
outer join 分成left/right两种,outer可以关键词可以省略,left/right join代表左/右外连接
左连接中左边的表为保留表,保留表中所有的字段都是要出现的,如果关联条件存在的话就是一对一的情况,右表中关联条件不存在则关联出来的结果就是NULL值
select * from a left join b on a.x = b.y;
+------+------+
| x | y |
+------+------+
| 1 | 1 |
| 2 | 2 |
| 3 | NULL |
+------+------+
select * from a right join b on a.x = b.y;
+------+------+
| x | y |
+------+------+
| 1 | 1 |
| 2 | 2 |
+------+------+
a表中存在的但是b表中不存在的数据,左连接实现
select * from a left join b on a.x = b.y where b.y is NULL;
+------+------+
| x | y |
+------+------+
| 3 | NULL |
+------+------+
a表中但是不在b表中,不使用left join 使用not in + 子查询,不过会存在一些性能上的问题
select * from a where a.x not in (select y from b);
+------+
| x |
+------+
| 3 |
+------+
这样的查询b.y会全部返回NULL,因为a.x = b.y and b.y is NULL不成立,所以b.y全部用NULL填充
select * from a left join b on a.x = b.y and b.y is NULL;
+------+------+
| x | y |
+------+------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
+------+------+
过滤条件写在on后还是where后是有讲究的,推荐使用on来进行两张表之间的关联,where用来进行数据的过滤。
外连接中where不能写在on前面,内连接则可以,left join是用到比较多的一个方式
多次left join
select * from a left join b on a.x = b.y left join c on a.x = c.z;
+------+------+------+
| x | y | z |
+------+------+------+
| 2 | 2 | 2 |
| 1 | 1 | NULL |
| 3 | NULL | 3 |
+------+------+------+
select * from a left join b on a.x = b.y left join c on b.y = c.z order by a.x;
+------+------+------+
| x | y | z |
+------+------+------+
| 1 | 1 | NULL |
| 2 | 2 | 2 |
| 3 | NULL | NULL |
+------+------+------+
第二次join使用等值连接尝试
select * from a left join b on a.x = b.y inner join c on b.y = c.z order by a.x;
+------+------+------+
| x | y | z |
+------+------+------+
| 2 | 2 | 2 |
+------+------+------+
一些例子
查询不是经理的员工
SELECT
e.emp_no
FROM
employees e
LEFT JOIN
dept_manager dm ON e.emp_no = dm.emp_no
WHERE
dm.emp_no IS NULL
LIMIT 3;
分页技巧
使用where而不是使用limit 10000,10 因为limit会扫10000前面的1w条数据
select * from employees where emp_no > 10000 limit 10;
根据生日分页
alter table employees add index idx_birth(birth_date,emp_no); -- 建立联合索引
select * from employees where birth_date > '1952-02-02' order by birth_date limit 10;
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-----------+--------+------------+
| 16093 | 1952-02-03 | Luise | Tramer | M | 1992-02-28 |
| 16447 | 1952-02-03 | Zhiguo | Savasere | F | 1987-04-28 |
| 23857 | 1952-02-03 | Kristen | Frijda | M | 1986-05-31 |
| 27259 | 1952-02-03 | Michaela | Lipner | M | 1997-09-05 |
| 32094 | 1952-02-03 | Mohd | Buchter | F | 1986-03-16 |
| 32641 | 1952-02-03 | Tua | Tetzlaff | F | 1988-12-01 |
| 41374 | 1952-02-03 | JiYoung | Schurmann | M | 1988-01-25 |
| 43737 | 1952-02-03 | Debatosh | Beerel | F | 1987-12-07 |
| 56225 | 1952-02-03 | Miquel | Rusmann | F | 1990-06-17 |
| 59577 | 1952-02-03 | Adel | Swiler | F | 1986-11-27 |
+--------+------------+------------+-----------+--------+------------+
-- 错误的结果,日期直接跳到了02-03
select * from employees where (birth_date,emp_no) > ('1952-02-02', 10010) order by birth_date limit 10;
+--------+------------+------------+-------------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date |
+--------+------------+------------+-------------+--------+------------+
| 12282 | 1952-02-02 | Tadahiro | Delgrange | M | 1997-01-09 |
| 13944 | 1952-02-02 | Takahito | Maierhofer | M | 1989-01-18 |
| 22614 | 1952-02-02 | Dung | Madeira | M | 1989-01-24 |
| 29456 | 1952-02-02 | Barun | Krohm | F | 1992-11-23 |
| 33131 | 1952-02-02 | Reinhold | Savasere | M | 1998-01-30 |
| 40660 | 1952-02-02 | Piyush | Erbe | F | 1988-04-04 |
| 48910 | 1952-02-02 | Zhongwei | DuBourdieux | M | 1999-12-19 |
| 51486 | 1952-02-02 | Jianwen | Sigstam | F | 1989-07-20 |
| 59884 | 1952-02-02 | Fan | Przulj | M | 1991-09-25 |
| 61382 | 1952-02-02 | Kristof | Ranft | M | 1989-04-21 |
+--------+------------+------------+-------------+--------+------------+
-- 正确的结果
select o_orderkey,o_orderstatus,o_totalprice from orders where(o_orderdate,o_orderkey) > ('1992-01-01',88199) order by o_orderdate limit 10;
select o_orderkey,o_orderstatus,o_totalprice from orders order by o_orderdate limit 10,10;
-- 结果一致
当然分页推荐使用redis来做,用数据库实现其实意义不大