Optimizing LIKE queries in PostgreSQL
程序员文章站
2022-06-19 11:38:24
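When a table holds a large amount of data, fuzzy (LIKE) queries become slow. To improve query performance, I tried the approaches below and compared their results.
I. Test setup
1. Data volume: 1,000,000 rows
2. SQL executed:
explain analyze
select c_patent, c_applyissno, d_applyissdate, d_applydate, c_patenttype_dimn, c_newlawstatus, c_abstract
from public.t_knowl_patent_zlxx_temp
where c_applicant like '%本溪满族自治县连山关镇安平安养殖场%';
II. Comparison results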
1. Execution plan without an index:
gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=217.264..217.264 rows=0 loops=1)
  workers planned: 2
  workers launched: 2
  -> parallel seq scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=212.355..212.355 rows=0 loops=3)
       filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
       rows removed by filter: 333333
planning time: 0.272 ms
execution time: 228.116 ms
2. btree index
Index creation statement:
create index idx_public_t_knowl_patent_zlxx_applicant on public.t_knowl_patent_zlxx(c_applicant varchar_pattern_ops);
Execution plan (still a parallel sequential scan; the index is not used for the '%...%' pattern):
gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=208.253..208.253 rows=0 loops=1)
  workers planned: 2
  workers launched: 2
  -> parallel seq scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=203.573..203.573 rows=0 loops=3)
       filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
       rows removed by filter: 333333
planning time: 0.116 ms
execution time: 218.189 ms
However, if the query is changed slightly by removing the leading % from the LIKE pattern, the plan becomes an index scan:
index scan using idx_public_t_knowl_patent_zlxx_applicant on t_knowl_patent_zlxx_temp (cost=0.55..8.57 rows=92 width=1278) (actual time=0.292..0.292 rows=0 loops=1)
  index cond: (((c_applicant)::text ~>=~ '本溪满族自治县连山关镇安平安养殖场'::text) and ((c_applicant)::text ~<~ '本溪满族自治县连山关镇安平安养殖圻'::text))
  filter: ((c_applicant)::text ~~ '本溪满族自治县连山关镇安平安养殖场%'::text)
planning time: 0.710 ms
execution time: 0.378 ms
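The index cond in this plan shows the prefix pattern rewritten as a range: greater than or equal to the prefix, and less than the prefix with its last character incremented. A minimal Python sketch of that idea follows. The names `prefix_range` and `matches_prefix` are hypothetical, and the sketch assumes plain code-point ordering; it is not PostgreSQL's actual implementation and ignores edge cases such as a last character that cannot be incremented.

```python
def prefix_range(prefix):
    """Return (lower, upper) bounds equivalent to LIKE 'prefix%'.

    Assumption: strings are compared by code point, similar in spirit to
    the ~>=~ and ~<~ operators seen in the plan. Simplified illustration only.
    """
    if not prefix:
        raise ValueError("prefix must be non-empty")
    lower = prefix
    # Increment the last character to get the first string that no longer
    # starts with the prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return lower, upper


def matches_prefix(value, prefix):
    """True when value falls inside the half-open range [lower, upper)."""
    lower, upper = prefix_range(prefix)
    return lower <= value < upper


if __name__ == "__main__":
    print(prefix_range("abc"))
    print(matches_prefix("abcdef", "abc"))
    print(matches_prefix("xabc", "abc"))
```

Because a btree keeps its keys in sorted order, a range like this can be answered by walking one contiguous part of the index. A pattern with a leading % has no such range, which is why the earlier plan fell back to a sequential scan.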
3. gin index
Index creation statements (the author specifies PostgreSQL 9.6 or later):
create extension pg_trgm;
create index idx_public_t_knowl_patent_zlxx_applicant on public.t_knowl_patent_zlxx using gin (c_applicant gin_trgm_ops);
Execution plan:
bitmap heap scan on t_knowl_patent_zlxx (cost=244.71..600.42 rows=91 width=1268) (actual time=0.649..0.649 rows=0 loops=1)
  recheck cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
  -> bitmap index scan on idx_public_t_knowl_patent_zlxx_applicant (cost=0.00..244.69 rows=91 width=0) (actual time=0.647..0.647 rows=0 loops=1)
       index cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
planning time: 0.673 ms
execution time: 0.740 ms
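III. Conclusion
A btree index lets a trailing-% pattern ("abc%") use the index, and gin + pg_trgm lets a pattern with % on both sides ("%abc%") use the index. The gin index has drawbacks, though. In the following cases the index may not be used:
1. When the search string has fewer than 3 characters, the index is generally not used. This comes from pg_trgm's trigram mechanism rather than from gin itself.
2. When the search string is very long, for example when searching by email address, the index may also not be used. The cause is currently unknown.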
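The conclusion can be summarized as a rough rule of thumb. The sketch below is only an illustration of the findings in this article, not a planner model: the function name `suggest_index` and the returned labels are hypothetical, and the 3-character threshold is the simplification stated above. Always confirm with explain analyze on real data.

```python
def suggest_index(pattern):
    """Map a LIKE pattern to the index type this article found effective.

    Rule-of-thumb only; the real planner also weighs statistics and costs.
    """
    literal = pattern.strip("%")
    # Wildcards in the middle of the pattern were not covered by the tests.
    if "%" in literal or "_" in literal:
        return "unclear: check with explain analyze"
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    if not leading and not trailing:
        return "btree"
    if not leading:
        return "btree with varchar_pattern_ops"
    if len(literal) >= 3:
        return "gin with gin_trgm_ops"
    return "no index expected: sequential scan"


if __name__ == "__main__":
    for p in ["abc", "abc%", "%abc%", "%ab%"]:
        print(p, "->", suggest_index(p))
```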
Addendum: an experiment on improving PostgreSQL LIKE query performance
I. Query performance without an index
As a baseline, first test the queries with no index:
explain analyze select * from gallery_map where author = '曹志耘';
query plan
seq scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.011..39.753 rows=1031 loops=1)
  filter: ((author)::text = '曹志耘'::text)
  rows removed by filter: 71315
planning time: 0.194 ms
execution time: 39.879 ms
(5 rows)
time: 40.599 ms

explain analyze select * from gallery_map where author like '曹志耘';
query plan
seq scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.017..41.513 rows=1031 loops=1)
  filter: ((author)::text ~~ '曹志耘'::text)
  rows removed by filter: 71315
planning time: 0.188 ms
execution time: 41.669 ms
(5 rows)
time: 42.457 ms

explain analyze select * from gallery_map where author like '曹志耘%';
query plan
seq scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.017..41.492 rows=1031 loops=1)
  filter: ((author)::text ~~ '曹志耘%'::text)
  rows removed by filter: 71315
planning time: 0.307 ms
execution time: 41.633 ms
(5 rows)
time: 42.676 ms

As expected, all three queries do a full table scan.
II. Creating a btree index
btree is PostgreSQL's default index type.
create index ix_gallery_map_author on gallery_map (author);

explain analyze select * from gallery_map where author = '曹志耘';
query plan
bitmap heap scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.457..1.312 rows=1031 loops=1)
  recheck cond: ((author)::text = '曹志耘'::text)
  heap blocks: exact=438
  -> bitmap index scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.358..0.358 rows=1031 loops=1)
       index cond: ((author)::text = '曹志耘'::text)
planning time: 0.416 ms
execution time: 1.422 ms
(7 rows)
time: 2.462 ms

explain analyze select * from gallery_map where author like '曹志耘';
query plan
bitmap heap scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.752..2.119 rows=1031 loops=1)
  filter: ((author)::text ~~ '曹志耘'::text)
  heap blocks: exact=438
  -> bitmap index scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.560..0.560 rows=1031 loops=1)
       index cond: ((author)::text = '曹志耘'::text)
planning time: 0.270 ms
execution time: 2.295 ms
(7 rows)
time: 3.444 ms

explain analyze select * from gallery_map where author like '曹志耘%';
query plan
seq scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..41.389 rows=1031 loops=1)
  filter: ((author)::text ~~ '曹志耘%'::text)
  rows removed by filter: 71315
planning time: 0.260 ms
execution time: 41.518 ms
(5 rows)
time: 42.430 ms

explain analyze select * from gallery_map where author like '%研究室';
query plan
seq scan on gallery_map (cost=0.00..7002.32 rows=2282 width=621) (actual time=0.064..52.824 rows=2152 loops=1)
  filter: ((author)::text ~~ '%研究室'::text)
  rows removed by filter: 70194
planning time: 0.254 ms
execution time: 53.064 ms
(5 rows)
time: 53.954 ms

Equality and LIKE without wildcards use the index, but LIKE with wildcards, including the prefix pattern '曹志耘%', still does a full table scan. A btree index with the default operator class supports prefix LIKE only under the C collation. This database uses en_us.utf-8 (see the note at the end), so a pattern operator class such as varchar_pattern_ops would be needed, as in the first experiment above.
III. Creating a gin index
create extension pg_trgm;
create index ix_gallery_map_author on gallery_map using gin (author gin_trgm_ops);

explain analyze select * from gallery_map where author like '曹%';
query plan
bitmap heap scan on gallery_map (cost=19.96..2705.69 rows=1028 width=621) (actual time=0.419..1.771 rows=1031 loops=1)
  recheck cond: ((author)::text ~~ '曹%'::text)
  heap blocks: exact=438
  -> bitmap index scan on ix_gallery_map_author (cost=0.00..19.71 rows=1028 width=0) (actual time=0.312..0.312 rows=1031 loops=1)
       index cond: ((author)::text ~~ '曹%'::text)
planning time: 0.358 ms
execution time: 1.916 ms
(7 rows)
time: 2.843 ms

explain analyze select * from gallery_map where author like '%耘%';
query plan
seq scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..51.641 rows=1031 loops=1)
  filter: ((author)::text ~~ '%耘%'::text)
  rows removed by filter: 71315
planning time: 0.268 ms
execution time: 51.957 ms
(5 rows)
time: 52.899 ms

explain analyze select * from gallery_map where author like '%研究室%';
query plan
bitmap heap scan on gallery_map (cost=31.83..4788.42 rows=2559 width=621) (actual time=0.914..4.195 rows=2402 loops=1)
  recheck cond: ((author)::text ~~ '%研究室%'::text)
  heap blocks: exact=868
  -> bitmap index scan on ix_gallery_map_author (cost=0.00..31.19 rows=2559 width=0) (actual time=0.694..0.694 rows=2402 loops=1)
       index cond: ((author)::text ~~ '%研究室%'::text)
planning time: 0.306 ms
execution time: 4.403 ms
(7 rows)
time: 5.227 ms

The gin_trgm index performs much better.
A pg_trgm index splits each string into trigrams (groups of 3 characters) and matches on those trigrams. As a result, for search strings shorter than 3 characters (Chinese characters included), only prefix matching uses the gin_trgm index.
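The trigram reasoning can be sketched in Python. This is a simplified approximation, not pg_trgm's actual code: the function name `like_pattern_trigrams` is hypothetical, escape characters are ignored, and each literal piece is treated as a single word (the real extension also splits on non-alphanumeric characters). The padding rule assumed here is two spaces at the start of the string and one at the end, applied only where the pattern is not bounded by a wildcard.

```python
import re


def like_pattern_trigrams(pattern):
    """Approximate the trigrams that can be extracted from a LIKE pattern.

    Simplified model: split at the wildcards % and _, pad a piece only
    where it touches the start or end of the pattern, then slide a
    3-character window over each piece.
    """
    trigrams = set()
    pieces = re.split(r"[%_]", pattern)
    last = len(pieces) - 1
    for i, piece in enumerate(pieces):
        if not piece:
            continue
        text = piece.lower()
        if i == 0:
            text = "  " + text  # piece is anchored at the start of the string
        if i == last:
            text = text + " "   # piece is anchored at the end of the string
        trigrams.update(text[j:j + 3] for j in range(len(text) - 2))
    return trigrams


if __name__ == "__main__":
    for p in ["曹%", "%耘%", "%研究室%"]:
        print(p, like_pattern_trigrams(p))
```

Under this sketch, '%耘%' yields no trigrams at all, while '曹%' and '%研究室%' each yield one. That is consistent with the plans above, where only the '%耘%' query fell back to a sequential scan.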
I also tested btree_gin; its results were the same as with btree.
Note:
gin_trgm requires the database to use UTF-8 encoding.
demo_v1 # \l demo_v1
list of databases
name    | owner     | encoding | collate     | ctype       | access privileges
--------+-----------+----------+-------------+-------------+-------------------
demo_v1 | wmpp_user | utf8     | en_us.utf-8 | en_us.utf-8 |
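The above is based on personal experience, and I hope it serves as a useful reference. If you find mistakes or anything I have overlooked, corrections are welcome.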