Range Optimization

程序员文章站 2024-03-19 09:46:40
...

The range access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections describe conditions under which the optimizer uses range access.

Range Access Method for Single-Part Indexes

For a single-part index, index value intervals can be conveniently represented by corresponding conditions in the WHERE clause, denoted as range conditions rather than “intervals.”

The definition of a range condition for a single-part index is as follows:

For both BTREE and HASH indexes, comparison of a key part with a constant value is a range condition when using the =, <=>, IN(), IS NULL, or IS NOT NULL operators.

Additionally, for BTREE indexes, comparison of a key part with a constant value is a range condition when using the >, <, >=, <=, BETWEEN, !=, or <> operators, or LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character.

For all index types, multiple range conditions combined with OR or AND form a range condition.

“Constant value” in the preceding descriptions means one of the following:

  • A constant from the query string
  • A column of a const or system table from the same join
  • The result of an uncorrelated subquery
  • Any expression composed entirely from subexpressions of the preceding types

Here are some examples of queries with range conditions in the WHERE clause:

SELECT * FROM t1
  WHERE key_col > 1
  AND key_col < 10;

SELECT * FROM t1
  WHERE key_col = 1
  OR key_col IN (15,18,20);

SELECT * FROM t1
  WHERE key_col LIKE 'ab%'
  OR key_col BETWEEN 'bar' AND 'foo';
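To make the operator rules concrete, here is a minimal Python sketch that classifies one key_part <op> constant comparison. The function name, the operator strings, and the index-type labels are hypothetical. The sketch assumes the right-hand side already qualifies as a constant. It checks only the operator, the index type, and whether a LIKE pattern starts with a wildcard.

```python
from typing import Optional

# Operators that form a range condition on both BTREE and HASH indexes.
BOTH_INDEX_TYPES = {"=", "<=>", "IN", "IS NULL", "IS NOT NULL"}
# Operators that form a range condition only on BTREE indexes.
BTREE_ONLY = {">", "<", ">=", "<=", "BETWEEN", "!=", "<>"}
LIKE_WILDCARDS = ("%", "_")


def is_range_condition(op: str, index_type: str, like_pattern: Optional[str] = None) -> bool:
    """Return True if 'key_part <op> constant' is a range condition.

    Assumes the right-hand side is already known to be a constant;
    this sketch only checks the operator and the index type.
    """
    if op in BOTH_INDEX_TYPES:
        return index_type in ("BTREE", "HASH")
    if index_type != "BTREE":
        return False
    if op in BTREE_ONLY:
        return True
    if op == "LIKE":
        # Usable only when the constant pattern does not start with a wildcard.
        return like_pattern is not None and not like_pattern.startswith(LIKE_WILDCARDS)
    return False


# Operators used in the three example queries, checked against a BTREE index.
examples = [(">", None), ("<", None), ("=", None), ("IN", None),
            ("LIKE", "ab%"), ("BETWEEN", None)]
print(all(is_range_condition(op, "BTREE", pat) for op, pat in examples))  # True
print(is_range_condition("LIKE", "BTREE", "%b"))                          # False
print(is_range_condition(">", "HASH"))                                    # False
```

Range conditions combined with OR or AND still form a range condition. So each of the three queries above qualifies for a BTREE index on key_col.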

Some nonconstant values may be converted to constants during the optimizer constant propagation phase.

MySQL tries to extract range conditions from the WHERE clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.

Consider the following statement, where key1 is an indexed column and nonkey is not indexed:

SELECT * FROM t1 WHERE
  (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
  (key1 < 'bar' AND nonkey = 4) OR
  (key1 < 'uux' AND key1 > 'z');

The extraction process for key key1 is as follows.

Start with the original WHERE clause:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
(key1 < 'bar' AND nonkey = 4) OR
(key1 < 'uux' AND key1 > 'z')

Remove nonkey = 4 and key1 LIKE '%b' because they cannot be used for a range scan. The correct way to remove them is to replace them with TRUE, so that we do not miss any matching rows when doing the range scan. Replacing them with TRUE yields:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR
(key1 < 'bar' AND TRUE) OR
(key1 < 'uux' AND key1 > 'z')

Collapse conditions that are always true or false:

(key1 LIKE 'abcde%' OR TRUE) is always true

(key1 < 'uux' AND key1 > 'z') is always false

Replacing these conditions with constants yields:

(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)

Removing unnecessary TRUE and FALSE constants yields:

(key1 < 'abc') OR (key1 < 'bar')

Combining overlapping intervals into one yields the final condition to be used for the range scan:

(key1 < 'bar')

In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the WHERE clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE clause.

The range condition extraction algorithm can handle nested AND/OR constructs of arbitrary depth, and its output does not depend on the order in which conditions appear in WHERE clause.
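The extraction steps above can be modeled in a few lines of Python. This is an illustrative sketch, not MySQL's implementation, and it makes several simplifications:

  • Intervals over key1 use a hypothetical Interval class.
  • Strings are compared by code point rather than by a collation.
  • LIKE 'abcde%' becomes the half-open interval ['abcde', 'abcdf').
  • Unusable predicates are replaced by TRUE (the whole index).
  • AND intersects intervals and OR takes their union.
  • Empty intervals are dropped and overlapping ones are merged, as in the walkthrough.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    lo: Optional[str]  # None means -inf
    lo_incl: bool
    hi: Optional[str]  # None means +inf
    hi_incl: bool

    def is_empty(self) -> bool:
        if self.lo is None or self.hi is None:
            return False
        return self.lo > self.hi or (self.lo == self.hi and not (self.lo_incl and self.hi_incl))


Bound = Tuple[Optional[str], bool]
TRUE: List[Interval] = [Interval(None, False, None, False)]  # whole index ("replace with TRUE")


def lt(v: str) -> List[Interval]:
    return [Interval(None, False, v, False)]


def gt(v: str) -> List[Interval]:
    return [Interval(v, False, None, False)]


def like_prefix(prefix: str) -> List[Interval]:
    # LIKE 'abcde%' -> ['abcde', 'abcdf'), assuming plain code-point ordering.
    return [Interval(prefix, True, prefix[:-1] + chr(ord(prefix[-1]) + 1), False)]


def _tighter_lo(a: Interval, b: Interval) -> Bound:
    if a.lo is None or (b.lo is not None and b.lo > a.lo):
        return b.lo, b.lo_incl
    if b.lo is None or a.lo > b.lo:
        return a.lo, a.lo_incl
    return a.lo, a.lo_incl and b.lo_incl


def _tighter_hi(a: Interval, b: Interval) -> Bound:
    if a.hi is None or (b.hi is not None and b.hi < a.hi):
        return b.hi, b.hi_incl
    if b.hi is None or a.hi < b.hi:
        return a.hi, a.hi_incl
    return a.hi, a.hi_incl and b.hi_incl


def _looser_hi(a: Interval, b: Interval) -> Bound:
    if a.hi is None or b.hi is None:
        return None, False
    if a.hi != b.hi:
        return (a.hi, a.hi_incl) if a.hi > b.hi else (b.hi, b.hi_incl)
    return a.hi, a.hi_incl or b.hi_incl


def _normalize(intervals: Iterable[Interval]) -> List[Interval]:
    """Drop empty intervals, then merge overlapping or touching ones."""
    items = sorted((iv for iv in intervals if not iv.is_empty()),
                   key=lambda iv: (0, "", False) if iv.lo is None else (1, iv.lo, not iv.lo_incl))
    merged: List[Interval] = []
    for iv in items:
        cur = merged[-1] if merged else None
        if cur is not None and (cur.hi is None or iv.lo is None or iv.lo < cur.hi
                                or (iv.lo == cur.hi and (cur.hi_incl or iv.lo_incl))):
            merged[-1] = Interval(cur.lo, cur.lo_incl, *_looser_hi(cur, iv))
        else:
            merged.append(iv)
    return merged


def OR(*conds: List[Interval]) -> List[Interval]:
    return _normalize(iv for cond in conds for iv in cond)


def AND(*conds: List[Interval]) -> List[Interval]:
    result = TRUE
    for cond in conds:
        result = _normalize(Interval(*_tighter_lo(a, b), *_tighter_hi(a, b))
                            for a in result for b in cond)
    return result


# WHERE clause for key1, with unusable predicates already replaced by TRUE.
where = OR(
    AND(lt("abc"), OR(like_prefix("abcde"), TRUE)),  # key1 LIKE '%b'  -> TRUE
    AND(lt("bar"), TRUE),                            # nonkey = 4      -> TRUE
    AND(lt("uux"), gt("z")),                         # empty range, dropped
)
print(where)  # [Interval(lo=None, lo_incl=False, hi='bar', hi_incl=False)]
```

AND and OR are applied recursively, so nesting depth does not change the result. Merging sorts the intervals, so predicate order does not change it either. Rows returned by this range still have to be checked against the full WHERE clause.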

MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.

Range Access Method for Multiple-Part Indexes

Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.

For example, consider a multiple-part index defined as key1(key_part1, key_part2, key_part3), and the following set of key tuples listed in key order:

key_part1  key_part2  key_part3
  NULL       1          'abc'
  NULL       1          'xyz'
  NULL       2          'foo'
   1         1          'abc'
   1         1          'xyz'
   1         2          'abc'
   2         1          'aaa'

The condition key_part1 = 1 defines this interval:

(1,-inf,-inf) <= (key_part1,key_part2,key_part3) < (1,+inf,+inf)

The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.

By contrast, the condition key_part3 = 'abc' does not define a single interval and cannot be used by the range access method.
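One way to see the difference is to sort the key tuples and look at where the matches fall. The following Python sketch uses hypothetical names. It orders NULL before any value, as in the listing, and reports whether a condition's matching tuples are contiguous in index order, which is what a single interval requires.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[Optional[int], int, str]

# Key tuples of key1(key_part1, key_part2, key_part3), listed in index order.
ROWS: List[Row] = [
    (None, 1, "abc"),
    (None, 1, "xyz"),
    (None, 2, "foo"),
    (1, 1, "abc"),
    (1, 1, "xyz"),
    (1, 2, "abc"),
    (2, 1, "aaa"),
]


def index_order(row: Row) -> tuple:
    # NULL sorts before every non-NULL value, as in the listing above.
    return tuple((0, 0) if v is None else (1, v) for v in row)


def matching_positions(pred: Callable[[Row], bool]) -> List[int]:
    return [i for i, row in enumerate(sorted(ROWS, key=index_order)) if pred(row)]


def is_single_interval(positions: List[int]) -> bool:
    # A condition maps to one interval only if its matches are contiguous in index order.
    return bool(positions) and positions == list(range(positions[0], positions[-1] + 1))


kp1 = matching_positions(lambda r: r[0] == 1)
kp3 = matching_positions(lambda r: r[2] == "abc")
print(kp1, is_single_interval(kp1))  # [3, 4, 5] True  (the 4th, 5th, and 6th tuples)
print(kp3, is_single_interval(kp3))  # [0, 3, 5] False
```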

The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.

For HASH indexes, each interval containing identical values can be used. This means that the interval can be produced only for conditions in the following form:

key_part1 cmp const1
AND key_part2 cmp const2
AND ...
AND key_partN cmp constN;

Here, const1, const2, … are constants, cmp is one of the =, <=>, or IS NULL comparison operators, and the conditions cover all index parts. (That is, there are N conditions, one for each part of an N-part index.) For example, the following is a range condition for a three-part HASH index:

key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'

For the definition of what is considered to be a constant, see Range Access Method for Single-Part Indexes.
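As a small sketch of the HASH rule, the following Python function accepts a condition only when every part of the index is compared with =, <=>, or IS NULL. The function is hypothetical; it encodes operators as strings and numbers key parts from 1.

```python
from typing import Dict

# Comparison operators that the HASH rule above accepts for every key part.
HASH_OPS = {"=", "<=>", "IS NULL"}


def hash_interval_usable(conditions: Dict[int, str], n_parts: int) -> bool:
    """conditions maps a 1-based key part number to the operator comparing it with a constant."""
    return all(conditions.get(part) in HASH_OPS for part in range(1, n_parts + 1))


# key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'
print(hash_interval_usable({1: "=", 2: "IS NULL", 3: "="}, 3))  # True
# key_part3 is not constrained, so the condition does not cover all parts
print(hash_interval_usable({1: "=", 2: "IS NULL"}, 3))          # False
# > is not an allowed comparison for HASH indexes
print(hash_interval_usable({1: "=", 2: ">", 3: "="}, 3))        # False
```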

For a BTREE index, an interval might be usable for conditions combined with AND, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).

The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:

key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

The single interval is:

('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)
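The rule for extending the interval across key parts can be sketched in Python as a walk over the index parts in order. The function and its inputs are hypothetical; it reproduces only the stopping rule described above:

  • Equality-style operators let the walk continue.
  • The first range operator is used and ends the walk.
  • A key part with no usable condition also ends the walk.

```python
from typing import Dict, List

EQUALITY_OPS = {"=", "<=>", "IS NULL"}
RANGE_OPS = {">", "<", ">=", "<=", "!=", "<>", "BETWEEN", "LIKE"}


def key_parts_for_interval(conditions: Dict[str, str], key_parts: List[str]) -> List[str]:
    """Return the leading key parts the optimizer can use to build the interval."""
    used: List[str] = []
    for part in key_parts:
        op = conditions.get(part)
        if op in EQUALITY_OPS:
            used.append(part)  # equality: try the next key part as well
        elif op in RANGE_OPS:
            used.append(part)  # range operator: use it, but consider no more key parts
            break
        else:
            break              # no usable comparison on this part: the prefix ends here
    return used


parts = ["key_part1", "key_part2", "key_part3"]
# key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10
print(key_parts_for_interval({"key_part1": "=", "key_part2": ">=", "key_part3": ">"}, parts))
# ['key_part1', 'key_part2']
```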

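Conditions combined with OR cover the union of their intervals. For example, for this condition on a two-part index:

(key_part1 = 1 AND key_part2 < 2) OR (key_part1 > 5)

The intervals are: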

(1,-inf) < (key_part1,key_part2) < (1,2)
(5,-inf) < (key_part1,key_part2)

In this example, the interval on the first line uses one key part for the left bound and two key parts for the right bound. The interval on the second line uses only one key part. The key_len column in the EXPLAIN output indicates the maximum length of the key prefix used.

In some cases, key_len may indicate that a key part was used, but that might not be what you would expect. Suppose that key_part1 and key_part2 can be NULL. Then the key_len column displays two key part lengths for the following condition:

key_part1 >= 1 AND key_part2 < 2

But, in fact, the condition is converted to this:

key_part1 >= 1 AND key_part2 IS NOT NULL

 For a description of how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index, see Range Access Method for Single-Part Indexes. Analogous steps are performed for range conditions on multiple-part indexes.

Equality Range Optimization of Many-Valued Comparisons

Consider these expressions, where col_name is an indexed column:

col_name IN(val1, ..., valN)
col_name = val1 OR ... OR col_name = valN

Each expression is true if col_name is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:

If there is a unique index on col_name, the row estimate for each range is 1 because at most one row can have the given value.

Otherwise, any index on col_name is nonunique and the optimizer can estimate the row count for each range using dives into the index or index statistics.

With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression col_name IN (10, 20, 30) has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.

Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.
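To illustrate the trade-off, the following Python sketch models an index as a sorted list. An index dive is modeled as a binary search with the bisect module, so each equality range costs two dives and yields an exact count. The statistics-based estimate assumes, as a simplification, that every value matches the average number of rows per distinct key. The data and function names are made up for the example.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def estimate_with_dives(index_keys: List[int], values: List[int]) -> Tuple[int, int]:
    """Return (estimated rows, number of dives) for col_name IN (values)."""
    rows = dives = 0
    for v in values:
        start = bisect_left(index_keys, v)  # dive to the left end of the range [v, v]
        end = bisect_right(index_keys, v)   # dive to the right end of the range
        dives += 2
        rows += end - start
    return rows, dives


def estimate_with_statistics(index_keys: List[int], values: List[int]) -> float:
    """Assume every value matches the average number of rows per distinct key."""
    rows_per_value = len(index_keys) / len(set(index_keys))
    return rows_per_value * len(values)


index_keys = [10, 10, 10, 10, 20, 30, 30, 40, 50, 50]  # sorted, nonunique
print(estimate_with_dives(index_keys, [10, 20, 30]))       # (7, 6)
print(estimate_with_statistics(index_keys, [10, 20, 30]))  # 6.0
```

The dive estimate (7 rows) is exact, but its cost grows with the length of the IN() list. The statistics estimate (6.0 rows) is cheaper but drifts when values are unevenly distributed.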

The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. To disable use of statistics and always use index dives regardless of N, set eq_range_index_dive_limit to 0.
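The switching rule can be written as a small Python function. The function name is hypothetical; only the semantics of eq_range_index_dive_limit come from the text above. A limit of 0 always selects index dives. Otherwise dives are used while the number of equality ranges is below the limit.

```python
def row_estimate_strategy(n_equality_ranges: int, eq_range_index_dive_limit: int) -> str:
    """Pick the row estimation strategy as described for eq_range_index_dive_limit."""
    if eq_range_index_dive_limit == 0:
        return "index dives"        # 0 disables statistics entirely
    if n_equality_ranges < eq_range_index_dive_limit:
        return "index dives"        # a limit of N + 1 permits dives for up to N ranges
    return "index statistics"


N = 3  # e.g. col_name IN (10, 20, 30)
print(row_estimate_strategy(N, N + 1))  # index dives
print(row_estimate_strategy(N, N))      # index statistics
print(row_estimate_strategy(1000, 0))   # index dives
```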

To update table index statistics for best estimates, use ANALYZE TABLE.

Even under conditions when index dives would otherwise be used, they are skipped for queries that satisfy all these conditions:

  • A single-index FORCE INDEX index hint is present. The idea is that if index use is forced, there is nothing to be gained from the additional overhead of performing dives into the index.
  • The index is nonunique and not a FULLTEXT index.
  • No subquery is present.
  • No DISTINCT, GROUP BY, or ORDER BY clause is present.

Those dive-skipping conditions apply only for single-table queries. Index dives are not skipped for multiple-table queries (joins).
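All of these conditions must hold, so the check is a single conjunction. The QueryShape fields below are a hypothetical summary of the query, not anything MySQL exposes. The sketch only restates the list above in Python.

```python
from dataclasses import dataclass, replace


@dataclass
class QueryShape:
    """Hypothetical summary of the query features that the list above inspects."""
    table_count: int
    single_index_force_index_hint: bool
    index_is_unique: bool
    index_is_fulltext: bool
    has_subquery: bool
    has_distinct_group_by_or_order_by: bool


def index_dives_skipped(q: QueryShape) -> bool:
    return (q.table_count == 1                    # joins never skip index dives
            and q.single_index_force_index_hint
            and not q.index_is_unique
            and not q.index_is_fulltext
            and not q.has_subquery
            and not q.has_distinct_group_by_or_order_by)


forced = QueryShape(1, True, False, False, False, False)
print(index_dives_skipped(forced))                           # True
print(index_dives_skipped(replace(forced, table_count=2)))   # False: a join keeps index dives
```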

Range Optimization of Row Constructor Expressions

The optimizer is able to apply the range scan access method to queries of this form:

SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));

Previously, for range scans to be used, it was necessary to write the query as:

SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' )
OR ( col_1 = 'c' AND col_2 = 'd' );

For the optimizer to use a range scan, queries must satisfy these conditions:

  • Only IN() predicates are used, not NOT IN().
  • On the left side of the IN() predicate, the row constructor contains only column references.
  • On the right side of the IN() predicate, row constructors contain only runtime constants, which are either literals or local column references that are bound to constants during execution.
  • On the right side of the IN() predicate, there is more than one row constructor.
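For illustration, the following Python sketch uses hypothetical helper names. It produces the older OR-of-AND form shown earlier from a row constructor IN() list, and restates the four eligibility conditions as one boolean check. It quotes string constants only and is not how MySQL rewrites queries internally.

```python
from typing import Sequence, Tuple


def sql_literal(v: str) -> str:
    return "'" + v.replace("'", "''") + "'"  # quote a string constant, doubling embedded quotes


def expand_row_in(columns: Sequence[str], rows: Sequence[Tuple[str, ...]]) -> str:
    """Rewrite (c1, c2, ...) IN ((...), (...)) as the equivalent OR of AND groups."""
    groups = [
        "(" + " AND ".join(f"{c} = {sql_literal(v)}" for c, v in zip(columns, row)) + ")"
        for row in rows
    ]
    return " OR ".join(groups)


def row_in_range_eligible(negated: bool, lhs_all_columns: bool,
                          rhs_all_constants: bool, rhs_row_count: int) -> bool:
    """The four conditions listed above, all of which must hold."""
    return (not negated and lhs_all_columns and rhs_all_constants
            and rhs_row_count > 1)


print(expand_row_in(["col_1", "col_2"], [("a", "b"), ("c", "d")]))
# (col_1 = 'a' AND col_2 = 'b') OR (col_1 = 'c' AND col_2 = 'd')
print(row_in_range_eligible(False, True, True, 2))  # True
print(row_in_range_eligible(False, True, True, 1))  # False: only one row constructor
```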

For more information about the optimizer and row constructors, see Section 8.2.1.19, “Row Constructor Expression Optimization”.

Limiting Memory Use for Range Optimization

To control the memory available to the range optimizer, use the range_optimizer_max_mem_size system variable:

A value of 0 means “no limit.”

With a value greater than 0, the optimizer tracks the memory consumed when considering the range access method. If the specified limit is about to be exceeded, the range access method is abandoned and other methods, including a full table scan, are considered instead. This could be less optimal. If this happens, the following warning occurs (where N is the current range_optimizer_max_mem_size value):

Warning    3170    Memory capacity of N bytes for
                   'range_optimizer_max_mem_size' exceeded. Range
                   optimization was not done for this query.

For UPDATE and DELETE statements, if the optimizer falls back to a full table scan and the sql_safe_updates system variable is enabled, an error occurs rather than a warning because, in effect, no key is used to determine which rows to modify. For more information, see Using Safe-Updates Mode (--safe-updates).

For individual queries that exceed the available range optimization memory and for which the optimizer falls back to less optimal plans, increasing the range_optimizer_max_mem_size value may improve performance.

To estimate the amount of memory needed to process a range expression, use these guidelines:

For a simple query such as the following, where there is one candidate key for the range access method, each predicate combined with OR uses approximately 230 bytes:

SELECT COUNT(*) FROM t
WHERE a=1 OR a=2 OR a=3 OR ... OR a=N;

Similarly for a query such as the following, each predicate combined with AND uses approximately 125 bytes:

SELECT COUNT(*) FROM t
WHERE a=1 AND b=1 AND c=1 ... N;

For a query with IN() predicates:

SELECT COUNT(*) FROM t
WHERE a IN (1,2, ..., M) AND b IN (1,2, ..., N);

Each literal value in an IN() list counts as a predicate combined with OR. If there are two IN() lists, the number of predicates combined with OR is the product of the number of literal values in each list. Thus, the number of predicates combined with OR in the preceding case is M × N.

 Before 5.7.11, the number of bytes per predicate combined with OR was higher, approximately 700 bytes.
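These per-predicate figures make the arithmetic easy to sketch. The following Python functions have hypothetical names. They multiply the IN() list sizes to count OR-combined predicates and apply the approximate byte costs quoted above. Because the byte figures are approximations, comparing the result with range_optimizer_max_mem_size only predicts whether the warning is likely.

```python
from math import prod
from typing import Sequence

BYTES_PER_OR_PREDICATE = 230                # approximate, 5.7.11 and later
BYTES_PER_OR_PREDICATE_BEFORE_5_7_11 = 700  # approximate, before 5.7.11
BYTES_PER_AND_PREDICATE = 125               # approximate


def estimate_in_lists(in_list_sizes: Sequence[int],
                      bytes_per_or: int = BYTES_PER_OR_PREDICATE) -> int:
    """a IN (M values) AND b IN (N values) counts as M * N OR-combined predicates."""
    return prod(in_list_sizes) * bytes_per_or


def estimate_and_chain(n_predicates: int) -> int:
    """a=1 AND b=1 AND c=1 ... with n_predicates comparisons."""
    return n_predicates * BYTES_PER_AND_PREDICATE


def range_access_abandoned(estimated_bytes: int, range_optimizer_max_mem_size: int) -> bool:
    """A limit of 0 means no limit; otherwise exceeding it abandons range access."""
    return range_optimizer_max_mem_size > 0 and estimated_bytes > range_optimizer_max_mem_size


print(estimate_in_lists([100, 100]))                                        # 2300000
print(estimate_in_lists([100, 100], BYTES_PER_OR_PREDICATE_BEFORE_5_7_11))  # 7000000
print(estimate_and_chain(10))                                               # 1250
print(range_access_abandoned(estimate_in_lists([100, 100]), 0))             # False
```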
