InnoDB On-Disk Structures--Tables (Repost)

程序员文章站 2022-07-05 08:34:56

Reposted and excerpted from https://dev.mysql.com/doc/refman/8.0/en/innodb-tables.html

1. InnoDB Architecture

The following diagram shows in-memory and on-disk structures that comprise the InnoDB storage engine architecture.

[Diagram: InnoDB architecture, in-memory and on-disk structures]

This section covers topics related to `InnoDB` tables.

2. Creating InnoDB Tables

You do not need to specify the ENGINE=InnoDB clause if InnoDB is defined as the default storage engine, which it is by default.

An InnoDB table and its indexes can be created in the system tablespace, in a file-per-table tablespace, or in a general tablespace. When innodb_file_per_table is enabled, which is the default, an InnoDB table is implicitly created in an individual file-per-table tablespace. Conversely, when innodb_file_per_table is disabled, an InnoDB table is implicitly created in the InnoDB system tablespace. To create a table in a general tablespace, use CREATE TABLE ... TABLESPACE syntax.

When you create a table in a file-per-table tablespace, MySQL creates an .ibd tablespace file in a database directory under the MySQL data directory, by default. A table created in the InnoDB system tablespace is created in an existing ibdata file, which resides in the MySQL data directory. A table created in a general tablespace is created in an existing general tablespace .ibd file. General tablespace files can be created inside or outside of the MySQL data directory.

Internally, InnoDB adds an entry for each table to the data dictionary. The entry includes the database name. For example, if table t1 is created in the test database, the data dictionary entry for the database name is 'test/t1'. This means you can create a table of the same name (t1) in a different database, and the table names do not collide inside InnoDB.

InnoDB Tables and Row Formats

The default row format for InnoDB tables is defined by the innodb_default_row_format configuration option, which has a default value of DYNAMIC. The DYNAMIC and COMPRESSED row formats allow you to take advantage of InnoDB features such as table compression and efficient off-page storage of long column values. To use these row formats, innodb_file_per_table must be enabled (the default).

InnoDB Tables and Primary Keys

Always define a primary key for an InnoDB table, specifying the column or columns that:

Are referenced by the most important queries.
Are never left blank.
Never have duplicate values.
Rarely if ever change value once inserted.

For example, in a table containing information about people, you would not create a primary key on (firstname, lastname) because more than one person can have the same name, some people have blank last names, and sometimes people change their names. With so many constraints, often there is not an obvious set of columns to use as a primary key, so you create a new column with a numeric ID to serve as all or part of the primary key. You can declare an auto-increment column so that ascending values are filled in automatically as rows are inserted.

# The value of ID can act like a pointer between related items in different tables.
CREATE TABLE t5 (id INT AUTO_INCREMENT, b CHAR (20), PRIMARY KEY (id));

# The primary key can consist of more than one column. Any autoinc column must come first.
CREATE TABLE t6 (id INT AUTO_INCREMENT, a INT, b CHAR (20), PRIMARY KEY (id,a));

Although the table works correctly without defining a primary key, the primary key is involved with many aspects of performance and is a crucial design aspect for any large or frequently used table. It is recommended that you always specify a primary key in the CREATE TABLE statement. If you create the table, load data, and then run ALTER TABLE to add a primary key later, that operation is much slower than defining the primary key when creating the table.

Viewing InnoDB Table Properties

To view the properties of an InnoDB table, issue a SHOW TABLE STATUS statement:

mysql> SHOW TABLE STATUS FROM test LIKE 't%' \G;
*************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2015-03-16 15:13:31
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_0900_ai_ci
       Checksum: NULL
 Create_options:
        Comment:

InnoDB table properties may also be queried using the InnoDB Information Schema system tables:

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_TABLES WHERE NAME='test/t1' \G
*************************** 1. row ***************************
     TABLE_ID: 45
         NAME: test/t1
         FLAG: 1
       N_COLS: 5
        SPACE: 35
   ROW_FORMAT: Compact
ZIP_PAGE_SIZE: 0
   SPACE_TYPE: Single

3. Moving or Copying InnoDB Tables

This section describes techniques for moving or copying some or all InnoDB tables to a different server or instance. For example, you might move an entire MySQL instance to a larger, faster server; you might clone an entire MySQL instance to a new replication slave server; you might copy individual tables to another instance to develop and test an application, or to a data warehouse server to produce reports.

On Windows, InnoDB always stores database and table names internally in lowercase. To move databases in a binary format from Unix to Windows or from Windows to Unix, create all databases and tables using lowercase names. A convenient way to accomplish this is to add the following line to the [mysqld] section of your my.cnf or my.ini file before creating any databases or tables:

[mysqld]
lower_case_table_names=1

Note: It is prohibited to start the server with a lower_case_table_names setting that is different from the setting used when the server was initialized.

Techniques for moving or copying InnoDB tables include:

Transportable Tablespaces

The transportable tablespaces feature uses FLUSH TABLES ... FOR EXPORT to ready InnoDB tables for copying from one server instance to another. To use this feature, InnoDB tables must be created with innodb_file_per_table set to ON so that each InnoDB table has its own tablespace.

MySQL Enterprise Backup

The MySQL Enterprise Backup product lets you back up a running MySQL database with minimal disruption to operations while producing a consistent snapshot of the database. When MySQL Enterprise Backup is copying tables, reads and writes can continue. In addition, MySQL Enterprise Backup can create compressed backup files, and back up subsets of tables. In conjunction with the MySQL binary log, you can perform point-in-time recovery. MySQL Enterprise Backup is included as part of the MySQL Enterprise subscription.

Copying Data Files (Cold Backup Method)

InnoDB data and log files are binary-compatible on all platforms having the same floating-point number format. If the floating-point formats differ but you have not used FLOAT or DOUBLE data types in your tables, then the procedure is the same: simply copy the relevant files.

When you move or copy file-per-table .ibd files, the database directory name must be the same on the source and destination systems. The table definition stored in the InnoDB shared tablespace includes the database name. The transaction IDs and log sequence numbers stored in the tablespace files also differ between databases.

Export and Import (mysqldump)

You can use mysqldump to dump your tables on one machine and then import the dump files on the other machine. Using this method, it does not matter whether the formats differ or if your tables contain floating-point data.

One way to increase the performance of this method is to switch off autocommit mode when importing data, assuming that the tablespace has enough space for the big rollback segment that the import transactions generate. Do the commit only after importing a whole table or a segment of a table.

4. Converting Tables from MyISAM to InnoDB

If you have MyISAM tables that you want to convert to InnoDB for better reliability and scalability, review the following guidelines and tips before converting.

Adjusting Memory Usage for MyISAM and InnoDB

As you transition away from MyISAM tables, lower the value of the key_buffer_size configuration option to free memory no longer needed for caching results. Increase the value of the innodb_buffer_pool_size configuration option, which performs a similar role of allocating cache memory for InnoDB tables. The InnoDB buffer pool caches both table data and index data, speeding up lookups for queries and keeping query results in memory for reuse.

Handling Too-Long or Too-Short Transactions

Because MyISAM tables do not support transactions, you might not have paid much attention to the autocommit configuration option and the COMMIT and ROLLBACK statements. These keywords are important to allow multiple sessions to read and write InnoDB tables concurrently, providing substantial scalability benefits in write-heavy workloads.

While a transaction is open, the system keeps a snapshot of the data as seen at the beginning of the transaction, which can cause substantial overhead if the system inserts, updates, and deletes millions of rows while a stray transaction keeps running. Thus, take care to avoid transactions that run for too long:

If you are using a mysql session for interactive experiments, always COMMIT (to finalize the changes) or ROLLBACK (to undo the changes) when finished. Close down interactive sessions rather than leave them open for long periods, to avoid keeping transactions open for long periods by accident.
Make sure that any error handlers in your application also ROLLBACK incomplete changes or COMMIT completed changes.
ROLLBACK is a relatively expensive operation, because INSERT, UPDATE, and DELETE operations are written to InnoDB tables prior to the COMMIT, with the expectation that most changes are committed successfully and rollbacks are rare. When experimenting with large volumes of data, avoid making changes to large numbers of rows and then rolling back those changes.
When loading large volumes of data with a sequence of INSERT statements, periodically COMMIT the results to avoid having transactions that last for hours. In typical load operations for data warehousing, if something goes wrong, you truncate the table (using TRUNCATE TABLE) and start over from the beginning rather than doing a ROLLBACK.

The preceding tips save memory and disk space that can be wasted during too-long transactions. When transactions are shorter than they should be, the problem is excessive I/O. With each COMMIT, MySQL makes sure each change is safely recorded to disk, which involves some I/O.

For most operations on InnoDB tables, you should use the setting autocommit=0. From an efficiency perspective, this avoids unnecessary I/O when you issue large numbers of consecutive INSERT, UPDATE, or DELETE statements. From a safety perspective, this allows you to issue a ROLLBACK statement to recover lost or garbled data if you make a mistake on the mysql command line, or in an exception handler in your application.
The time when autocommit=1 is suitable for InnoDB tables is when running a sequence of queries for generating reports or analyzing statistics. In this situation, there is no I/O penalty related to COMMIT or ROLLBACK, and InnoDB can automatically optimize the read-only workload.
If you make a series of related changes, finalize all the changes at once with a single COMMIT at the end. For example, if you insert related pieces of information into several tables, do a single COMMIT after making all the changes. Or if you run many consecutive INSERT statements, do a single COMMIT after all the data is loaded; if you are doing millions of INSERT statements, perhaps split up the huge transaction by issuing a COMMIT every ten thousand or hundred thousand records, so the transaction does not grow too large.
Remember that even a SELECT statement opens a transaction, so after running some report or debugging queries in an interactive mysql session, either issue a COMMIT or close the mysql session.
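
The batching advice above can be sketched in a few lines. This is only an illustration of the commit pattern, not a MySQL-specific implementation: it uses Python's built-in sqlite3 module so that it runs without a server, and the table t, the function load_in_batches, and the batch size are hypothetical names chosen for the example. A MySQL driver that follows the Python DB-API would be used the same way, although its parameter placeholder style may differ.

```python
import sqlite3


def load_in_batches(conn, rows, batch_size=10_000):
    """Insert rows, committing once per batch instead of once per row.

    Returns the number of commits issued.
    """
    cur = conn.cursor()
    commits = 0
    pending = 0
    for row in rows:
        cur.execute("INSERT INTO t (id, val) VALUES (?, ?)", row)
        pending += 1
        if pending == batch_size:
            conn.commit()  # finalize this batch so the transaction stays small
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # commit the final, partially filled batch
        commits += 1
    return commits


# sqlite3 is a stand-in here (assumption): it only makes the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.commit()

commits = load_in_batches(conn, ((i, "x") for i in range(25)), batch_size=10)
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

With 25 rows and a batch size of 10, the loader commits after rows 10 and 20 and once more for the remaining 5 rows.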

Handling Deadlocks

You might see warning messages referring to “deadlocks” in the MySQL error log, or the output of SHOW ENGINE INNODB STATUS. Despite the scary-sounding name, a deadlock is not a serious issue for InnoDB tables, and often does not require any corrective action. When two transactions start modifying multiple tables, accessing the tables in a different order, they can reach a state where each transaction is waiting for the other and neither can proceed. When deadlock detection is enabled (the default), MySQL immediately detects this condition and cancels (rolls back) the “smaller” transaction, allowing the other to proceed. If deadlock detection is disabled using the innodb_deadlock_detect configuration option, InnoDB relies on the innodb_lock_wait_timeout setting to roll back transactions in case of a deadlock.

Either way, your applications need error-handling logic to restart a transaction that is forcibly cancelled due to a deadlock. When you re-issue the same SQL statements as before, the original timing issue no longer applies. Either the other transaction has already finished and yours can proceed, or the other transaction is still in progress and your transaction waits until it finishes.
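
The restart logic described above might look like the following sketch. DeadlockError is a hypothetical stand-in for whatever exception your driver raises for MySQL error 1213 (SQLSTATE 40001), and run_with_retry and flaky_transaction are names invented for this example; a real handler would also roll back the failed attempt before retrying.

```python
class DeadlockError(Exception):
    """Hypothetical stand-in for a driver's deadlock error (MySQL error 1213)."""


def run_with_retry(transaction, max_attempts=3):
    """Call transaction() and re-run it if it is cancelled by a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do


attempts = []


def flaky_transaction():
    """Simulated unit of work that is picked as the deadlock victim twice."""
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "committed"


result = run_with_retry(flaky_transaction)
```

Capping the number of attempts keeps a persistent ordering problem from turning into an endless retry loop; constant deadlocks are better addressed in the application code itself.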

If deadlock warnings occur constantly, you might review the application code to reorder the SQL operations in a consistent way, or to shorten the transactions. You can test with the innodb_print_all_deadlocks option enabled to see all deadlock warnings in the MySQL error log, rather than only the last warning in the SHOW ENGINE INNODB STATUS output.

Planning the Storage Layout

To get the best performance from InnoDB tables, you can adjust a number of parameters related to storage layout.

When you convert MyISAM tables that are large, frequently accessed, and hold vital data, investigate and consider the innodb_file_per_table and innodb_page_size configuration options, and the ROW_FORMAT and KEY_BLOCK_SIZE clauses of the CREATE TABLE statement.

During your initial experiments, the most important setting is innodb_file_per_table. When this setting is enabled, which is the default, new InnoDB tables are implicitly created in file-per-table tablespaces. In contrast with the InnoDB system tablespace, file-per-table tablespaces allow disk space to be reclaimed by the operating system when a table is truncated or dropped. File-per-table tablespaces also support DYNAMIC and COMPRESSED row formats and associated features such as table compression, efficient off-page storage for long variable-length columns, and large index prefixes.

You can also store InnoDB tables in a shared general tablespace, which supports multiple tables and all row formats.

Converting an Existing Table

To convert a non-InnoDB table to use InnoDB, use ALTER TABLE:

ALTER TABLE table_name ENGINE=InnoDB;

Cloning the Structure of a Table

You might make an InnoDB table that is a clone of a MyISAM table, rather than using ALTER TABLE to perform conversion, to test the old and new table side-by-side before switching.

Create an empty InnoDB table with identical column and index definitions. Use SHOW CREATE TABLE table_name\G to see the full CREATE TABLE statement to use. Change the ENGINE clause to ENGINE=INNODB.

Transferring Existing Data

To transfer a large volume of data into an empty InnoDB table created as shown in the previous section, insert the rows with INSERT INTO innodb_table SELECT * FROM myisam_table ORDER BY primary_key_columns.

You can also create the indexes for the InnoDB table after inserting the data. Historically, creating new secondary indexes was a slow operation for InnoDB, but now you can create the indexes after the data is loaded with relatively little overhead from the index creation step.

If you have UNIQUE constraints on secondary keys, you can speed up a table import by turning off the uniqueness checks temporarily during the import operation:

SET unique_checks=0;
... import operation ...
SET unique_checks=1;

For big tables, this saves disk I/O because InnoDB can use its change buffer to write secondary index records as a batch. Be certain that the data contains no duplicate keys. unique_checks permits but does not require storage engines to ignore duplicate keys.

For better control over the insertion process, you can insert big tables in pieces:

INSERT INTO newtable SELECT * FROM oldtable
   WHERE yourkey > something AND yourkey <= somethingelse;

After all records are inserted, you can rename the tables.
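
One way to produce the pieces is to walk the key space in fixed-size ranges. The sketch below assumes an integer key with known minimum and maximum values; key_ranges and the chunk size are hypothetical, and newtable, oldtable, and yourkey are the placeholder names from the statement above. It only generates the statement text and does not execute anything.

```python
def key_ranges(min_key, max_key, chunk):
    """Yield (low, high] bounds that together cover min_key..max_key."""
    low = min_key - 1  # exclusive lower bound, so min_key itself is included
    while low < max_key:
        high = min(low + chunk, max_key)
        yield low, high
        low = high


# Build one INSERT ... SELECT per range, matching the "> low AND <= high" form.
statements = [
    f"INSERT INTO newtable SELECT * FROM oldtable "
    f"WHERE yourkey > {low} AND yourkey <= {high};"
    for low, high in key_ranges(1, 25, 10)
]
```

Because each range is open at the bottom and closed at the top, consecutive ranges neither overlap nor skip a key.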

During the conversion of big tables, increase the size of the InnoDB buffer pool to reduce disk I/O, to a maximum of 80% of physical memory. You can also increase the size of InnoDB log files.

Storage Requirements

If you intend to make several temporary copies of your data in InnoDB tables during the conversion process, it is recommended that you create the tables in file-per-table tablespaces so that you can reclaim the disk space when you drop the tables. When the innodb_file_per_table configuration option is enabled (the default), newly created InnoDB tables are implicitly created in file-per-table tablespaces.

Whether you convert the MyISAM table directly or create a cloned InnoDB table, make sure that you have sufficient disk space to hold both the old and new tables during the process. InnoDB tables require more disk space than MyISAM tables. If an ALTER TABLE operation runs out of space, it starts a rollback, and that can take hours if it is disk-bound. For inserts, InnoDB uses the insert buffer to merge secondary index records to indexes in batches. That saves a lot of disk I/O. For rollback, no such mechanism is used, and the rollback can take 30 times longer than the insertion.

In the case of a runaway rollback, if you do not have valuable data in your database, it may be advisable to kill the database process rather than wait for millions of disk I/O operations to complete.

Defining a PRIMARY KEY for Each Table

The PRIMARY KEY clause is a critical factor affecting the performance of MySQL queries and the space usage for tables and indexes. The primary key uniquely identifies a row in a table. Every row in the table must have a primary key value, and no two rows can have the same primary key value.

These are guidelines for the primary key, followed by more detailed explanations.

Declare a PRIMARY KEY for each table. Typically, it is the most important column that you refer to in WHERE clauses when looking up a single row.
Declare the PRIMARY KEY clause in the original CREATE TABLE statement, rather than adding it later through an ALTER TABLE statement.
Choose the column and its data type carefully. Prefer numeric columns over character or string ones.
Consider using an auto-increment column if there is not another stable, unique, non-null, numeric column to use.
An auto-increment column is also a good choice if there is any doubt whether the value of the primary key column could ever change. Changing the value of a primary key column is an expensive operation, possibly involving rearranging data within the table and within each secondary index.

Consider adding a primary key to any table that does not already have one. Use the smallest practical numeric type based on the maximum projected size of the table. This can make each row slightly more compact, which can yield substantial space savings for large tables. The space savings are multiplied if the table has any secondary indexes, because the primary key value is repeated in each secondary index entry. In addition to reducing data size on disk, a small primary key also lets more data fit into the buffer pool, speeding up all kinds of operations and improving concurrency.
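
A back-of-the-envelope calculation shows how the savings multiply. The sketch below is a rough estimate only: it counts one copy of the key in the row plus one copy per secondary index, and ignores page overhead, fill factor, and compression. The function name and the sample figures (100 million rows, three secondary indexes, an 8-byte BIGINT replaced by a 4-byte INT) are assumptions chosen for illustration.

```python
def pk_bytes_saved(rows, old_pk_bytes, new_pk_bytes, secondary_indexes):
    """Estimate raw bytes saved by shrinking the primary key column."""
    # The key value is stored once in the row and once per secondary index entry.
    copies_per_row = 1 + secondary_indexes
    return rows * (old_pk_bytes - new_pk_bytes) * copies_per_row


saved = pk_bytes_saved(
    rows=100_000_000,
    old_pk_bytes=8,   # BIGINT
    new_pk_bytes=4,   # INT
    secondary_indexes=3,
)
```

Under these assumptions the narrower key removes about 1.6 GB of raw key data, four bytes in each of four places for every row.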

If the table already has a primary key on some longer column, such as a VARCHAR, consider adding a new unsigned AUTO_INCREMENT column and switching the primary key to that, even if that column is not referenced in queries. This design change can produce substantial space savings in the secondary indexes. You can designate the former primary key columns as UNIQUE NOT NULL to enforce the same constraints as the PRIMARY KEY clause, that is, to prevent duplicate or null values across all those columns.

If you spread related information across multiple tables, typically each table uses the same column for its primary key. For example, a personnel database might have several tables, each with a primary key of employee number. A sales database might have some tables with a primary key of customer number, and other tables with a primary key of order number. Because lookups using the primary key are very fast, you can construct efficient join queries for such tables.

If you leave the PRIMARY KEY clause out entirely, MySQL creates an invisible one for you. It is a 6-byte value that might be longer than you need, thus wasting space. Because it is hidden, you cannot refer to it in queries.

Application Performance Considerations

The reliability and scalability features of InnoDB require more disk storage than equivalent MyISAM tables. You might change the column and index definitions slightly, for better space utilization, reduced I/O and memory consumption when processing result sets, and better query optimization plans making efficient use of index lookups.

If you do set up a numeric ID column for the primary key, use that value to cross-reference with related values in any other tables, particularly for join queries. For example, rather than accepting a country name as input and doing queries searching for the same name, do one lookup to determine the country ID, then do other queries (or a single join query) to look up relevant information across several tables. Rather than storing a customer or catalog item number as a string of digits, potentially using up several bytes, convert it to a numeric ID for storing and querying. A 4-byte unsigned INT column can index over 4 billion items (with the US meaning of billion: 1000 million).

Understanding Files Associated with InnoDB Tables

InnoDB files require more care and planning than MyISAM files do. You must not delete the ibdata files that represent the InnoDB system tablespace.

5. AUTO_INCREMENT Handling in InnoDB

InnoDB provides a configurable locking mechanism that can significantly improve scalability and performance of SQL statements that add rows to tables with AUTO_INCREMENT columns. To use the AUTO_INCREMENT mechanism with an InnoDB table, an AUTO_INCREMENT column must be defined as part of an index such that it is possible to perform the equivalent of an indexed SELECT MAX(ai_col) lookup on the table to obtain the maximum column value. Typically, this is achieved by making the column the first column of some table index.

This section describes the behavior of AUTO_INCREMENT lock modes, usage implications for different AUTO_INCREMENT lock mode settings, and how InnoDB initializes the AUTO_INCREMENT counter.

InnoDB AUTO_INCREMENT Lock Modes

This section describes the behavior of AUTO_INCREMENT lock modes used to generate auto-increment values, and how each lock mode affects replication. Auto-increment lock modes are configured at startup using the innodb_autoinc_lock_mode configuration parameter.

The following terms are used in describing innodb_autoinc_lock_mode settings:

“INSERT-like” statements

All statements that generate new rows in a table, including INSERT, INSERT ... SELECT, REPLACE, REPLACE ... SELECT, and LOAD DATA. Includes “simple-inserts”, “bulk-inserts”, and “mixed-mode” inserts.

“Simple inserts”

Statements for which the number of rows to be inserted can be determined in advance (when the statement is initially processed). This includes single-row and multiple-row INSERT and REPLACE statements that do not have a nested subquery, but not INSERT ... ON DUPLICATE KEY UPDATE.

“Bulk inserts”

Statements for which the number of rows to be inserted (and the number of required auto-increment values) is not known in advance. This includes INSERT ... SELECT, REPLACE ... SELECT, and LOAD DATA statements, but not plain INSERT. InnoDB assigns new values for the AUTO_INCREMENT column one at a time as each row is processed.

“Mixed-mode inserts”

These are “simple insert” statements that specify the auto-increment value for some (but not all) of the new rows. An example follows, where c1 is an AUTO_INCREMENT column of table t1:

INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');

Another type of “mixed-mode insert” is INSERT ... ON DUPLICATE KEY UPDATE, which in the worst case is in effect an INSERT followed by an UPDATE, where the allocated value for the AUTO_INCREMENT column may or may not be used during the update phase.

There are three possible settings for the innodb_autoinc_lock_mode configuration parameter. The settings are 0, 1, or 2, for “traditional”, “consecutive”, or “interleaved” lock mode, respectively. As of MySQL 8.0, interleaved lock mode (innodb_autoinc_lock_mode=2) is the default setting. Prior to MySQL 8.0, consecutive lock mode is the default (innodb_autoinc_lock_mode=1).

The default setting of interleaved lock mode in MySQL 8.0 reflects the change from statement-based replication to row-based replication as the default replication type. Statement-based replication requires the consecutive auto-increment lock mode to ensure that auto-increment values are assigned in a predictable and repeatable order for a given sequence of SQL statements, whereas row-based replication is not sensitive to the execution order of SQL statements.

InnoDB AUTO_INCREMENT Lock Mode Usage Implications

Using auto-increment with replication

If you are using statement-based replication, set innodb_autoinc_lock_mode to 0 or 1 and use the same value on the master and its slaves. Auto-increment values are not ensured to be the same on the slaves as on the master if you use innodb_autoinc_lock_mode = 2 (“interleaved”) or configurations where the master and slaves do not use the same lock mode.

If you are using row-based or mixed-format replication, all of the auto-increment lock modes are safe, since row-based replication is not sensitive to the order of execution of the SQL statements (and the mixed format uses row-based replication for any statements that are unsafe for statement-based replication).

“Lost” auto-increment values and sequence gaps

In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost”. Once a value is generated for an auto-increment column, it cannot be rolled back, whether or not the “INSERT-like” statement is completed, and whether or not the containing transaction is rolled back. Such lost values are not reused. Thus, there may be gaps in the values stored in an AUTO_INCREMENT column of a table.
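
The effect can be pictured with a toy model in which the counter only ever moves forward. This is a simulation for illustration, not InnoDB's implementation; the class AutoIncCounter and the list standing in for the table are hypothetical.

```python
class AutoIncCounter:
    """Toy counter that hands out values and never takes them back."""

    def __init__(self, start=0):
        self.value = start

    def next(self):
        self.value += 1
        return self.value


counter = AutoIncCounter()
table = []  # committed values of the auto-increment column

# Transaction 1 inserts one row and commits.
table.append(counter.next())

# Transaction 2 inserts two rows, then rolls back: the rows disappear,
# but the two values it consumed are not returned to the counter.
pending = [counter.next(), counter.next()]
pending.clear()

# Transaction 3 inserts one row and commits.
table.append(counter.next())
```

The committed rows end up with the values 1 and 4; the values 2 and 3 are the gap left by the rolled-back transaction.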

Specifying NULL or 0 for the AUTO_INCREMENT column

In all lock modes (0, 1, and 2), if a user specifies NULL or 0 for the AUTO_INCREMENT column in an INSERT, InnoDB treats the row as if the value was not specified and generates a new value for it.

Assigning a negative value to the AUTO_INCREMENT column

In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is not defined if you assign a negative value to the AUTO_INCREMENT column.

If the AUTO_INCREMENT value becomes larger than the maximum integer for the specified integer type

In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is not defined if the value becomes larger than the maximum integer that can be stored in the specified integer type.

Gaps in auto-increment values for “bulk inserts”

With innodb_autoinc_lock_mode set to 0 (“traditional”) or 1 (“consecutive”), the auto-increment values generated by any given statement are consecutive, without gaps, because the table-level AUTO-INC lock is held until the end of the statement, and only one such statement can execute at a time.

With innodb_autoinc_lock_mode set to 2 (“interleaved”), there may be gaps in the auto-increment values generated by “bulk inserts,” but only if there are concurrently executing “INSERT-like” statements.

For lock modes 1 or 2, gaps may occur between successive statements because for bulk inserts the exact number of auto-increment values required by each statement may not be known and overestimation is possible.

Auto-increment values assigned by “mixed-mode inserts”

Consider a “mixed-mode insert,” where a “simple insert” specifies the auto-increment value for some (but not all) resulting rows. Such a statement behaves differently in lock modes 0, 1, and 2. For example, assume c1 is an AUTO_INCREMENT column of table t1, and that the most recent automatically generated sequence number is 100.

mysql> CREATE TABLE t1 (
    -> c1 INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> c2 CHAR(1)
    -> ) ENGINE = INNODB;

Now, consider the following “mixed-mode insert” statement:

mysql> INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');

With innodb_autoinc_lock_mode set to 0 (“traditional”), the four new rows are:

mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
+-----+------+
| c1  | c2   |
+-----+------+
|   1 | a    |
| 101 | b    |
|   5 | c    |
| 102 | d    |
+-----+------+

The next available auto-increment value is 103 because the auto-increment values are allocated one at a time, not all at once at the beginning of statement execution. This result is true whether or not there are concurrently executing “INSERT-like” statements (of any type).

With innodb_autoinc_lock_mode set to 1 (“consecutive”), the four new rows are also:

mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
+-----+------+
| c1  | c2   |
+-----+------+
|   1 | a    |
| 101 | b    |
|   5 | c    |
| 102 | d    |
+-----+------+

However, in this case, the next available auto-increment value is 105, not 103, because four auto-increment values are allocated at the time the statement is processed, but only two are used. This result is true whether or not there are concurrently executing “INSERT-like” statements (of any type).
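
The difference between the two allocation strategies can be reproduced with a small simulation. This is a simplified model for illustration, not InnoDB's code: mixed_mode_insert is a hypothetical function, None stands for NULL, and the model ignores explicit values that exceed the current counter.

```python
def mixed_mode_insert(values, last_value, lock_mode):
    """Return (assigned c1 values, next available auto-increment value)."""
    if lock_mode == 0:
        # Traditional: take one value at a time, only for rows that need one.
        counter = last_value
        assigned = []
        for v in values:
            if v is None:
                counter += 1
                assigned.append(counter)
            else:
                assigned.append(v)
        return assigned, counter + 1
    if lock_mode == 1:
        # Consecutive: reserve one value per row when the statement starts,
        # then hand them out in order to the rows that need one.
        reserved = iter(range(last_value + 1, last_value + 1 + len(values)))
        assigned = [next(reserved) if v is None else v for v in values]
        return assigned, last_value + len(values) + 1
    raise ValueError("only lock modes 0 and 1 are modeled")


rows = [1, None, 5, None]
traditional = mixed_mode_insert(rows, last_value=100, lock_mode=0)
consecutive = mixed_mode_insert(rows, last_value=100, lock_mode=1)
```

Both modes store the same four values, but mode 1 leaves 103 and 104 reserved and unused, which is why the next available value differs.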

With innodb_autoinc_lock_mode set to 2 (“interleaved”), the four new rows are:

mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
+-----+------+
| c1  | c2   |
+-----+------+
|   1 | a    |
|   x | b    |
|   5 | c    |
|   y | d    |
+-----+------+

The values of x and y are unique and larger than any previously generated rows. However, the specific values of x and y depend on the number of auto-increment values generated by concurrently executing statements.

Finally, consider the following statement, issued when the most-recently generated sequence number is 100:

mysql> INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (101,'c'), (NULL,'d');

With any innodb_autoinc_lock_mode setting, this statement generates a duplicate-key error 23000 (Can't write; duplicate key in table) because 101 is allocated for the row (NULL, 'b') and insertion of the row (101, 'c') fails.

Modifying AUTO_INCREMENT column values in the middle of a sequence of INSERT statements

In MySQL 5.7 and earlier, modifying an AUTO_INCREMENT column value in the middle of a sequence of INSERT statements could lead to “Duplicate entry” errors. For example, if you performed an UPDATE operation that changed an AUTO_INCREMENT column value to a value larger than the current maximum auto-increment value, subsequent INSERT operations that did not specify an unused auto-increment value could encounter “Duplicate entry” errors. In MySQL 8.0 and later, if you modify an AUTO_INCREMENT column value to a value larger than the current maximum auto-increment value, the new value is persisted, and subsequent INSERT operations allocate auto-increment values starting from the new, larger value. This behavior is demonstrated in the following example.

mysql> create table t1 (
    -> c1 int not null auto_increment,
    -> primary key (c1)
    ->  ) engine = innodb;

mysql> insert into t1 values(0), (0), (3);

mysql> select c1 from t1;
+----+
| c1 |
+----+
|  1 |
|  2 |
|  3 |
+----+

mysql> update t1 set c1 = 4 where c1 = 1;

mysql> select c1 from t1;
+----+
| c1 |
+----+
|  2 |
|  3 |
|  4 |
+----+

mysql> insert into t1 values(0);

mysql> select c1 from t1;
+----+
| c1 |
+----+
|  2 |
|  3 |
|  4 |
|  5 |
+----+
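
the counter behavior in this example can be imitated with a small python model. this is only a sketch of the rule described above, not innodb code: the class name AutoIncTable, its methods, and the persist_on_update flag are hypothetical, and the model assumes that inserting 0 requests a generated value (as in the example) and ignores locking and transactions.

```python
class AutoIncTable:
    """toy model of a table with a single auto_increment primary key (hypothetical)."""

    def __init__(self):
        self.rows = set()   # stored key values
        self.counter = 0    # largest auto-increment value known so far

    def insert(self, value=0):
        if not value:
            # 0 or None: allocate the next value from the counter
            self.counter += 1
            value = self.counter
        else:
            # an explicit value larger than the counter moves the counter up
            self.counter = max(self.counter, value)
        if value in self.rows:
            raise ValueError(f"duplicate entry '{value}' for key 'primary'")
        self.rows.add(value)
        return value

    def update(self, old, new, persist_on_update=True):
        if new in self.rows:
            raise ValueError(f"duplicate entry '{new}' for key 'primary'")
        self.rows.remove(old)
        self.rows.add(new)
        if persist_on_update:
            # mysql 8.0 behavior described above: the larger value is remembered
            self.counter = max(self.counter, new)


t = AutoIncTable()
for v in (0, 0, 3):
    t.insert(v)          # rows are now 1, 2, 3
t.update(1, 4)           # rows are now 2, 3, 4 and the counter is 4
new_value = t.insert(0)  # allocates 5, as in the example
print(sorted(t.rows))
```

calling update with persist_on_update=False imitates the older behavior: the counter stays at 3, so the next generated value is 4 and the insert raises the “duplicate entry” error.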

innodb auto_increment counter initialization

this section describes how innodb initializes auto_increment counters.

if you specify an auto_increment column for an innodb table, the in-memory table object contains a special counter called the auto-increment counter that is used when assigning new values for the column.

in mysql 5.7 and earlier, the auto-increment counter is stored only in main memory, not on disk. to initialize an auto-increment counter after a server restart, innodb would execute the equivalent of the following statement on the first insert into a table containing an auto_increment column.

select max(ai_col) from table_name for update;

in mysql 8.0, this behavior is changed. the current maximum auto-increment counter value is written to the redo log each time it changes and is saved to an engine-private system table on each checkpoint. these changes make the current maximum auto-increment counter value persistent across server restarts.

on a server restart following a normal shutdown, innodb initializes the in-memory auto-increment counter using the current maximum auto-increment value stored in the data dictionary system table.

on a server restart during crash recovery, innodb initializes the in-memory auto-increment counter using the current maximum auto-increment value stored in the data dictionary system table and scans the redo log for auto-increment counter values written since the last checkpoint. if a redo-logged value is greater than the in-memory counter value, the redo-logged value is applied. however, in the case of a server crash, reuse of a previously allocated auto-increment value cannot be guaranteed. each time the current maximum auto-increment value is changed due to an insert or update operation, the new value is written to the redo log, but if the crash occurs before the redo log is flushed to disk, the previously allocated value could be reused when the auto-increment counter is initialized after the server is restarted.
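
the recovery rule above amounts to taking the larger of the checkpointed value and any redo-logged values that reached disk. the following python sketch illustrates that rule only; the function name recover_counter and its arguments are hypothetical and do not correspond to innodb internals.

```python
def recover_counter(checkpointed_value, flushed_redo_values):
    """return the counter value after crash recovery (illustrative model).

    checkpointed_value: value saved in the data dictionary at the last checkpoint
    flushed_redo_values: counter values written to the redo log since that
    checkpoint and flushed to disk before the crash
    """
    counter = checkpointed_value
    for value in flushed_redo_values:
        # a redo-logged value is applied only if it is greater
        if value > counter:
            counter = value
    return counter


# values 101, 102, and 103 were allocated after the checkpoint at 100,
# but the redo record for 103 was not flushed before the crash
recovered = recover_counter(100, [101, 102])
print(recovered)
```

in this scenario the recovered counter is 102, so 103 can be handed out again after the restart, which is the reuse case the paragraph above warns about.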

the only circumstance in which innodb uses the equivalent of a select max(ai_col) from table_name for update statement in mysql 8.0 and later to initialize an auto-increment counter is when importing a tablespace without a .cfg metadata file. otherwise, the current maximum auto-increment counter value is read from the .cfg metadata file.

in mysql 5.7 and earlier, a server restart cancels the effect of the auto_increment = n table option, which may be used in a create table or alter table statement to set an initial counter value or alter the existing counter value, respectively. in mysql 8.0, a server restart does not cancel the effect of the auto_increment = n table option. if you initialize the auto-increment counter to a specific value, or if you alter the auto-increment counter value to a larger value, the new value is persisted across server restarts.

note: alter table ... auto_increment = n can only change the auto-increment counter value to a value larger than the current maximum.

in mysql 5.7 and earlier, a server restart immediately following a rollback operation could result in the reuse of auto-increment values that were previously allocated to the rolled-back transaction, effectively rolling back the current maximum auto-increment value. in mysql 8.0, the current maximum auto-increment value is persisted, preventing the reuse of previously allocated values.

if a show table status statement examines a table before the auto-increment counter is initialized, innodb opens the table and initializes the counter value using the current maximum auto-increment value that is stored in the data dictionary system table. the value is stored in memory for use by later inserts or updates. initialization of the counter value uses a normal exclusive-locking read on the table which lasts to the end of the transaction. innodb follows the same procedure when initializing the auto-increment counter for a newly created table that has a user-specified auto-increment value that is greater than 0.

after the auto-increment counter is initialized, if you do not explicitly specify an auto-increment value when inserting a row, innodb implicitly increments the counter and assigns the new value to the column. if you insert a row that explicitly specifies an auto-increment column value, and the value is greater than the current maximum counter value, the counter is set to the specified value.

innodb uses the in-memory auto-increment counter as long as the server runs. when the server is stopped and restarted, innodb reinitializes the auto-increment counter, as described earlier.

the auto_increment_offset configuration option determines the starting point for the auto_increment column value. the default setting is 1.

the auto_increment_increment configuration option controls the interval between successive column values. the default setting is 1.
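
as a rough illustration of how the two options interact, generated values take the form auto_increment_offset + n × auto_increment_increment. the python sketch below computes the next value under that rule. the function name next_auto_increment is hypothetical, and the sketch assumes the offset is not greater than the increment (mysql ignores the offset in that case, which is not modeled here).

```python
def next_auto_increment(current, offset=1, increment=1):
    """smallest value greater than `current` of the form offset + n * increment."""
    if current < offset:
        return offset
    n = (current - offset) // increment + 1
    return offset + n * increment


# with offset 5 and increment 10, starting from an empty table
values = []
current = 0
for _ in range(4):
    current = next_auto_increment(current, offset=5, increment=10)
    values.append(current)
print(values)  # [5, 15, 25, 35]
```

with the default settings (offset 1, increment 1) the same function simply returns current + 1.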

6.innodb and foreign key constraints

how the innodb storage engine handles foreign key constraints is described under the following topics in this section.

foreign key definitions

foreign key definitions for innodb tables are subject to the following conditions:

innodb permits a foreign key to reference any index column or group of columns. however, in the referenced table, there must be an index where the referenced columns are the first columns in the same order. hidden columns that innodb adds to an index are also considered.
innodb does not currently support foreign keys for tables with user-defined partitioning. this means that no user-partitioned innodb table may contain foreign key references or columns referenced by foreign keys.
innodb allows a foreign key constraint to reference a nonunique key. this is an innodb extension to standard sql.
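
the “first columns in the same order” condition is a prefix test on the column lists of the referenced table's indexes. the python sketch below shows that test in isolation. the function name can_reference and the column names are hypothetical, and hidden columns that innodb appends to an index are not modeled.

```python
def can_reference(referenced_columns, indexes):
    """true if some index starts with the referenced columns in the same order."""
    wanted = tuple(referenced_columns)
    for index_columns in indexes:
        if tuple(index_columns[:len(wanted)]) == wanted:
            return True
    return False


# hypothetical parent table with one index on (a, b, c)
parent_indexes = [("a", "b", "c")]
print(can_reference(("a", "b"), parent_indexes))  # True: leading columns, same order
print(can_reference(("b", "a"), parent_indexes))  # False: order differs
print(can_reference(("b",), parent_indexes))      # False: b is not the first column
```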

referential actions

referential actions for foreign keys of innodb tables are subject to the following conditions:

while set default is allowed by the mysql server, it is rejected as invalid by innodb. create table and alter table statements using this clause are not allowed for innodb tables.
if there are several rows in the parent table that have the same referenced key value, innodb acts in foreign key checks as if the other parent rows with the same key value do not exist. for example, if you have defined a restrict type constraint, and there is a child row with several parent rows, innodb does not permit the deletion of any of those parent rows.
innodb performs cascading operations through a depth-first algorithm, based on records in the indexes corresponding to the foreign key constraints.
if on update cascade or on update set null recurses to update the same table it has previously updated during the cascade, it acts like restrict. this means that you cannot use self-referential on update cascade or on update set null operations. this is to prevent infinite loops resulting from cascaded updates. a self-referential on delete set null, on the other hand, is possible, as is a self-referential on delete cascade. cascading operations may not be nested more than 15 levels deep.
like mysql in general, in an sql statement that inserts, deletes, or updates many rows, innodb checks unique and foreign key constraints row-by-row. when performing foreign key checks, innodb sets shared row-level locks on child or parent records it has to look at. innodb checks foreign key constraints immediately; the check is not deferred to transaction commit. according to the sql standard, the default behavior should be deferred checking. that is, constraints are only checked after the entire sql statement has been processed. until innodb implements deferred constraint checking, some things are impossible, such as deleting a record that refers to itself using a foreign key.

foreign key restrictions for generated columns and virtual indexes

a foreign key constraint on a stored generated column cannot use cascade, set null, or set default as on update referential actions, nor can it use set null or set default as on delete referential actions.
a foreign key constraint on the base column of a stored generated column cannot use cascade, set null, or set default as on update or on delete referential actions.
a foreign key constraint cannot reference a virtual generated column.
prior to mysql 8.0, a foreign key constraint cannot reference a secondary index defined on a virtual generated column.

7.limits on innodb tables

limits on innodb tables are described under the following topics in this section.

maximums and minimums

a table can contain a maximum of 1017 columns. virtual generated columns are included in this limit.
a table can contain a maximum of 64 secondary indexes.
the index key prefix length limit is 3072 bytes for innodb tables that use dynamic or compressed row format.

the index key prefix length limit is 767 bytes for innodb tables that use redundant or compact row format. for example, you might hit this limit with a column prefix index of more than 191 characters on a text or varchar column, assuming a utf8mb4 character set and the maximum of 4 bytes for each character.

attempting to use an index key prefix length that exceeds the limit returns an error.

the limits that apply to index key prefixes also apply to full-column index keys.
if you reduce the innodb page size to 8kb or 4kb by specifying the innodb_page_size option when creating the mysql instance, the maximum length of the index key is lowered proportionally, based on the limit of 3072 bytes for a 16kb page size. that is, the maximum index key length is 1536 bytes when the page size is 8kb, and 768 bytes when the page size is 4kb.
a maximum of 16 columns is permitted for multicolumn indexes. exceeding the limit returns an error.

error 1070 (42000): too many key parts specified; max 16 parts allowed
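
the byte limits for index keys quoted above can be checked with simple arithmetic. the python sketch below reproduces the figures from the text (3072, 1536, 768, and 767 bytes, and the 191-character example). the function names are hypothetical, and only the 4kb, 8kb, and 16kb page sizes discussed above are handled.

```python
MAX_KEY_BYTES_16KB_PAGE = 3072   # dynamic or compressed row format, 16kb pages
MAX_KEY_BYTES_OLD_FORMATS = 767  # redundant or compact row format


def max_index_key_bytes(page_size_kb, row_format="dynamic"):
    """index key length limit in bytes for the cases described in the text."""
    if page_size_kb not in (4, 8, 16):
        raise ValueError("only 4kb, 8kb, and 16kb pages are modeled here")
    if row_format in ("redundant", "compact"):
        return MAX_KEY_BYTES_OLD_FORMATS
    # the limit is lowered in proportion to the page size
    return MAX_KEY_BYTES_16KB_PAGE * page_size_kb // 16


def max_prefix_chars(limit_bytes, bytes_per_char=4):
    """longest prefix, in characters, that fits (utf8mb4 worst case by default)."""
    return limit_bytes // bytes_per_char


print(max_index_key_bytes(8))                        # 1536
print(max_index_key_bytes(4))                        # 768
print(max_prefix_chars(MAX_KEY_BYTES_OLD_FORMATS))   # 191
```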

the maximum row length, except for variable-length columns (varbinary, varchar, blob and text), is slightly less than half of a page for 4kb, 8kb, 16kb, and 32kb page sizes. for example, the maximum row length for the default innodb_page_size of 16kb is about 8000 bytes. however, for an innodb page size of 64kb, the maximum row length is approximately 16000 bytes. longblob and longtext columns must be less than 4gb, and the total row length, including blob and text columns, must be less than 4gb.

if a row is less than half a page long, all of it is stored locally within the page. if it exceeds half a page, variable-length columns are chosen for external off-page storage until the row fits within half a page, as described in section 15.11.2, “file space management”.
although innodb supports row sizes larger than 65,535 bytes internally, mysql itself imposes a row-size limit of 65,535 for the combined size of all columns:

mysql> create table t (a varchar(8000), b varchar(10000),
    -> c varchar(10000), d varchar(10000), e varchar(10000),
    -> f varchar(10000), g varchar(10000)) engine=innodb;
error 1118 (42000): row size too large. the maximum row size for the
used table type, not counting blobs, is 65535. you have to change some
columns to text or blobs
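
a quick estimate shows why the statement above is rejected. the python sketch below adds up the column sizes under two stated assumptions: a single-byte character set such as latin1, and a 2-byte length prefix for any varchar whose data can exceed 255 bytes. other per-row overhead (for example the null bitmap) is ignored, and the helper name varchar_bytes is hypothetical.

```python
MYSQL_MAX_ROW_BYTES = 65535  # row-size limit imposed by mysql itself


def varchar_bytes(n_chars, bytes_per_char=1):
    """approximate storage for one varchar column, including its length prefix."""
    data = n_chars * bytes_per_char
    length_prefix = 1 if data <= 255 else 2
    return data + length_prefix


# columns a..g from the create table statement above
column_lengths = [8000] + [10000] * 6
total = sum(varchar_bytes(n) for n in column_lengths)
print(total, total > MYSQL_MAX_ROW_BYTES)  # 68014 True
```

with a multi-byte character set the total only grows, so the limit is exceeded in that case as well.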

on some older operating systems, files must be less than 2gb. this is not a limitation of innodb itself, but if you require a large tablespace, configure it using several smaller data files rather than one large data file.
the combined size of the innodb log files can be up to 512gb.
the minimum tablespace size is slightly larger than 10mb. the maximum tablespace size depends on the innodb page size.

innodb page size	maximum tablespace size
4kb	16tb
8kb	32tb
16kb	64tb
32kb	128tb
64kb	256tb

the maximum tablespace size is also the maximum size for a table.
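
the sizes in the table are consistent with a tablespace holding at most 2^32 pages. the python sketch below checks that relationship. treat the page-count constant as an assumption used to explain the table rather than a statement from this section, and note that the function name is hypothetical and the results are in binary units (1tb taken as 2^40 bytes).

```python
MAX_PAGES = 2 ** 32  # assumption: pages are addressed with a 32-bit page number


def max_tablespace_tb(page_size_kb):
    """maximum tablespace size in tb (2**40 bytes) for a given page size."""
    total_bytes = page_size_kb * 1024 * MAX_PAGES
    return total_bytes // 2 ** 40


sizes = {kb: max_tablespace_tb(kb) for kb in (4, 8, 16, 32, 64)}
print(sizes)  # {4: 16, 8: 32, 16: 64, 32: 128, 64: 256}
```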
the path of a tablespace file, including the file name, cannot exceed the max_path limit on windows. prior to windows 10, the max_path limit is 260 characters. as of windows 10, version 1607, max_path limitations are removed from common win32 file and directory functions, but you must enable the new behavior.
the default page size in innodb is 16kb. you can increase or decrease the page size by configuring the innodb_page_size option when creating the mysql instance.

32kb and 64kb page sizes are supported, but row_format=compressed is unsupported for page sizes greater than 16kb. for both 32kb and 64kb page sizes, the maximum record size is 16kb. for innodb_page_size=32kb, extent size is 2mb. for innodb_page_size=64kb, extent size is 4mb.

a mysql instance using a particular innodb page size cannot use data files or log files from an instance that uses a different page size.

restrictions on innodb tables

analyze table determines index cardinality (as displayed in the cardinality column of show index output) by performing random dives on each of the index trees and updating index cardinality estimates accordingly. because these are only estimates, repeated runs of analyze table could produce different numbers. this makes analyze table fast on innodb tables but not 100% accurate because it does not take all rows into account.

you can make the statistics collected by analyze table more precise and more stable by turning on the innodb_stats_persistent configuration option, as explained in section 15.8.10.1, “configuring persistent optimizer statistics parameters”. when that setting is enabled, it is important to run analyze table after major changes to indexed column data, because the statistics are not recalculated periodically (such as after a server restart).

if the persistent statistics setting is enabled, you can change the number of random dives by modifying the innodb_stats_persistent_sample_pages system variable. if the persistent statistics setting is disabled, modify the innodb_stats_transient_sample_pages system variable instead.

mysql uses index cardinality estimates in join optimization. if a join is not optimized in the right way, try using analyze table. in the few cases that analyze table does not produce values good enough for your particular tables, you can use force index with your queries to force the use of a particular index, or set the max_seeks_for_key system variable to ensure that mysql prefers index lookups over table scans. see section b.4.5, “optimizer-related issues”.
if statements or transactions are running on a table, and analyze table is run on the same table followed by a second analyze table operation, the second analyze table operation is blocked until the statements or transactions are completed. this behavior occurs because analyze table marks the currently loaded table definition as obsolete when analyze table is finished running. new statements or transactions (including a second analyze table statement) must load the new table definition into the table cache, which cannot occur until currently running statements or transactions are completed and the old table definition is purged. loading multiple concurrent table definitions is not supported.
show table status does not give accurate statistics on innodb tables except for the physical size reserved by the table. the row count is only a rough estimate used in sql optimization.
innodb does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. consequently, select count(*) statements only count rows visible to the current transaction.
on windows, innodb always stores database and table names internally in lowercase. to move databases in a binary format from unix to windows or from windows to unix, create all databases and tables using lowercase names.
an auto_increment column ai_col must be defined as part of an index such that it is possible to perform the equivalent of an indexed select max(ai_col) lookup on the table to obtain the maximum column value. typically, this is achieved by making the column the first column of some table index.

when an auto_increment integer column runs out of values, a subsequent insert operation returns a duplicate-key error. this is general mysql behavior.
delete from tbl_name does not regenerate the table but instead deletes all rows, one by one.
cascaded foreign key actions do not activate triggers.
you cannot create a table with a column name that matches the name of an internal innodb column (including db_row_id, db_trx_id, db_roll_ptr, and db_mix_id). this restriction applies to use of the names in any letter case.

mysql> create table t1 (c1 int, db_row_id int) engine=innodb;
error 1166 (42000): incorrect column name 'db_row_id'
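
an application that generates column names can screen them before issuing create table. the python sketch below does a case-insensitive check against the four names listed above. the text says “including”, so this set may be incomplete, and the function name is hypothetical.

```python
# internal innodb column names named in the text (possibly not exhaustive)
RESERVED_INNODB_COLUMNS = {"db_row_id", "db_trx_id", "db_roll_ptr", "db_mix_id"}


def is_reserved_column_name(name):
    """true if `name` matches a listed internal column name in any letter case."""
    return name.lower() in RESERVED_INNODB_COLUMNS


print(is_reserved_column_name("DB_ROW_ID"))  # True
print(is_reserved_column_name("c1"))         # False
```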

locking and transactions

lock tables acquires two locks on each table if innodb_table_locks=1 (the default). in addition to a table lock on the mysql layer, it also acquires an innodb table lock. versions of mysql before 4.1.2 did not acquire innodb table locks; the old behavior can be selected by setting innodb_table_locks=0. if no innodb table lock is acquired, lock tables completes even if some records of the tables are being locked by other transactions.

in mysql 8.0, innodb_table_locks=0 has no effect for tables locked explicitly with lock tables ... write. it does have an effect for tables locked for read or write by lock tables ... write implicitly (for example, through triggers) or by lock tables ... read.
all innodb locks held by a transaction are released when the transaction is committed or aborted. thus, it does not make much sense to invoke lock tables on innodb tables in autocommit=1 mode because the acquired innodb table locks would be released immediately.
you cannot lock additional tables in the middle of a transaction because lock tables performs an implicit commit and unlock tables.
