InnoDB On-Disk Structures--Tables (Reposted)
Reposted and excerpted from:
1. InnoDB Architecture
The following diagram shows in-memory and on-disk structures that comprise the InnoDB storage engine architecture.
This section covers topics related to InnoDB tables.
2. Creating InnoDB Tables
You do not need to specify the ENGINE=InnoDB clause if InnoDB is defined as the default storage engine, which it is by default.
An InnoDB table and its indexes can be created in the system tablespace, in a file-per-table tablespace, or in a general tablespace. When innodb_file_per_table is enabled, which is the default, an InnoDB table is implicitly created in an individual file-per-table tablespace. Conversely, when innodb_file_per_table is disabled, an InnoDB table is implicitly created in the InnoDB system tablespace. To create a table in a general tablespace, use CREATE TABLE ... TABLESPACE syntax.
When you create a table in a file-per-table tablespace, MySQL creates an .ibd tablespace file in a database directory under the MySQL data directory, by default. A table created in the InnoDB system tablespace is created in an existing ibdata file, which resides in the MySQL data directory. A table created in a general tablespace is created in an existing general tablespace .ibd file. General tablespace files can be created inside or outside of the MySQL data directory.
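The three placements described above can be sketched as follows, assuming a MySQL 8.0 server. The table names t_fpt, t_sys, t_gen and the general tablespace name ts1 are hypothetical.

```sql
-- Check where new tables go by default (ON = file-per-table, the default).
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- With innodb_file_per_table=ON, this creates t_fpt.ibd in the database directory.
CREATE TABLE t_fpt (c1 INT PRIMARY KEY) ENGINE=InnoDB;

-- Explicitly place a table in the InnoDB system tablespace (an ibdata file).
CREATE TABLE t_sys (c1 INT PRIMARY KEY) ENGINE=InnoDB TABLESPACE innodb_system;

-- Create a general tablespace, then create a table inside it.
CREATE TABLESPACE ts1 ADD DATAFILE 'ts1.ibd' ENGINE=InnoDB;
CREATE TABLE t_gen (c1 INT PRIMARY KEY) ENGINE=InnoDB TABLESPACE ts1;
```

A general tablespace such as ts1 can hold several tables, which is the main difference from the one-file-per-table layout.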
Internally, InnoDB adds an entry for each table to the data dictionary. The entry includes the database name. For example, if table t1 is created in the test database, the table's data dictionary entry records its name as 'test/t1'. This means you can create a table of the same name (t1) in a different database, and the table names do not collide inside InnoDB.
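To see these qualified names, you could create a same-named table in two databases and query INFORMATION_SCHEMA.INNODB_TABLES (its MySQL 8.0 name). The database test2 is hypothetical.

```sql
CREATE DATABASE IF NOT EXISTS test;
CREATE DATABASE IF NOT EXISTS test2;
CREATE TABLE IF NOT EXISTS test.t1  (c1 INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS test2.t1 (c1 INT PRIMARY KEY) ENGINE=InnoDB;

-- Each entry is prefixed with its database name, so the two t1 tables do not collide.
-- The result should include 'test/t1' and 'test2/t1'.
SELECT NAME FROM INFORMATION_SCHEMA.INNODB_TABLES WHERE NAME LIKE 'test%/t1';
```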
InnoDB Tables and Row Formats
The default row format for InnoDB tables is defined by the innodb_default_row_format configuration option, which has a default value of DYNAMIC. The DYNAMIC and COMPRESSED row formats allow you to take advantage of InnoDB features such as table compression and efficient off-page storage of long column values. To use these row formats, innodb_file_per_table must be enabled (the default).
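A short sketch of checking the default and requesting a row format explicitly. It assumes innodb_file_per_table=ON and the default 16KB innodb_page_size (COMPRESSED with KEY_BLOCK_SIZE=8 relies on that); the table names are hypothetical.

```sql
-- Row format used when CREATE TABLE has no ROW_FORMAT clause (DYNAMIC by default).
SELECT @@innodb_default_row_format;

-- Request a row format explicitly.
CREATE TABLE docs_dynamic (id INT PRIMARY KEY, body TEXT) ENGINE=InnoDB
  ROW_FORMAT=DYNAMIC;
CREATE TABLE docs_compressed (id INT PRIMARY KEY, body TEXT) ENGINE=InnoDB
  ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Confirm what InnoDB actually applied.
SELECT NAME, ROW_FORMAT
FROM INFORMATION_SCHEMA.INNODB_TABLES
WHERE NAME LIKE '%/docs\_%';
```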
InnoDB Tables and Primary Keys
Always define a primary key for an InnoDB table, specifying the column or columns that:
- Are referenced by the most important queries.
- Are never left blank.
- Never have duplicate values.
- Rarely if ever change value once inserted.
For example, in a table containing information about people, you would not create a primary key on (firstname, lastname) because more than one person can have the same name, some people have blank last names, and sometimes people change their names. With so many constraints, often there is not an obvious set of columns to use as a primary key, so you create a new column with a numeric ID to serve as all or part of the primary key. You can declare an auto-increment column so that ascending values are filled in automatically as rows are inserted.
# The value of ID can act like a pointer between related items in different tables.
CREATE TABLE t5 (id INT AUTO_INCREMENT, b CHAR (20), PRIMARY KEY (id));

# The primary key can consist of more than one column. Any autoinc column must come first.
CREATE TABLE t6 (id INT AUTO_INCREMENT, a INT, b CHAR (20), PRIMARY KEY (id,a));
Although the table works correctly without defining a primary key, the primary key is involved with many aspects of performance and is a crucial design aspect for any large or frequently used table. It is recommended that you always specify a primary key in the CREATE TABLE statement. If you create the table, load data, and then run ALTER TABLE to add a primary key later, that operation is much slower than defining the primary key when creating the table, because InnoDB stores rows in primary key order and must rebuild the populated table.
Viewing InnoDB Table Properties
To view the properties of an InnoDB table, issue a SHOW TABLE STATUS statement:
mysql> SHOW TABLE STATUS FROM test LIKE 't%' \G
*************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2015-03-16 15:13:31
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_0900_ai_ci
       Checksum: NULL
 Create_options:
        Comment:
InnoDB table properties may also be queried using the InnoDB INFORMATION_SCHEMA system tables:
mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_TABLES WHERE NAME='test/t1' \G
*************************** 1. row ***************************
     TABLE_ID: 45
         NAME: test/t1
         FLAG: 1
       N_COLS: 5
        SPACE: 35
   ROW_FORMAT: Compact
ZIP_PAGE_SIZE: 0
   SPACE_TYPE: Single
3. Moving or Copying InnoDB Tables
This section describes techniques for moving or copying some or all InnoDB tables to a different server or instance. For example, you might move an entire MySQL instance to a larger, faster server; you might clone an entire MySQL instance to a new replication slave server; you might copy individual tables to another instance to develop and test an application, or to a data warehouse server to produce reports.
On Windows, InnoDB always stores database and table names internally in lowercase. To move databases in a binary format from Unix to Windows or from Windows to Unix, create all databases and tables using lowercase names. A convenient way to accomplish this is to add the following line to the [mysqld] section of your my.cnf or my.ini file before creating any databases or tables:
[mysqld]
lower_case_table_names=1
Note: It is prohibited to start the server with a lower_case_table_names setting that is different from the setting used when the server was initialized.
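To confirm which setting a running server was initialized with, a read-only check such as the following could be used; it does not change anything.

```sql
-- lower_case_table_names: value chosen at initialization.
-- lower_case_file_system: ON if the data directory's file system is case-insensitive.
SHOW VARIABLES LIKE 'lower_case%';
```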
Techniques for moving or copying InnoDB tables include:
Transportable Tablespaces
The transportable tablespaces feature uses FLUSH TABLES ... FOR EXPORT to ready InnoDB tables for copying from one server instance to another. To use this feature, InnoDB tables must be created with innodb_file_per_table set to ON so that each InnoDB table has its own tablespace.
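A sketch of the usual sequence, assuming a table test.t1 with an identical definition on both instances and innodb_file_per_table=ON. The file copy in the middle happens outside SQL (for example with scp), and the paths are up to you.

```sql
-- Destination: create a table with the same definition, then discard its empty tablespace.
CREATE TABLE test.t1 (c1 INT PRIMARY KEY) ENGINE=InnoDB;
ALTER TABLE test.t1 DISCARD TABLESPACE;

-- Source: quiesce the table; InnoDB writes a t1.cfg metadata file next to t1.ibd.
USE test;
FLUSH TABLES t1 FOR EXPORT;
-- ... while the lock is held, copy t1.ibd and t1.cfg to the destination database directory ...
UNLOCK TABLES;   -- releases the lock and removes t1.cfg on the source

-- Destination: attach the copied tablespace file.
ALTER TABLE test.t1 IMPORT TABLESPACE;
```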
MySQL Enterprise Backup
The MySQL Enterprise Backup product lets you back up a running MySQL database with minimal disruption to operations while producing a consistent snapshot of the database. When MySQL Enterprise Backup is copying tables, reads and writes can continue. In addition, MySQL Enterprise Backup can create compressed backup files, and back up subsets of tables. In conjunction with the MySQL binary log, you can perform point-in-time recovery. MySQL Enterprise Backup is included as part of the MySQL Enterprise subscription.
Copying Data Files (Cold Backup Method)
InnoDB data and log files are binary-compatible on all platforms having the same floating-point number format. If the floating-point formats differ but you have not used FLOAT or DOUBLE data types in your tables, then the procedure is the same: simply copy the relevant files.
When you move or copy file-per-table .ibd files, the database directory name must be the same on the source and destination systems. The table definition stored in the InnoDB shared tablespace includes the database name. The transaction IDs and log sequence numbers stored in the tablespace files also differ between databases.
Export and Import (mysqldump)
You can use mysqldump to dump your tables on one machine and then import the dump files on the other machine. Using this method, it does not matter whether the formats differ or if your tables contain floating-point data.
One way to increase the performance of this method is to switch off autocommit mode when importing data, assuming that the tablespace has enough space for the big rollback segment that the import transactions generate. Do the commit only after importing a whole table or a segment of a table.
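A minimal sketch of such an import in the mysql client, assuming a hypothetical dump file path. The DDL and LOCK TABLES statements in a typical dump commit implicitly, so in practice this yields roughly one commit per table. Alternatively, mysqldump's --no-autocommit option writes SET autocommit = 0 and COMMIT around each table's INSERT statements into the dump itself.

```sql
-- Run in the mysql client on the destination server.
SET autocommit = 0;
SOURCE /tmp/mydb_dump.sql;   -- hypothetical path to the mysqldump output
COMMIT;                      -- commit whatever remains open after the last table
SET autocommit = 1;
```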
4. Converting Tables from MyISAM to InnoDB
If you have MyISAM tables that you want to convert to InnoDB for better reliability and scalability, review the following guidelines and tips before converting.
Adjusting Memory Usage for MyISAM and InnoDB
As you transition away from MyISAM tables, lower the value of the key_buffer_size configuration option to free memory no longer needed for caching MyISAM index blocks. Increase the value of the innodb_buffer_pool_size configuration option, which performs a similar role of allocating cache memory for InnoDB tables. The InnoDB buffer pool caches both table data and index data, speeding up lookups for queries and keeping frequently accessed data in memory for reuse.
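A sketch of making these adjustments at runtime. The sizes are arbitrary examples, not recommendations. Online buffer pool resizing is available from MySQL 5.7.5, and the requested size is rounded to a multiple of innodb_buffer_pool_chunk_size multiplied by innodb_buffer_pool_instances. In MySQL 8.0, SET PERSIST instead of SET GLOBAL keeps a value across restarts.

```sql
-- Shrink the MyISAM key cache (64 MB is an arbitrary example).
SET GLOBAL key_buffer_size = 64 * 1024 * 1024;

-- Grow the InnoDB buffer pool online (8 GB is an arbitrary example).
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;

-- Resizing happens in the background; check its progress.
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';
```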
Handling Too-Long or Too-Short Transactions
Because MyISAM tables do not support transactions, you might not have paid much attention to the autocommit configuration option and the COMMIT and ROLLBACK statements. These keywords are important to allow multiple sessions to read and write InnoDB tables concurrently, providing substantial scalability benefits in write-heavy workloads.
While a transaction is open, the system keeps a snapshot of the data as seen at the beginning of the transaction, which can cause substantial overhead if the system inserts, updates, and deletes millions of rows while a stray transaction keeps running. Thus, take care to avoid transactions that run for too long:
- If you are using a mysql session for interactive experiments, always COMMIT (to finalize the changes) or ROLLBACK (to undo the changes) when finished. Close down interactive sessions rather than leave them open for long periods, to avoid keeping transactions open for long periods by accident.
- Make sure that any error handlers in your application also ROLLBACK incomplete changes or COMMIT completed changes.
- ROLLBACK is a relatively expensive operation, because INSERT, UPDATE, and DELETE operations are written to InnoDB tables prior to the COMMIT, with the expectation that most changes are committed successfully and rollbacks are rare. When experimenting with large volumes of data, avoid making changes to large numbers of rows and then rolling back those changes.
- When loading large volumes of data with a sequence of INSERT statements, periodically COMMIT the results to avoid having transactions that last for hours. In typical load operations for data warehousing, if something goes wrong, you truncate the table (using TRUNCATE TABLE) and start over from the beginning rather than doing a ROLLBACK.
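To find transactions that have been left open too long, one option is to query INFORMATION_SCHEMA.INNODB_TRX. A sketch follows; the thread ID in the commented-out KILL is hypothetical.

```sql
-- Open InnoDB transactions, oldest first.
SELECT trx_id,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_mysql_thread_id,
       trx_rows_modified,
       trx_query
FROM INFORMATION_SCHEMA.INNODB_TRX
ORDER BY trx_started;

-- Ending a forgotten session rolls back its open transaction
-- (which can itself take a while if it modified many rows).
-- KILL 12345;   -- hypothetical value taken from trx_mysql_thread_id
```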
The preceding tips save memory and disk space that can be wasted during too-long transactions. When transactions are shorter than they should be, the problem is excessive I/O. With each COMMIT, MySQL makes sure each change is safely recorded to disk, which involves some I/O.
- For most operations on InnoDB tables, you should use the setting autocommit=0. From an efficiency perspective, this avoids unnecessary I/O when you issue large numbers of consecutive INSERT, UPDATE, or DELETE statements. From a safety perspective, this allows you to issue a ROLLBACK statement to recover lost or garbled data if you make a mistake on the mysql command line, or in an exception handler in your application.
- The setting autocommit=1 is suitable for InnoDB tables when running a sequence of queries for generating reports or analyzing statistics. In this situation, there is no I/O penalty related to COMMIT or ROLLBACK, and InnoDB can automatically optimize the read-only workload.
- If you make a series of related changes, finalize all the changes at once with a single COMMIT at the end. For example, if you insert related pieces of information into several tables, do a single COMMIT after making all the changes. Or if you run many consecutive INSERT statements, do a single COMMIT after all the data is loaded; if you are doing millions of INSERT statements, perhaps split up the huge transaction by issuing a COMMIT every ten thousand or hundred thousand records, so the transaction does not grow too large.
- Remember that even a SELECT statement opens a transaction, so after running some report or debugging queries in an interactive mysql session, either issue a COMMIT or close the mysql session.
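The batching advice above can be sketched in plain SQL. The table load_demo and the batch sizes are hypothetical; a real loader would generate the INSERT statements and issue COMMIT every N rows.

```sql
CREATE TABLE load_demo (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, v INT) ENGINE=InnoDB;

SET autocommit = 0;          -- statements now accumulate in one transaction
INSERT INTO load_demo (v) VALUES (1);
INSERT INTO load_demo (v) VALUES (2);
-- ... more INSERTs, e.g. up to 10,000 rows per batch ...
COMMIT;                      -- one durable commit for the whole batch

INSERT INTO load_demo (v) VALUES (3);
-- ... next batch ...
COMMIT;

-- Back to autocommit for a read-only reporting session.
SET autocommit = 1;
SELECT COUNT(*) FROM load_demo;
```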
Handling Deadlocks
You might see warning messages referring to “deadlocks” in the MySQL error log, or the output of SHOW ENGINE INNODB STATUS. Despite the scary-sounding name, a deadlock is not a serious issue for InnoDB tables, and often does not require any corrective action. When two transactions start modifying multiple tables, accessing the tables in a different order, they can reach a state where each transaction is waiting for the other and neither can proceed. When deadlock detection is enabled (the default), MySQL immediately detects this condition and cancels (rolls back) the “smaller” transaction, allowing the other to proceed. If deadlock detection is disabled using the innodb_deadlock_detect configuration option, InnoDB relies on the innodb_lock_wait_timeout setting to roll back transactions in case of a deadlock.
Either way, your applications need error-handling logic to restart a transaction that is forcibly cancelled due to a deadlock. When you re-issue the same SQL statements as before, the original timing issue no longer applies. Either the other transaction has already finished and yours can proceed, or the other transaction is still in progress and your transaction waits until it finishes.
If deadlock warnings occur constantly, you might review the application code to reorder the SQL operations in a consistent way, or to shorten the transactions. You can test with the innodb_print_all_deadlocks option enabled to see all deadlock warnings in the MySQL error log, rather than only the last warning in the SHOW ENGINE INNODB STATUS output.
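A sketch combining the diagnostics with retry logic, written as a stored procedure so it stays in SQL. The accounts table, the procedure name, and the limit of three attempts are hypothetical. It relies on error 1213 (ER_LOCK_DEADLOCK), after which InnoDB has already rolled back the victim transaction. The single UPDATE ... ORDER BY id processes both rows in ascending id order, which is one way to apply "reorder the SQL operations in a consistent way".

```sql
SET GLOBAL innodb_print_all_deadlocks = ON;   -- log every deadlock, not just the latest
SHOW ENGINE INNODB STATUS\G                   -- see LATEST DETECTED DEADLOCK

CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL(12,2) NOT NULL) ENGINE=InnoDB;

DELIMITER //
CREATE PROCEDURE transfer_with_retry(IN p_from INT, IN p_to INT, IN p_amt DECIMAL(12,2))
BEGIN
  DECLARE attempts INT DEFAULT 0;
  DECLARE deadlocked BOOLEAN DEFAULT FALSE;
  -- On a deadlock, record it and continue with the next statement.
  DECLARE CONTINUE HANDLER FOR 1213 SET deadlocked = TRUE;

  retry_loop: LOOP
    SET deadlocked = FALSE;
    SET attempts = attempts + 1;
    START TRANSACTION;
    UPDATE accounts
       SET balance = balance + IF(id = p_from, -p_amt, p_amt)
     WHERE id IN (p_from, p_to)
     ORDER BY id;
    IF NOT deadlocked THEN
      COMMIT;
      LEAVE retry_loop;
    END IF;
    ROLLBACK;   -- harmless: InnoDB already rolled the transaction back
    IF attempts >= 3 THEN
      SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'transfer failed after 3 deadlocks';
    END IF;
  END LOOP retry_loop;
END //
DELIMITER ;
```

Usage: CALL transfer_with_retry(1, 2, 10.00); Application code in other languages would follow the same pattern: catch error 1213, then re-run the whole transaction.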
Planning the Storage Layout
To get the best performance from InnoDB tables, you can adjust a number of parameters related to storage layout.
When you convert MyISAM tables that are large, frequently accessed, and hold vital data, investigate and consider the innodb_file_per_table and innodb_page_size configuration options, and the ROW_FORMAT and KEY_BLOCK_SIZE clauses of the CREATE TABLE statement.
During your initial experiments, the most important setting is innodb_file_per_table. When this setting is enabled, which is the default, new InnoDB tables are implicitly created in file-per-table tablespaces. In contrast with the InnoDB system tablespace, file-per-table tablespaces allow disk space to be reclaimed by the operating system when a table is truncated or dropped. File-per-table tablespaces also support DYNAMIC and COMPRESSED row formats and associated features such as table compression, efficient off-page storage for long variable-length columns, and large index prefixes.
You can also store InnoDB tables in a shared general tablespace, which supports multiple tables and all row formats.
Converting an Existing Table
To convert a non-InnoDB table to use InnoDB, use ALTER TABLE:
ALTER TABLE table_name ENGINE=InnoDB;
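Before converting, it can help to list which tables still use MyISAM. A sketch follows; the schema test and table legacy_orders are hypothetical.

```sql
-- Remaining MyISAM tables outside the system schemas.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE ENGINE = 'MyISAM'
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- Convert one of them, then confirm the new engine.
ALTER TABLE test.legacy_orders ENGINE=InnoDB;
SELECT ENGINE
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'legacy_orders';
```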
Cloning the Structure of a Table
You might make an InnoDB table that is a clone of a MyISAM table, rather than using ALTER TABLE to perform conversion, to test the old and new table side-by-side before switching.
Create an empty InnoDB table with identical column and index definitions. Use SHOW CREATE TABLE table_name\G to see the full CREATE TABLE statement to use. Change the ENGINE clause to ENGINE=INNODB.
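Instead of hand-editing the SHOW CREATE TABLE output, a shortcut is to copy the definition with CREATE TABLE ... LIKE and then switch the empty copy's engine. The table names orders (the MyISAM original) and orders_innodb are hypothetical.

```sql
-- Copies column and index definitions (the copy is still MyISAM at this point).
CREATE TABLE orders_innodb LIKE orders;

-- The copy has no rows yet, so changing its engine is cheap.
ALTER TABLE orders_innodb ENGINE=InnoDB;

-- Review the resulting definition.
SHOW CREATE TABLE orders_innodb\G
```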
Transferring Existing Data
To transfer a large volume of data into an empty InnoDB table created as shown in the previous section, insert the rows with:
INSERT INTO innodb_table SELECT * FROM myisam_table ORDER BY primary_key_columns;
You can also create the indexes for the InnoDB table after inserting the data. Historically, creating new secondary indexes was a slow operation for InnoDB, but now you can create the indexes after the data is loaded with relatively little overhead from the index creation step.
If you have UNIQUE constraints on secondary keys, you can speed up a table import by turning off the uniqueness checks temporarily during the import operation:
SET unique_checks=0;
... import operation ...
SET unique_checks=1;
For big tables, this saves disk I/O because InnoDB can use its change buffer to write secondary index records as a batch. Be certain that the data contains no duplicate keys. unique_checks permits but does not require storage engines to ignore duplicate keys.
For better control over the insertion process, you can insert big tables in pieces:
INSERT INTO newtable SELECT * FROM oldtable WHERE yourkey > something AND yourkey <= somethingelse;
After all records are inserted, you can rename the tables.
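A sketch of the piecewise copy followed by the rename, assuming a MyISAM table orders with a positive integer key id and an empty InnoDB clone orders_innodb (all names and range boundaries are hypothetical). With autocommit on, each INSERT ... SELECT is its own bounded transaction.

```sql
INSERT INTO orders_innodb SELECT * FROM orders WHERE id > 0       AND id <= 1000000 ORDER BY id;
INSERT INTO orders_innodb SELECT * FROM orders WHERE id > 1000000 AND id <= 2000000 ORDER BY id;
-- ... continue until the largest id has been copied ...

-- Sanity check before switching.
SELECT (SELECT COUNT(*) FROM orders)        AS old_rows,
       (SELECT COUNT(*) FROM orders_innodb) AS new_rows;

-- Swap names in one atomic statement; keep the MyISAM copy until the new table is verified.
RENAME TABLE orders TO orders_myisam_old,
             orders_innodb TO orders;
```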
During the conversion of big tables, increase the size of the InnoDB buffer pool to reduce disk I/O, to a maximum of 80% of physical memory. You can also increase the size of InnoDB log files.
Storage Requirements
If you intend to make several temporary copies of your data in InnoDB tables during the conversion process, it is recommended that you create the tables in file-per-table tablespaces so that you can reclaim the disk space when you drop the tables. When the innodb_file_per_table configuration option is enabled (the default), newly created InnoDB tables are implicitly created in file-per-table tablespaces.
Whether you convert the MyISAM table directly or create a cloned InnoDB table, make sure that you have sufficient disk space to hold both the old and new tables during the process. InnoDB tables require more disk space than MyISAM tables. If an ALTER TABLE operation runs out of space, it starts a rollback, and that can take hours if it is disk-bound. For inserts, InnoDB uses the insert buffer (the change buffer) to merge secondary index records to indexes in batches. That saves a lot of disk I/O. For rollback, no such mechanism is used, and the rollback can take 30 times longer than the insertion.
In the case of a runaway rollback, if you do not have valuable data in your database, it may be advisable to kill the database process rather than wait for millions of disk I/O operations to complete.
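To estimate how much space a conversion needs, you could start from the current size of the MyISAM table. A sketch with hypothetical schema and table names follows. In MySQL 8.0 the INFORMATION_SCHEMA.TABLES size columns are cached (see information_schema_stats_expiry), so ANALYZE TABLE is run first to refresh them. Budget clearly more free space than this figure, since the InnoDB copy is larger and both copies coexist during the conversion.

```sql
ANALYZE TABLE test.orders;

SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS size_mb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'orders';
```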
Defining a PRIMARY KEY for Each Table
The PRIMARY KEY clause is a critical factor affecting the performance of MySQL queries and the space usage for tables and indexes. The primary key uniquely identifies a row in a table. Every row in the table must have a primary key value, and no two rows can have the same primary key value.
These are guidelines for the primary key, followed by more detailed explanations.
- Declare a PRIMARY KEY for each table. Typically, it is the most important column that you refer to in WHERE clauses when looking up a single row.
- Declare the PRIMARY KEY clause in the original CREATE TABLE statement, rather than adding it later through an ALTER TABLE statement.
- Choose the column and its data type carefully. Prefer numeric columns over character or string ones.
- Consider using an auto-increment column if there is not another stable, unique, non-null, numeric column to use.
- An auto-increment column is also a good choice if there is any doubt whether the value of the primary key column could ever change. Changing the value of a primary key column is an expensive operation, possibly involving rearranging data within the table and within each secondary index.
Consider adding a primary key to any table that does not already have one. Use the smallest practical numeric type based on the maximum projected size of the table. This can make each row slightly more compact, which can yield substantial space savings for large tables. The space savings are multiplied if the table has any secondary indexes, because the primary key value is repeated in each secondary index entry. In addition to reducing data size on disk, a small primary key also lets more data fit into the buffer pool, speeding up all kinds of operations and improving concurrency.
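A back-of-the-envelope calculation makes the multiplication concrete. It assumes a hypothetical table with 100 million rows and 3 secondary indexes whose primary key shrinks from BIGINT (8 bytes) to INT UNSIGNED (4 bytes), and it ignores page overhead and fill factor.

```sql
-- 4 bytes saved per row in the clustered index, plus 4 bytes in each of the
-- 3 secondary index entries for that row.
SELECT 100000000 * (8 - 4) * (1 + 3) AS approx_bytes_saved;   -- 1600000000, about 1.6 GB
```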
If the table already has a primary key on some longer column, such as a VARCHAR, consider adding a new unsigned AUTO_INCREMENT column and switching the primary key to that, even if that column is not referenced in queries. This design change can produce substantial space savings in the secondary indexes. You can designate the former primary key columns as UNIQUE NOT NULL to enforce the same constraints as the PRIMARY KEY clause, that is, to prevent duplicate or null values across all those columns.
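A sketch of that switch, assuming a hypothetical customers table keyed on a VARCHAR code. Doing all three changes in one ALTER TABLE is intended to rebuild the table only once; the former key column stays NOT NULL and gains a UNIQUE index.

```sql
CREATE TABLE customers (
  customer_code VARCHAR(64) NOT NULL,
  name          VARCHAR(100),
  PRIMARY KEY (customer_code)
) ENGINE=InnoDB;

ALTER TABLE customers
  DROP PRIMARY KEY,
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST,
  ADD UNIQUE KEY uk_customer_code (customer_code);
```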
If you spread related information across multiple tables, typically each table uses the same column for its primary key. For example, a personnel database might have several tables, each with a primary key of employee number. A sales database might have some tables with a primary key of customer number, and other tables with a primary key of order number. Because lookups using the primary key are very fast, you can construct efficient join queries for such tables.
If you leave the PRIMARY KEY clause out entirely, MySQL creates an invisible one for you. It is a 6-byte value that might be longer than you need, thus wasting space. Because it is hidden, you cannot refer to it in queries.
Application Performance Considerations
The reliability and scalability features of InnoDB require more disk storage than equivalent MyISAM tables. You might change the column and index definitions slightly, for better space utilization, reduced I/O and memory consumption when processing result sets, and better query optimization plans making efficient use of index lookups.
If you do set up a numeric ID column for the primary key, use that value to cross-reference with related values in any other tables, particularly for join queries. For example, rather than accepting a country name as input and doing queries searching for the same name, do one lookup to determine the country ID, then do other queries (or a single join query) to look up relevant information across several tables. Rather than storing a customer or catalog item number as a string of digits, potentially using up several bytes, convert it to a numeric ID for storing and querying. A 4-byte unsigned INT column can index over 4 billion items (with the US meaning of billion: 1000 million).
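A sketch of the country example, with hypothetical countries and cities tables: the name is resolved once through a unique index, and the join itself runs on the compact numeric key.

```sql
CREATE TABLE countries (
  country_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name       VARCHAR(100) NOT NULL,
  UNIQUE KEY uk_name (name)
) ENGINE=InnoDB;

CREATE TABLE cities (
  city_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  country_id SMALLINT UNSIGNED NOT NULL,
  name       VARCHAR(100) NOT NULL,
  KEY idx_country (country_id)
) ENGINE=InnoDB;

-- Look up the country once by name, then join on the numeric id.
SELECT ci.name
FROM countries AS co
JOIN cities AS ci ON ci.country_id = co.country_id
WHERE co.name = 'France';
```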
Understanding Files Associated with InnoDB Tables
InnoDB files require more care and planning than MyISAM files do. You must not delete the ibdata files that represent the InnoDB system tablespace.
5. AUTO_INCREMENT Handling in InnoDB
InnoDB provides a configurable locking mechanism that can significantly improve scalability and performance of SQL statements that add rows to tables with AUTO_INCREMENT columns. To use the AUTO_INCREMENT mechanism with an InnoDB table, an AUTO_INCREMENT column must be defined as part of an index such that it is possible to perform the equivalent of an indexed SELECT MAX(ai_col) lookup on the table to obtain the maximum column value. Typically, this is achieved by making the column the first column of some table index.
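The index requirement can be seen directly. In this sketch (table names hypothetical), the first statement is rejected because ai_col is not part of any index, and the second succeeds because ai_col leads an index, so MAX(ai_col) can be read from it.

```sql
-- Rejected: the AUTO_INCREMENT column is not indexed.
CREATE TABLE ai_bad (a INT, ai_col INT AUTO_INCREMENT) ENGINE=InnoDB;
-- ERROR 1075 (42000): Incorrect table definition; there can be only one auto column and it must be defined as a key

-- Accepted: ai_col is the first column of an index.
CREATE TABLE ai_ok (a INT, ai_col INT AUTO_INCREMENT, KEY (ai_col)) ENGINE=InnoDB;
```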
This section describes the behavior of AUTO_INCREMENT lock modes, usage implications for different AUTO_INCREMENT lock mode settings, and how InnoDB initializes the AUTO_INCREMENT counter.
InnoDB AUTO_INCREMENT Lock Modes
This section describes the behavior of AUTO_INCREMENT lock modes used to generate auto-increment values, and how each lock mode affects replication. Auto-increment lock modes are configured at startup using the innodb_autoinc_lock_mode configuration parameter.
The following terms are used in describing innodb_autoinc_lock_mode settings:
- “INSERT-like” statements
  All statements that generate new rows in a table, including INSERT, INSERT ... SELECT, REPLACE, REPLACE ... SELECT, and LOAD DATA. Includes “simple-inserts”, “bulk-inserts”, and “mixed-mode” inserts.
- “Simple inserts”
  Statements for which the number of rows to be inserted can be determined in advance (when the statement is initially processed). This includes single-row and multiple-row INSERT and REPLACE statements that do not have a nested subquery, but not INSERT ... ON DUPLICATE KEY UPDATE.
- “Bulk inserts”
  Statements for which the number of rows to be inserted (and the number of required auto-increment values) is not known in advance. This includes INSERT ... SELECT, REPLACE ... SELECT, and LOAD DATA statements, but not plain INSERT. InnoDB assigns new values for the AUTO_INCREMENT column one at a time as each row is processed.
- “Mixed-mode inserts”
  These are “simple insert” statements that specify the auto-increment value for some (but not all) of the new rows. An example follows, where c1 is an AUTO_INCREMENT column of table t1:
INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
  Another type of “mixed-mode insert” is INSERT ... ON DUPLICATE KEY UPDATE, which in the worst case is in effect an INSERT followed by an UPDATE, where the allocated value for the AUTO_INCREMENT column may or may not be used during the update phase.
There are three possible settings for the innodb_autoinc_lock_mode configuration parameter. The settings are 0, 1, or 2, for “traditional”, “consecutive”, or “interleaved” lock mode, respectively. As of MySQL 8.0, interleaved lock mode (innodb_autoinc_lock_mode=2) is the default setting. Prior to MySQL 8.0, consecutive lock mode is the default (innodb_autoinc_lock_mode=1).
The default setting of interleaved lock mode in MySQL 8.0 reflects the change from statement-based replication to row-based replication as the default replication type. Statement-based replication requires the consecutive auto-increment lock mode to ensure that auto-increment values are assigned in a predictable and repeatable order for a given sequence of SQL statements, whereas row-based replication is not sensitive to the execution order of SQL statements.
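To see which combination a server is running with, you could read both variables. innodb_autoinc_lock_mode cannot be changed at runtime, so the sketch only reports it; the option-file lines in the comment show one way to change it before a restart.

```sql
SELECT @@innodb_autoinc_lock_mode;   -- 2 by default in MySQL 8.0
SELECT @@binlog_format;              -- ROW by default in MySQL 8.0

-- To change the lock mode, edit the option file and restart, for example:
--   [mysqld]
--   innodb_autoinc_lock_mode=1
```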
InnoDB AUTO_INCREMENT Lock Mode Usage Implications
- Using auto-increment with replication
  If you are using statement-based replication, set innodb_autoinc_lock_mode to 0 or 1 and use the same value on the master and its slaves. Auto-increment values are not ensured to be the same on the slaves as on the master if you use innodb_autoinc_lock_mode = 2 (“interleaved”) or configurations where the master and slaves do not use the same lock mode.
  If you are using row-based or mixed-format replication, all of the auto-increment lock modes are safe, since row-based replication is not sensitive to the order of execution of the SQL statements (and the mixed format uses row-based replication for any statements that are unsafe for statement-based replication).
- “Lost” auto-increment values and sequence gaps
  In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost”. Once a value is generated for an auto-increment column, it cannot be rolled back, whether or not the “INSERT-like” statement is completed, and whether or not the containing transaction is rolled back. Such lost values are not reused. Thus, there may be gaps in the values stored in an AUTO_INCREMENT column of a table.
- Specifying NULL or 0 for the AUTO_INCREMENT column
  In all lock modes (0, 1, and 2), if a user specifies NULL or 0 for the AUTO_INCREMENT column in an INSERT, InnoDB treats the row as if the value was not specified and generates a new value for it.
- Assigning a negative value to the AUTO_INCREMENT column
  In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is not defined if you assign a negative value to the AUTO_INCREMENT column.
- If the AUTO_INCREMENT value becomes larger than the maximum integer for the specified integer type
  In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is not defined if the value becomes larger than the maximum integer that can be stored in the specified integer type.
- Gaps in auto-increment values for “bulk inserts”
  With innodb_autoinc_lock_mode set to 0 (“traditional”) or 1 (“consecutive”), the auto-increment values generated by any given statement are consecutive, without gaps, because the table-level AUTO-INC lock is held until the end of the statement, and only one such statement can execute at a time.
  With innodb_autoinc_lock_mode set to 2 (“interleaved”), there may be gaps in the auto-increment values generated by “bulk inserts,” but only if there are concurrently executing “INSERT-like” statements.
  For lock modes 1 or 2, gaps may occur between successive statements because for bulk inserts the exact number of auto-increment values required by each statement may not be known and overestimation is possible.
- Auto-increment values assigned by “mixed-mode inserts”
  Consider a “mixed-mode insert,” where a “simple insert” specifies the auto-increment value for some (but not all) resulting rows. Such a statement behaves differently in lock modes 0, 1, and 2. For example, assume c1 is an AUTO_INCREMENT column of table t1, and that the most recent automatically generated sequence number is 100.
mysql> CREATE TABLE t1 (
    -> c1 INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> c2 CHAR(1)
    -> ) ENGINE = INNODB;
  Now, consider the following “mixed-mode insert” statement:
mysql> INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
  With innodb_autoinc_lock_mode set to 0 (“traditional”), the four new rows are:
mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
+-----+------+
| c1  | c2   |
+-----+------+
|   1 | a    |
| 101 | b    |
|   5 | c    |
| 102 | d    |
+-----+------+
  The next available auto-increment value is 103 because the auto-increment values are allocated one at a time, not all at once at the beginning of statement execution. This result is true whether or not there are concurrently executing “INSERT-like” statements (of any type).
  With innodb_autoinc_lock_mode set to 1 (“consecutive”), the four new rows are also:
mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
+-----+------+
| c1  | c2   |
+-----+------+
|   1 | a    |
| 101 | b    |
|   5 | c    |
| 102 | d    |
+-----+------+
  However, in this case, the next available auto-increment value is 105, not 103, because four auto-increment values are allocated at the time the statement is processed, but only two are used. This result is true whether or not there are concurrently executing “INSERT-like” statements (of any type).
  With innodb_autoinc_lock_mode set to 2 (“interleaved”), the four new rows are:
mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
+-----+------+
| c1  | c2   |
+-----+------+
|   1 | a    |
|   x | b    |
|   5 | c    |
|   y | d    |
+-----+------+
  The values of x and y are unique and larger than any previously generated rows. However, the specific values of x and y depend on the number of auto-increment values generated by concurrently executing statements.
  Finally, consider the following statement, issued when the most-recently generated sequence number is 100:
mysql> INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (101,'c'), (NULL,'d');
  With any innodb_autoinc_lock_mode setting, this statement generates a duplicate-key error 23000 (Can't write; duplicate key in table) because 101 is allocated for the row (NULL, 'b') and insertion of the row (101, 'c') fails.
-
modifying
auto_increment
column values in the middle of a sequence ofinsert
statements
in mysql 5.7 and earlier, modifying an auto_increment
column value in the middle of a sequence of insert
statements could lead to “duplicate entry” errors. for example, if you performed an update
operation that changed an auto_increment
column value to a value larger than the current maximum auto-increment value, subsequent insert
operations that did not specify an unused auto-increment value could encounter “duplicate entry” errors. in mysql 8.0 and later, if you modify anauto_increment
column value to a value larger than the current maximum auto-increment value, the new value is persisted, and subsequent insert
operations allocate auto-increment values starting from the new, larger value. this behavior is demonstrated in the following example.
mysql> CREATE TABLE t1 (
    ->   c1 INT NOT NULL AUTO_INCREMENT,
    ->   PRIMARY KEY (c1)
    -> ) ENGINE = InnoDB;

mysql> INSERT INTO t1 VALUES(0), (0), (3);

mysql> SELECT c1 FROM t1;
+----+
| c1 |
+----+
|  1 |
|  2 |
|  3 |
+----+

mysql> UPDATE t1 SET c1 = 4 WHERE c1 = 1;

mysql> SELECT c1 FROM t1;
+----+
| c1 |
+----+
|  2 |
|  3 |
|  4 |
+----+

mysql> INSERT INTO t1 VALUES(0);

mysql> SELECT c1 FROM t1;
+----+
| c1 |
+----+
|  2 |
|  3 |
|  4 |
|  5 |
+----+
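The allocation rules in the examples above can be made concrete with a small simulation. The following Python sketch is a toy model, not InnoDB code: the class ToyAutoInc and its methods are hypothetical. It assumes that “traditional” mode (0) hands out values one at a time and that “consecutive” mode (1) reserves exactly one value per row of a mixed-mode INSERT, as in the 105-not-103 example; mode 2 (“interleaved”) is not modeled because its values depend on concurrent statements. The persist_updates flag switches between the MySQL 8.0 behavior (an UPDATE to a larger value moves the counter) and the 5.7 behavior (it does not).

```python
class ToyAutoInc:
    """Toy model of InnoDB auto-increment allocation for one table (not InnoDB code)."""

    def __init__(self, last_generated=0, mode=1, persist_updates=True):
        self.next_value = last_generated + 1    # next value the counter will hand out
        self.mode = mode                        # 0 = "traditional", 1 = "consecutive"
        self.persist_updates = persist_updates  # True ~ MySQL 8.0, False ~ MySQL 5.7
        self.rows = set()                       # values present in the primary key

    def _take(self):
        value = self.next_value
        self.next_value += 1
        return value

    def insert(self, values):
        """One INSERT statement; None or 0 asks for a generated value."""
        # Consecutive mode reserves one value per row up front; unused ones are lost.
        reserved = [self._take() for _ in values] if self.mode == 1 else []
        assigned = []
        for v in values:
            if v is None or v == 0:
                v = reserved.pop(0) if reserved else self._take()
            elif v >= self.next_value:
                self.next_value = v + 1  # an explicit larger value moves the counter
            if v in self.rows or v in assigned:
                raise KeyError(f"duplicate entry {v}")  # statement fails, no rows kept
            assigned.append(v)
        self.rows.update(assigned)
        return assigned

    def update(self, old, new):
        """UPDATE t1 SET c1 = new WHERE c1 = old."""
        if new in self.rows:
            raise KeyError(f"duplicate entry {new}")
        self.rows.discard(old)
        self.rows.add(new)
        if self.persist_updates and new >= self.next_value:
            self.next_value = new + 1  # 8.0: the larger value is persisted


# Mixed-mode insert from the text, most recently generated value 100.
for mode in (0, 1):
    t = ToyAutoInc(last_generated=100, mode=mode)
    print(mode, t.insert([1, None, 5, None]), t.next_value)
# 0 [1, 101, 5, 102] 103
# 1 [1, 101, 5, 102] 105

# The MySQL 8.0 example: rows 1, 2, 3; UPDATE 1 -> 4; the next INSERT gets 5.
t = ToyAutoInc()
t.insert([0, 0, 3])
t.update(1, 4)
print(t.insert([0]))  # [5]
```

With ToyAutoInc(persist_updates=False), the same UPDATE sequence makes the last INSERT pick 4 and raise the duplicate-entry error described for MySQL 5.7; in either mode, insert([1, None, 101, None]) from a counter at 100 fails because 101 is handed to the NULL row first.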
InnoDB AUTO_INCREMENT Counter Initialization
This section describes how InnoDB initializes AUTO_INCREMENT counters.
If you specify an AUTO_INCREMENT column for an InnoDB table, the in-memory table object contains a special counter called the auto-increment counter that is used when assigning new values for the column.
In MySQL 5.7 and earlier, the auto-increment counter is stored only in main memory, not on disk. To initialize an auto-increment counter after a server restart, InnoDB would execute the equivalent of the following statement on the first insert into a table containing an AUTO_INCREMENT column.
SELECT MAX(ai_col) FROM table_name FOR UPDATE;
In MySQL 8.0, this behavior is changed. The current maximum auto-increment counter value is written to the redo log each time it changes and is saved to an engine-private system table on each checkpoint. These changes make the current maximum auto-increment counter value persistent across server restarts.
On a server restart following a normal shutdown, InnoDB initializes the in-memory auto-increment counter using the current maximum auto-increment value stored in the data dictionary system table.
On a server restart during crash recovery, InnoDB initializes the in-memory auto-increment counter using the current maximum auto-increment value stored in the data dictionary system table and scans the redo log for auto-increment counter values written since the last checkpoint. If a redo-logged value is greater than the in-memory counter value, the redo-logged value is applied. However, after a server crash, it cannot be guaranteed that a previously allocated auto-increment value will not be reused. Each time the current maximum auto-increment value is changed due to an INSERT or UPDATE operation, the new value is written to the redo log, but if the crash occurs before the redo log is flushed to disk, the previously allocated value could be reused when the auto-increment counter is initialized after the server is restarted.
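The restart rules above can be condensed into one function. This Python sketch is illustrative only: counter_after_restart and its parameters are hypothetical names, and flushed_redo_values stands for counter values whose redo records reached disk before a crash. Values allocated after the last flush are simply absent from that list, which is how a previously allocated value can be handed out again.

```python
def counter_after_restart(version, table_values=(), persisted_max=0,
                          flushed_redo_values=(), crashed=False):
    """Return the first auto-increment value handed out after a restart (toy model).

    version:             "5.7" or "8.0"
    table_values:        AUTO_INCREMENT column values present in the table
    persisted_max:       maximum saved in the data dictionary system table
    flushed_redo_values: counter values redo-logged and flushed since the last checkpoint
    """
    if version == "5.7":
        # Equivalent of SELECT MAX(ai_col) FROM table_name FOR UPDATE on first insert.
        return max(table_values, default=0) + 1
    current_max = persisted_max
    if crashed:
        # Crash recovery applies redo-logged values greater than the persisted one.
        current_max = max([current_max, *flushed_redo_values])
    return current_max + 1


# 5.7: values 4 and 5 went to a rolled-back transaction, then the server restarts.
print(counter_after_restart("5.7", table_values=[1, 2, 3]))  # 4 (reused)
# 8.0 after a normal shutdown: the persisted maximum survives.
print(counter_after_restart("8.0", persisted_max=5))  # 6
# 8.0 crash: 6 was allocated, but its redo record never reached disk.
print(counter_after_restart("8.0", persisted_max=3,
                            flushed_redo_values=[4, 5], crashed=True))  # 6 (reused)
```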
The only circumstance in which InnoDB uses the equivalent of a SELECT MAX(ai_col) FROM table_name FOR UPDATE statement in MySQL 8.0 and later to initialize an auto-increment counter is when importing a tablespace without a .cfg metadata file. Otherwise, the current maximum auto-increment counter value is read from the .cfg metadata file.
In MySQL 5.7 and earlier, a server restart cancels the effect of the AUTO_INCREMENT = N table option, which may be used in a CREATE TABLE or ALTER TABLE statement to set an initial counter value or alter the existing counter value, respectively. In MySQL 8.0, a server restart does not cancel the effect of the AUTO_INCREMENT = N table option. If you initialize the auto-increment counter to a specific value, or if you alter the auto-increment counter value to a larger value, the new value is persisted across server restarts.
Note: ALTER TABLE ... AUTO_INCREMENT = N can only change the auto-increment counter value to a value larger than the current maximum.
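The restriction in the note amounts to a simple clamp. This sketch follows the ALTER TABLE documentation's description for InnoDB, where a requested value at or below the largest value already in the column is replaced by that maximum plus one; the function name is hypothetical.

```python
def effective_auto_increment(requested_n, max_in_column):
    """Counter after ALTER TABLE ... AUTO_INCREMENT = N (toy model)."""
    # The counter never moves to or below a value already present in the column.
    return max(requested_n, max_in_column + 1)


print(effective_auto_increment(1000, 42))  # 1000: larger than the maximum, accepted
print(effective_auto_increment(10, 42))    # 43: cannot go below the current maximum
```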
In MySQL 5.7 and earlier, a server restart immediately following a ROLLBACK operation could result in the reuse of auto-increment values that were previously allocated to the rolled-back transaction, effectively rolling back the current maximum auto-increment value. In MySQL 8.0, the current maximum auto-increment value is persisted, preventing the reuse of previously allocated values.
If a SHOW TABLE STATUS statement examines a table before the auto-increment counter is initialized, InnoDB opens the table and initializes the counter value using the current maximum auto-increment value that is stored in the data dictionary system table. The value is stored in memory for use by later inserts or updates. Initialization of the counter value uses a normal exclusive-locking read on the table which lasts to the end of the transaction. InnoDB follows the same procedure when initializing the auto-increment counter for a newly created table that has a user-specified auto-increment value that is greater than 0.
After the auto-increment counter is initialized, if you do not explicitly specify an auto-increment value when inserting a row, InnoDB implicitly increments the counter and assigns the new value to the column. If you insert a row that explicitly specifies an auto-increment column value, and the value is greater than the current maximum counter value, the counter is set to the specified value.
InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the auto-increment counter, as described earlier.
The auto_increment_offset configuration option determines the starting point for the AUTO_INCREMENT column value. The default setting is 1.
The auto_increment_increment configuration option controls the interval between successive column values. The default setting is 1.
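Together, the two options define an arithmetic series. This hypothetical helper returns the smallest value of the form auto_increment_offset + k × auto_increment_increment (k ≥ 0) that is larger than the current column maximum. It is a sketch that assumes the offset is not greater than the increment; how the server treats other combinations is not modeled.

```python
def next_auto_increment(current_max, offset=1, increment=1):
    """Smallest offset + k * increment (k >= 0) greater than current_max."""
    if not 1 <= offset <= increment:
        raise ValueError("sketch assumes 1 <= offset <= increment")
    if current_max < offset:
        return offset
    k = (current_max - offset) // increment + 1
    return offset + k * increment


print(next_auto_increment(0))                                  # 1 with the defaults
print(next_auto_increment(31, 5, 10))                          # 35
print([next_auto_increment(m, 5, 10) for m in (0, 5, 15)])     # [5, 15, 25]
```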
6. InnoDB and Foreign Key Constraints
How the InnoDB storage engine handles foreign key constraints is described under the following topics in this section.
Foreign Key Definitions
Foreign key definitions for InnoDB tables are subject to the following conditions:
- InnoDB permits a foreign key to reference any index column or group of columns. However, in the referenced table, there must be an index where the referenced columns are the first columns in the same order. Hidden columns that InnoDB adds to an index are also considered.
- InnoDB does not currently support foreign keys for tables with user-defined partitioning. This means that no user-partitioned InnoDB table may contain foreign key references or columns referenced by foreign keys.
- InnoDB allows a foreign key constraint to reference a nonunique key. This is an InnoDB extension to standard SQL.
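The leftmost-prefix requirement in the first condition can be checked mechanically. In this Python sketch (a hypothetical helper, with columns given as plain names), an index qualifies if the referenced columns are its first columns in the same order. Uniqueness is not required, matching the nonunique-key extension, and the hidden columns InnoDB adds to an index are ignored for simplicity.

```python
def has_usable_index(referenced_cols, parent_indexes):
    """True if some parent index begins with referenced_cols, in the same order."""
    n = len(referenced_cols)
    return any(list(index[:n]) == list(referenced_cols) for index in parent_indexes)


parent_indexes = [["id"], ["a", "b", "c"]]  # e.g. PRIMARY KEY (id), KEY (a, b, c)
print(has_usable_index(["a", "b"], parent_indexes))  # True: leftmost prefix
print(has_usable_index(["b", "a"], parent_indexes))  # False: wrong order
print(has_usable_index(["b"], parent_indexes))       # False: not the first column
```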
Referential Actions
Referential actions for foreign keys of InnoDB tables are subject to the following conditions:
- While SET DEFAULT is allowed by the MySQL Server, it is rejected as invalid by InnoDB. CREATE TABLE and ALTER TABLE statements using this clause are not allowed for InnoDB tables.
- If there are several rows in the parent table that have the same referenced key value, InnoDB acts in foreign key checks as if the other parent rows with the same key value do not exist. For example, if you have defined a RESTRICT type constraint, and there is a child row with several parent rows, InnoDB does not permit the deletion of any of those parent rows.
- InnoDB performs cascading operations through a depth-first algorithm, based on records in the indexes corresponding to the foreign key constraints.
- If ON UPDATE CASCADE or ON UPDATE SET NULL recurses to update the same table it has previously updated during the cascade, it acts like RESTRICT. This means that you cannot use self-referential ON UPDATE CASCADE or ON UPDATE SET NULL operations. This is to prevent infinite loops resulting from cascaded updates. A self-referential ON DELETE SET NULL, on the other hand, is possible, as is a self-referential ON DELETE CASCADE. Cascading operations may not be nested more than 15 levels deep.
- Like MySQL in general, in an SQL statement that inserts, deletes, or updates many rows, InnoDB checks UNIQUE and FOREIGN KEY constraints row-by-row. When performing foreign key checks, InnoDB sets shared row-level locks on child or parent records it has to look at. InnoDB checks foreign key constraints immediately; the check is not deferred to transaction commit. According to the SQL standard, the default behavior should be deferred checking. That is, constraints are only checked after the entire SQL statement has been processed. Until InnoDB implements deferred constraint checking, some things are impossible, such as deleting a record that refers to itself using a foreign key.
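The depth-first cascade and its 15-level limit can be pictured with a toy in-memory model. Here children is a hypothetical map from a parent row to the rows whose foreign key references it, standing in for the foreign key index. Counting the first cascaded level as depth 1 is an assumption of this sketch, and real InnoDB also takes row locks and checks other constraints along the way.

```python
MAX_CASCADE_DEPTH = 15  # "Cascading operations may not be nested more than 15 levels deep."


class CascadeTooDeep(Exception):
    pass


def cascade_delete(row, children, depth=0, deleted=None):
    """Depth-first ON DELETE CASCADE; returns rows in the order they are visited."""
    if deleted is None:
        deleted = []
    if depth > MAX_CASCADE_DEPTH:
        raise CascadeTooDeep(f"cascade nested deeper than {MAX_CASCADE_DEPTH} levels")
    deleted.append(row)
    for child in children.get(row, []):
        cascade_delete(child, children, depth + 1, deleted)  # finish this branch first
    return deleted


print(cascade_delete(1, {1: [2, 3], 2: [4]}))  # [1, 2, 4, 3]

chain = {i: [i + 1] for i in range(16)}  # row 0 -> 1 -> ... -> 16
try:
    cascade_delete(0, chain)
except CascadeTooDeep as err:
    print(err)
```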
Foreign Key Restrictions for Generated Columns and Virtual Indexes
- A foreign key constraint on a stored generated column cannot use CASCADE, SET NULL, or SET DEFAULT as ON UPDATE referential actions, nor can it use SET NULL or SET DEFAULT as ON DELETE referential actions.
- A foreign key constraint on the base column of a stored generated column cannot use CASCADE, SET NULL, or SET DEFAULT as ON UPDATE or ON DELETE referential actions.
- A foreign key constraint cannot reference a virtual generated column.
- Prior to MySQL 8.0, a foreign key constraint could not reference a secondary index defined on a virtual generated column.
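The referencing-side rules above, together with InnoDB's general rejection of SET DEFAULT, can be encoded as a small validation function. The function name and the column_kind labels are hypothetical; the rule about virtual generated columns concerns the referenced column and is left out of this sketch.

```python
def fk_actions_allowed(column_kind, on_update="NO ACTION", on_delete="NO ACTION"):
    """Check a foreign key's referential actions against the rules listed above.

    column_kind: "stored_generated", "base_of_stored_generated", or "ordinary"
    """
    on_update, on_delete = on_update.upper(), on_delete.upper()
    if "SET DEFAULT" in (on_update, on_delete):
        return False  # InnoDB rejects SET DEFAULT for every foreign key
    if column_kind == "stored_generated":
        # No CASCADE or SET NULL on update; no SET NULL on delete.
        return on_update not in {"CASCADE", "SET NULL"} and on_delete != "SET NULL"
    if column_kind == "base_of_stored_generated":
        # No CASCADE or SET NULL for either action.
        return not ({on_update, on_delete} & {"CASCADE", "SET NULL"})
    return True


print(fk_actions_allowed("stored_generated", on_delete="CASCADE"))          # True
print(fk_actions_allowed("stored_generated", on_update="CASCADE"))          # False
print(fk_actions_allowed("base_of_stored_generated", on_delete="CASCADE"))  # False
```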
7. Limits on InnoDB Tables
Limits on InnoDB tables are described under the following topics in this section.
Maximums and Minimums
- A table can contain a maximum of 1017 columns. Virtual generated columns are included in this limit.
- A table can contain a maximum of 64 secondary indexes.
- The index key prefix length limit is 3072 bytes for InnoDB tables that use the DYNAMIC or COMPRESSED row format. The index key prefix length limit is 767 bytes for InnoDB tables that use the REDUNDANT or COMPACT row format. For example, you might hit this limit with a column prefix index of more than 191 characters on a TEXT or VARCHAR column, assuming a utf8mb4 character set and the maximum of 4 bytes for each character. Attempting to use an index key prefix length that exceeds the limit returns an error. The limits that apply to index key prefixes also apply to full-column index keys.
- If you reduce the InnoDB page size to 8KB or 4KB by specifying the innodb_page_size option when creating the MySQL instance, the maximum length of the index key is lowered proportionally, based on the limit of 3072 bytes for a 16KB page size. That is, the maximum index key length is 1536 bytes when the page size is 8KB, and 768 bytes when the page size is 4KB.
- A maximum of 16 columns is permitted for multicolumn indexes. Exceeding the limit returns an error:
  ERROR 1070 (42000): Too many key parts specified; max 16 parts allowed
- The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a page for 4KB, 8KB, 16KB, and 32KB page sizes. For example, the maximum row length for the default innodb_page_size of 16KB is about 8000 bytes. However, for an InnoDB page size of 64KB, the maximum row length is approximately 16000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB. If a row is less than half a page long, all of it is stored locally within the page. If it exceeds half a page, variable-length columns are chosen for external off-page storage until the row fits within half a page, as described in Section 15.11.2, “File Space Management”.
- Although InnoDB supports row sizes larger than 65,535 bytes internally, MySQL itself imposes a row-size limit of 65,535 bytes for the combined size of all columns:
  mysql> CREATE TABLE t (a VARCHAR(8000), b VARCHAR(10000),
      -> c VARCHAR(10000), d VARCHAR(10000), e VARCHAR(10000),
      -> f VARCHAR(10000), g VARCHAR(10000)) ENGINE=InnoDB;
  ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs
- On some older operating systems, files must be less than 2GB. This is not a limitation of InnoDB itself, but if you require a large tablespace, configure it using several smaller data files rather than one large data file.
- The combined size of the InnoDB log files can be up to 512GB.
- The minimum tablespace size is slightly larger than 10MB. The maximum tablespace size depends on the InnoDB page size.
  | InnoDB page size | Maximum tablespace size |
  |---|---|
  | 4KB | 16TB |
  | 8KB | 32TB |
  | 16KB | 64TB |
  | 32KB | 128TB |
  | 64KB | 256TB |
- The maximum tablespace size is also the maximum size for a table.
- The path of a tablespace file, including the file name, cannot exceed the MAX_PATH limit on Windows. Prior to Windows 10, the MAX_PATH limit is 260 characters. As of Windows 10, version 1607, MAX_PATH limitations are removed from common Win32 file and directory functions, but you must enable the new behavior.
- The default page size in InnoDB is 16KB. You can increase or decrease the page size by configuring the innodb_page_size option when creating the MySQL instance. 32KB and 64KB page sizes are supported, but ROW_FORMAT=COMPRESSED is unsupported for page sizes greater than 16KB. For both 32KB and 64KB page sizes, the maximum record size is 16KB. For innodb_page_size=32KB, extent size is 2MB. For innodb_page_size=64KB, extent size is 4MB. A MySQL instance using a particular InnoDB page size cannot use data files or log files from an instance that uses a different page size.
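Several of the size limits above are simple arithmetic on the page size. This Python sketch uses hypothetical helper names; it reproduces the index-key and tablespace numbers from the list. The tablespace formula assumes a maximum of 2**32 pages per tablespace and reads “TB” in the table as 2**40 bytes, which matches every row of that table.

```python
def max_index_key_bytes(row_format, page_kb=16):
    """Largest index key or key prefix in bytes for a row format and page size."""
    format_limit = 767 if row_format.upper() in {"REDUNDANT", "COMPACT"} else 3072
    # Lowered proportionally for 8KB and 4KB pages (3072 bytes at 16KB and above).
    page_limit = 3072 * min(page_kb, 16) // 16
    return min(format_limit, page_limit)


def max_prefix_chars(key_bytes, bytes_per_char=4):
    """Characters that fit in a key prefix; utf8mb4 needs up to 4 bytes per character."""
    return key_bytes // bytes_per_char


def max_tablespace_tb(page_kb):
    """Maximum tablespace size in TB (2**40 bytes), assuming at most 2**32 pages."""
    return (2**32 * page_kb * 1024) // 2**40


print(max_index_key_bytes("COMPACT"), max_prefix_chars(767))   # 767 191
print(max_index_key_bytes("DYNAMIC", page_kb=8))               # 1536
print([max_tablespace_tb(kb) for kb in (4, 8, 16, 32, 64)])    # [16, 32, 64, 128, 256]
```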
Restrictions on InnoDB Tables
- ANALYZE TABLE determines index cardinality (as displayed in the Cardinality column of SHOW INDEX output) by performing random dives on each of the index trees and updating index cardinality estimates accordingly. Because these are only estimates, repeated runs of ANALYZE TABLE could produce different numbers. This makes ANALYZE TABLE fast on InnoDB tables but not 100% accurate because it does not take all rows into account.
  You can make the statistics collected by ANALYZE TABLE more precise and more stable by turning on the innodb_stats_persistent configuration option, as explained in Section 15.8.10.1, “Configuring Persistent Optimizer Statistics Parameters”. When that setting is enabled, it is important to run ANALYZE TABLE after major changes to indexed column data, because the statistics are not recalculated periodically (such as after a server restart).
  If the persistent statistics setting is enabled, you can change the number of random dives by modifying the innodb_stats_persistent_sample_pages system variable. If the persistent statistics setting is disabled, modify the innodb_stats_transient_sample_pages system variable instead.
  MySQL uses index cardinality estimates in join optimization. If a join is not optimized in the right way, try using ANALYZE TABLE. In the few cases that ANALYZE TABLE does not produce values good enough for your particular tables, you can use FORCE INDEX with your queries to force the use of a particular index, or set the max_seeks_for_key system variable to ensure that MySQL prefers index lookups over table scans. See Section B.4.5, “Optimizer-Related Issues”.
- If statements or transactions are running on a table, and ANALYZE TABLE is run on the same table followed by a second ANALYZE TABLE operation, the second ANALYZE TABLE operation is blocked until the statements or transactions are completed. This behavior occurs because ANALYZE TABLE marks the currently loaded table definition as obsolete when ANALYZE TABLE is finished running. New statements or transactions (including a second ANALYZE TABLE statement) must load the new table definition into the table cache, which cannot occur until currently running statements or transactions are completed and the old table definition is purged. Loading multiple concurrent table definitions is not supported.
- SHOW TABLE STATUS does not give accurate statistics on InnoDB tables except for the physical size reserved by the table. The row count is only a rough estimate used in SQL optimization.
- InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
- On Windows, InnoDB always stores database and table names internally in lowercase. To move databases in a binary format from Unix to Windows or from Windows to Unix, create all databases and tables using lowercase names.
- An AUTO_INCREMENT column ai_col must be defined as part of an index such that it is possible to perform the equivalent of an indexed SELECT MAX(ai_col) lookup on the table to obtain the maximum column value. Typically, this is achieved by making the column the first column of some table index.
- When an AUTO_INCREMENT integer column runs out of values, a subsequent INSERT operation returns a duplicate-key error. This is general MySQL behavior.
- DELETE FROM tbl_name does not regenerate the table but instead deletes all rows, one by one.
- Cascaded foreign key actions do not activate triggers.
- You cannot create a table with a column name that matches the name of an internal InnoDB column (including DB_ROW_ID, DB_TRX_ID, DB_ROLL_PTR, and DB_MIX_ID). This restriction applies to use of the names in any letter case.
  mysql> CREATE TABLE t1 (c1 INT, db_row_id INT) ENGINE=INNODB;
  ERROR 1166 (42000): Incorrect column name 'db_row_id'
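The run-out case in the list above follows from the integer type's range. This sketch uses a hypothetical helper and assumes that, once the type's maximum has been used, the next generated value stays at that same maximum, which is why the failure surfaces as a duplicate-key error rather than as an overflow.

```python
# Upper bounds of MySQL integer types: (signed max, unsigned max).
INT_MAX = {
    "TINYINT": (127, 255),
    "SMALLINT": (32767, 65535),
    "MEDIUMINT": (8388607, 16777215),
    "INT": (2147483647, 4294967295),
    "BIGINT": (9223372036854775807, 18446744073709551615),
}


def next_generated(used, col_type="INT", unsigned=False):
    """Next AUTO_INCREMENT value, capped at the type maximum (toy model)."""
    top = INT_MAX[col_type][1 if unsigned else 0]
    candidate = min(max(used, default=0) + 1, top)
    if candidate in used:
        raise KeyError(f"duplicate entry {candidate}")  # the column has run out
    return candidate


print(next_generated({125, 126}, "TINYINT"))  # 127
try:
    next_generated({126, 127}, "TINYINT")
except KeyError as err:
    print(err)
```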
Locking and Transactions
- LOCK TABLES acquires two locks on each table if innodb_table_locks=1 (the default). In addition to a table lock on the MySQL layer, it also acquires an InnoDB table lock. Versions of MySQL before 4.1.2 did not acquire InnoDB table locks; the old behavior can be selected by setting innodb_table_locks=0. If no InnoDB table lock is acquired, LOCK TABLES completes even if some records of the tables are being locked by other transactions.
  In MySQL 8.0, innodb_table_locks=0 has no effect for tables locked explicitly with LOCK TABLES ... WRITE. It does have an effect for tables locked for read or write by LOCK TABLES ... WRITE implicitly (for example, through triggers) or by LOCK TABLES ... READ.
- All InnoDB locks held by a transaction are released when the transaction is committed or aborted. Thus, it does not make much sense to invoke LOCK TABLES on InnoDB tables in autocommit=1 mode because the acquired InnoDB table locks would be released immediately.
- You cannot lock additional tables in the middle of a transaction because LOCK TABLES performs an implicit COMMIT and UNLOCK TABLES.