HPM Note5, Query Performance Optimization 博客分类: Database performanceMySQLSQL Serverthreadmemcached
程序员文章站
2024-02-22 16:49:46
...
Slow Query Basics: Optimize Data Access
Analyze a poorly performing query in two steps:
1, Find out whether your application is retrieving more data than you need
2, Find out whether the MySQL server is analyzing more rows than it needs
Typical mistakes:
1, Fetching more rows than needed
2, Fetching all columns from a multitable join
3, Fetching all columsn
The simplest query cost metrics in MySQL(Slow query log):
1, Execution time
2, Number of rows examined
3, Number of rows returned
The ratio of rows examined to rows returned is usually small-say, between 1:1 and 10:1-but sometimes it can be orders of magnitude larger
The access methods apprear in the type column in EXPLAIN's output
The access types range from a full table scan to index scans, range scans, unique index lookups and constants
If you find that a huge number of rows were examined to produce relatively few rows in the result, you can try some more sophisticated fixes:
1, Use covering indexes, which store data so that the storage engine doesn't have to retrieve the complete rows
2, Change the schema. An example is using summary tables
3, Rewrite a complicated query so the MySQL optimizer is able to execute it optimally
Ways to Restructure Queries
Sometimes you can make a query more efficient by decomposing it and executing a few simple queries instead of one complex one
Chopping up the DELETE statement and using medium-size queries can improve performance and reduce replication lag considerably
Many high-performance web sites use join decomposition
You can decompose a join by running multiple single-table queries instead of a multitable join, and then performing the join in the application
Doing joins in the application may be more efficient when:
1, You cache and reuse a lot of data from earlier queries
2, You use multiple MyISAM tables
3, You distribute data across multiple servers
4, You replace joins with IN() lists on large tables
5, A join refers to the same table multiple times
Query Execution Basics
What happens when you send MySQL a query:
1, The client sends the SQL statement to the server
2, The server checks the query cache
3, The server parses, preprocesses, and optimizes the SQL into a query execution plan
4, The query execution engine executes the plan by making calls to the storage engine API
5, The server sends the result to the client
The MySQL Client/Server Protocol is half duplex, which means that at any given time the MySQL server can be either sending or receiving messages, but not both
It also means there is no way to cut a message short
There's no flow control; once oneside sends a message, the other side must fetch the entire message before responding
When the server responds, the client has to receive the entire result set
You can use SHOW FULL PROCESSLIST to view MySQL connection or thread state:
1, Sleep
The thread is waiting for a new query from the client
2, Query
The thread is either executing the query or sending the result back to the client
3, Locked
The thread is waiting for a table lock to be granted at the server level
4, Analyzing and statistics
The thread is checking storage engine statistics and optimizing the query
5, Copying to tmp table [on disk]
The thread is processing the query and copying results to a temporary table, probably for a GROUP BY, for a filesort, or to satisfy a UNION
If the state ends with "on disk", MySQL is converting an in-memory table to an on-disk table
6, Sorting result
The thread is sorting a result set
7, Sending data
This can mean several things: the thread might be sending data between stages of the query, generating the result set, or returning the result set to the client
Some types of optimizations MySQL knows how to do:
1, Reordering joins
2, Converting OUTER JOINS to INNER JOINS
3, Applying algebraic equivalence rules
4, COUNT(), MIN(), and MAX() optimizations
5, Evaluating and reducing constant expressiosn
6, Covering indexes
7, Subquery optimization
8, Early termination
9, Equality propagation
10, IN() list comparisons
// to-do
// Have a thorough understanding of MySQL parser/optimizer
Optimizing Specific Types of Queries
COUNT
Use covering index, Summary tables or external caching system such as memcached
JOIN
1, Make sure there are indexes on the columns in the ON or USING clauses
2, Try to ensure that any GROUP BY or ORDER BY expression refers only to columns from a single table
3, Be careful when upgrading MySQL, because the join syntax, operator precedence, and other behaviors have changed at various times
Subqueries
You should usually prefer a join where possible
GROUP BY and DISTINCT
WITH ROLLUP or do superaggregation in your application
LIMIT and OFFSET
Covering index
Convert the limit to a positional query by precalculate and index a position column
Precomputed summaries
Sphinx
UNION
Always use UNION ALL
Analyze a poorly performing query in two steps:
1, Find out whether your application is retrieving more data than you need
2, Find out whether the MySQL server is analyzing more rows than it needs
Typical mistakes:
1, Fetching more rows than needed
2, Fetching all columns from a multitable join
3, Fetching all columsn
The simplest query cost metrics in MySQL(Slow query log):
1, Execution time
2, Number of rows examined
3, Number of rows returned
The ratio of rows examined to rows returned is usually small-say, between 1:1 and 10:1-but sometimes it can be orders of magnitude larger
The access methods apprear in the type column in EXPLAIN's output
The access types range from a full table scan to index scans, range scans, unique index lookups and constants
If you find that a huge number of rows were examined to produce relatively few rows in the result, you can try some more sophisticated fixes:
1, Use covering indexes, which store data so that the storage engine doesn't have to retrieve the complete rows
2, Change the schema. An example is using summary tables
3, Rewrite a complicated query so the MySQL optimizer is able to execute it optimally
Ways to Restructure Queries
Sometimes you can make a query more efficient by decomposing it and executing a few simple queries instead of one complex one
Chopping up the DELETE statement and using medium-size queries can improve performance and reduce replication lag considerably
// Bad mysql > DELETE FROM messages WHERE created < DATE_SUB(NOW(), INTERVAL 3 MONTH); // Good rows_affected = 0 do { rows_affected = do_query ( "DELETE FROM messages WHERE created < DATE_SUB(NOW(), INTERVAL 3 MONTH) LIMIT 10000" ) } while rows_affected > 0
Many high-performance web sites use join decomposition
You can decompose a join by running multiple single-table queries instead of a multitable join, and then performing the join in the application
// Bad SELECT * FROM tag JOIN tag_post ON tag_post.tag_id = tag.id JOIN post ON tag_post.post_id = post.id WHERE tag.tag = 'mysql' //Good SELECT * FROM tag WHERE tag='mysql' SELECT * FROM tag_post WHERE tag_id = 1234 SELECT * FROM post WHERE post.id in (123,456,567,9098,8904)
Doing joins in the application may be more efficient when:
1, You cache and reuse a lot of data from earlier queries
2, You use multiple MyISAM tables
3, You distribute data across multiple servers
4, You replace joins with IN() lists on large tables
5, A join refers to the same table multiple times
Query Execution Basics
What happens when you send MySQL a query:
1, The client sends the SQL statement to the server
2, The server checks the query cache
3, The server parses, preprocesses, and optimizes the SQL into a query execution plan
4, The query execution engine executes the plan by making calls to the storage engine API
5, The server sends the result to the client
The MySQL Client/Server Protocol is half duplex, which means that at any given time the MySQL server can be either sending or receiving messages, but not both
It also means there is no way to cut a message short
There's no flow control; once oneside sends a message, the other side must fetch the entire message before responding
When the server responds, the client has to receive the entire result set
You can use SHOW FULL PROCESSLIST to view MySQL connection or thread state:
1, Sleep
The thread is waiting for a new query from the client
2, Query
The thread is either executing the query or sending the result back to the client
3, Locked
The thread is waiting for a table lock to be granted at the server level
4, Analyzing and statistics
The thread is checking storage engine statistics and optimizing the query
5, Copying to tmp table [on disk]
The thread is processing the query and copying results to a temporary table, probably for a GROUP BY, for a filesort, or to satisfy a UNION
If the state ends with "on disk", MySQL is converting an in-memory table to an on-disk table
6, Sorting result
The thread is sorting a result set
7, Sending data
This can mean several things: the thread might be sending data between stages of the query, generating the result set, or returning the result set to the client
Some types of optimizations MySQL knows how to do:
1, Reordering joins
2, Converting OUTER JOINS to INNER JOINS
3, Applying algebraic equivalence rules
4, COUNT(), MIN(), and MAX() optimizations
5, Evaluating and reducing constant expressiosn
6, Covering indexes
7, Subquery optimization
8, Early termination
9, Equality propagation
10, IN() list comparisons
// to-do
// Have a thorough understanding of MySQL parser/optimizer
Optimizing Specific Types of Queries
COUNT
// Bad SELECT COUNT(*) FROM world.City WHERE ID > 5; // Good SELECT (SELECT COUNT(*) FROM world.City) - COUNT(*) FROM world.City WHERE ID <= 5;
Use covering index, Summary tables or external caching system such as memcached
JOIN
1, Make sure there are indexes on the columns in the ON or USING clauses
2, Try to ensure that any GROUP BY or ORDER BY expression refers only to columns from a single table
3, Be careful when upgrading MySQL, because the join syntax, operator precedence, and other behaviors have changed at various times
Subqueries
You should usually prefer a join where possible
GROUP BY and DISTINCT
WITH ROLLUP or do superaggregation in your application
LIMIT and OFFSET
Covering index
Convert the limit to a positional query by precalculate and index a position column
Precomputed summaries
Sphinx
UNION
Always use UNION ALL