索引优化

导学

索引是数据库查询优化的核心手段。但"加索引"不是万能药——加错索引可能比不加更慢。本节从索引选择、维护、监控三个维度，系统讲解索引优化方法。

定义

索引优化：通过合理设计、创建和维护索引，使查询能够高效利用索引，减少数据扫描量，从而提升查询性能的过程。

索引设计原则

原则一：为 WHERE、JOIN、ORDER BY 的列建索引

-- 建表
CREATE TABLE employees (
    emp_id INT PRIMARY KEY AUTO_INCREMENT,
    emp_name VARCHAR(20),
    dept VARCHAR(20),
    score DECIMAL(5,2),
    INDEX idx_dept (dept)
);

-- 插入测试数据
INSERT INTO employees (emp_name, dept, score) VALUES
('大翔', '技术部', 100),
('白歌', '技术部', NULL);

当前数据状态：

emp_id	emp_name	dept	score
1	大翔	技术部	100
2	白歌	技术部	NULL

示例：复合索引设计

查询需求：

SELECT * FROM employees 
WHERE dept = '技术部' 
ORDER BY score DESC;

当前执行计划（无合适索引）：

EXPLAIN SELECT * FROM employees 
WHERE dept = '技术部' 
ORDER BY score DESC;

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	employees	ref	idx_dept	idx_dept	63	const	2	Using where; Using filesort

结果解读：

使用了 idx_dept，但需要 Using filesort 额外排序
score 的排序只在 Using where 中过滤，没有利用索引

优化：创建复合索引

CREATE INDEX idx_dept_score ON employees(dept, score);

-- 再次查看执行计划
EXPLAIN SELECT * FROM employees 
WHERE dept = '技术部' 
ORDER BY score DESC;

优化后的执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	employees	ref	idx_dept,idx_dept_score	idx_dept_score	63	const	2	Using index condition

结果解读：

key = idx_dept_score：使用了新创建的复合索引
ref = const：等值条件使用了索引
Extra 中 Using filesort 消失：因为 score 在索引中，排序直接在索引完成
key_len = 63：dept 列的索引长度

原则二：选择性高的列优先

选择性 = 不同值数量 / 总行数。越接近 1，索引过滤效果越好。

-- 准备测试数据
CREATE TABLE employees (
    emp_id INT PRIMARY KEY AUTO_INCREMENT,
    emp_name VARCHAR(20),
    dept VARCHAR(20),
    score DECIMAL(5,2)
);

INSERT INTO employees (emp_name, dept, score) VALUES
('大翔', '技术部', 100),
('白歌', '技术部', NULL);

-- 查看各列的选择性
SELECT 
    COUNT(DISTINCT dept) / COUNT(*) AS dept_selectivity,
    COUNT(DISTINCT emp_name) / COUNT(*) AS name_selectivity,
    COUNT(*) AS total_rows
FROM employees;

查询结果：

dept_selectivity	name_selectivity	total_rows
0.5000	1.0000	2

结果解读：

dept 选择性约 0.5（只有技术部 1 种值），索引过滤效果差
emp_name 选择性为 1.0（每行都不同），索引过滤效果好
应优先给 emp_name 建索引，而不是 dept

-- 为选择性高的列创建索引
CREATE INDEX idx_emp_name ON employees(emp_name);

-- dept 选择性太低，单独建索引意义不大
-- 但如果经常用于 WHERE dept = '技术部' AND emp_name = '大翔'，可以作为复合索引的前缀

原则三：避免对频繁修改的列建索引

CREATE TABLE employees (
    emp_id INT PRIMARY KEY AUTO_INCREMENT,
    emp_name VARCHAR(20),
    dept VARCHAR(20),
    score DECIMAL(5,2),
    INDEX idx_score (score)
);

INSERT INTO employees (emp_name, dept, score) VALUES
('大翔', '技术部', 100),
('白歌', '技术部', NULL);

当前数据状态：

emp_id	emp_name	dept	score
1	大翔	技术部	100
2	白歌	技术部	NULL

频繁更新的影响演示：

-- 如果 score 有索引，每次更新都需要维护索引
-- UPDATE employees SET score = score + 1 WHERE emp_id = 1;
-- 这条 UPDATE 会同时更新数据行和 score 索引

-- 查看索引维护开销的估算（通过执行计划）
EXPLAIN UPDATE employees SET score = score + 1 WHERE emp_id = 1;

结果解读：

对 score 这种频繁更新的列建索引，每次 UPDATE 都需要更新索引树
如果每秒有上千次更新，索引维护开销会严重影响写入性能
建议：对写多读少的列，谨慎建索引

原则四：控制单表索引数量

单表索引不宜过多（建议不超过 5-6 个）。每个索引都会：

占用磁盘空间
增加写入开销
增加优化器选择成本

-- 查看表的所有索引
SHOW INDEX FROM employees;

当前索引状态：

Table	Non_unique	Key_name	Seq_in_index	Column_name
employees	0	PRIMARY	1	emp_id
employees	1	idx_dept	1	dept
employees	1	idx_dept_score	1	dept
employees	1	idx_dept_score	2	score
employees	1	idx_emp_name	1	emp_name
employees	1	idx_score	1	score

结果解读：

employees 表目前有 4 个二级索引
注意：idx_dept 和 idx_dept_score 的前缀都是 dept，idx_dept 完全冗余（详见场景一）
如果索引超过 5-6 个，需要审视是否有重复或冗余索引

SQL 示例

场景一：识别重复索引

-- 回顾原则一中 employees 表的索引
SHOW INDEX FROM employees;

索引状态：

Table	Key_name	Column_name	Cardinality
employees	PRIMARY	emp_id	2
employees	idx_dept	dept	1
employees	idx_dept_score	dept	1
employees	idx_dept_score	score	1
employees	idx_emp_name	emp_name	2
employees	idx_score	score	1

结果解读：

idx_dept 是单列索引，只包含 dept
idx_dept_score 是复合索引，包含 (dept, score)
由于复合索引的前缀 dept 可以单独用于 WHERE dept = ? 的查询，idx_dept 完全冗余

-- 删除冗余索引
DROP INDEX idx_dept ON employees;

-- 验证删除后的索引
SHOW INDEX FROM employees;

删除后的索引状态：

Table	Key_name	Column_name
employees	PRIMARY	emp_id
employees	idx_dept_score	dept
employees	idx_dept_score	score
employees	idx_emp_name	emp_name
employees	idx_score	score

结果解读：

删除 idx_dept 后，WHERE dept = ? 的查询仍然可以使用 idx_dept_score 的前缀
节省了索引维护开销和磁盘空间

场景二：识别未使用的索引

MySQL 5.7 可通过 sys.schema_unused_indexes 查看从未被使用的索引：

-- 查看某个库中从未使用的索引
SELECT * FROM sys.schema_unused_indexes
WHERE object_schema = 'library';

示例输出：

object_schema	object_name	index_name
library	employees	idx_score
library	employees	idx_birth_date

结果解读：

idx_score 自创建以来从未被查询使用
idx_birth_date 也从未被使用
这些索引可以安全删除，减少维护开销

场景三：更新索引统计信息

-- 先查看索引 Cardinality（基数，即不同值的数量估算）
SHOW INDEX FROM employees;

当前状态：

Table	Key_name	Column_name	Cardinality
employees	PRIMARY	emp_id	2
employees	idx_dept_score	dept	1
employees	idx_emp_name	emp_name	2

-- 插入大量新数据后，统计信息可能不准确
INSERT INTO employees (emp_name, dept, score) VALUES
('小崔', '技术部', 85),
('黄俪', '市场部', 90);

-- 更新单表统计（ANALYZE 不锁表，很快）
ANALYZE TABLE employees;

-- 再次查看索引统计
SHOW INDEX FROM employees;

更新后的状态：

Table	Key_name	Column_name	Cardinality
employees	PRIMARY	emp_id	4
employees	idx_dept_score	dept	2
employees	idx_emp_name	emp_name	4

结果解读：

ANALYZE TABLE 更新了索引的统计信息（如 Cardinality）
优化器依赖这些统计信息选择执行计划，不准确的统计可能导致选错索引
ANALYZE TABLE 不锁表，适合在线执行

-- OPTIMIZE TABLE 会重建表和索引，整理碎片，但会锁表
-- OPTIMIZE TABLE employees;  -- 大表执行时间长，谨慎使用

场景四：强制使用/忽略索引

-- 准备测试数据
CREATE TABLE force_index_demo (
    id INT PRIMARY KEY,
    col_a VARCHAR(50),
    col_b VARCHAR(50),
    INDEX idx_a (col_a),
    INDEX idx_b (col_b)
);

INSERT INTO force_index_demo VALUES
(1, 'a1', 'b1'),
(2, 'a2', 'b2'),
(3, 'a3', 'b3');

-- 正常情况下，优化器自动选择索引
EXPLAIN SELECT * FROM force_index_demo WHERE col_a = 'a1';

正常执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	force_index_demo	ref	idx_a	idx_a	153	const	1	Using index condition

-- 强制使用某索引（不推荐常规使用，排查问题时可用）
EXPLAIN SELECT * FROM force_index_demo USE INDEX (idx_b) WHERE col_a = 'a1';

强制使用 idx_b 的执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	force_index_demo	ALL	idx_b	NULL	NULL	NULL	3	Using where

结果解读：

USE INDEX (idx_b) 强制优化器只考虑 idx_b
但 idx_b 不包含 col_a 列，所以最终走了全表扫描（type = ALL）
这说明强制使用索引可能导致更差的执行计划

-- 忽略某索引
EXPLAIN SELECT * FROM force_index_demo IGNORE INDEX (idx_a) WHERE col_a = 'a1';

忽略 idx_a 的执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	force_index_demo	ALL	NULL	NULL	NULL	NULL	3	Using where

结果解读：

IGNORE INDEX (idx_a) 让优化器不考虑 idx_a
没有其他可用索引，最终全表扫描

索引失效的常见原因

原因	示例	解决方案
对索引列使用函数	`WHERE LENGTH(emp_name) = 2`	改写为等值查询 `WHERE emp_name = '大翔'`
隐式类型转换	`WHERE dept = 1`（dept 是字符串）	加引号：`WHERE dept = '技术部'`
前导 % 的 LIKE	`WHERE emp_name LIKE '%翔'`	改为前缀匹配或全文检索
OR 条件不当	`WHERE dept = '技术部' OR score = 100`	改写为 UNION，或确保 OR 的每列都有索引
NOT、<>、IS NOT NULL	`WHERE score <> 0`	选择性高时仍可能走索引，但效率低于等值查询
计算表达式	`WHERE emp_id + 1 = 2`	改写为 `WHERE emp_id = 1`

示例：索引失效演示

-- 建表
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    emp_name VARCHAR(20),
    dept VARCHAR(20),
    score DECIMAL(5,2),
    INDEX idx_name (emp_name),
    INDEX idx_dept (dept)
);

INSERT INTO employees VALUES
(1, '大翔', '技术部', 100),
(2, '白歌', '技术部', NULL);

失效原因 1：对索引列使用函数

-- 索引失效：对 emp_name 使用 LENGTH 函数
EXPLAIN SELECT * FROM employees WHERE LENGTH(emp_name) = 2;

执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	employees	ALL	NULL	NULL	NULL	NULL	2	Using where

优化改写：

-- 改写为等值查询，可以使用索引
EXPLAIN SELECT * FROM employees WHERE emp_name = '大翔';

优化后的执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	employees	ref	idx_name	idx_name	63	const	1	Using index condition

结果解读：

LENGTH(emp_name) 导致索引失效，全表扫描
改写为等值查询后，type = ref，使用了 idx_name 索引

失效原因 2：隐式类型转换

-- dept 是 VARCHAR 类型，但传入数字
EXPLAIN SELECT * FROM employees WHERE dept = 1;

执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	employees	ALL	idx_dept	NULL	NULL	NULL	2	Using where

优化改写：

-- 加引号，避免隐式类型转换
EXPLAIN SELECT * FROM employees WHERE dept = '技术部';

优化后的执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	employees	ref	idx_dept	idx_dept	63	const	1	Using index condition

结果解读：

dept = 1（数字）导致 MySQL 将 dept 列转换为数字进行比较，索引失效
dept = '技术部'（字符串）正确使用索引

失效原因 3：前导 % 的 LIKE

-- 前导 % 导致索引失效
EXPLAIN SELECT * FROM employees WHERE emp_name LIKE '%翔';

执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	employees	ALL	NULL	NULL	NULL	NULL	2	Using where

-- 后缀 % 可以使用索引
EXPLAIN SELECT * FROM employees WHERE emp_name LIKE '大%';

执行计划：

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	employees	range	idx_name	idx_name	63	NULL	1	Using index condition

结果解读：

LIKE '%翔' 前导通配符无法使用索引，因为索引是按前缀排序的
LIKE '大%' 可以使用索引进行范围扫描

常见误区

误区	正解
索引越多越好	索引有维护成本，过多索引会拖慢写入。
建了索引查询就一定快	数据量很小时，全表扫描可能比索引快；索引失效时也不走索引。
索引可以解决所有性能问题	索引只解决"查找"问题。计算密集型、网络传输型、锁竞争型问题需要其他方案。

面试考点

Q：什么情况下索引会失效？

对索引列使用函数；2. 隐式类型转换；3. 前导 % 的 LIKE；4. OR 条件中某列无索引；5. 范围查询右侧列；6. 计算表达式；7. 数据量极小优化器选择全表扫描。

Q：如何知道一个查询有没有走索引？

用 EXPLAIN 查看 key 列和 type 列。key 不为 NULL 表示走了索引；type 为 ALL 表示全表扫描。

Q：ANALYZE TABLE 和 OPTIMIZE TABLE 的区别？

ANALYZE TABLE 只更新索引统计信息，不锁表，很快；OPTIMIZE TABLE 重建表和索引，整理碎片，会锁表，大表执行时间长。

小结

索引优化是查询性能优化的核心手段
为 WHERE、JOIN、ORDER BY 的列建索引，选择性高的列优先
避免重复索引、控制单表索引数量
定期执行 ANALYZE TABLE 更新统计信息
了解索引失效的常见原因，避免写出索引失效的 SQL

下一章引子：索引优化是硬件层面的提速，而 SQL 写法优化则是逻辑层面的提速——有时改写一条 SQL，比加十个索引更有效。