复合索引

导学

单列索引只能加速单列查询，但真实业务中 WHERE 条件往往包含多列。复合索引（多列索引）让索引设计更贴合实际查询需求，但列的顺序至关重要——顺序错了，索引可能完全失效。

定义

复合索引（Composite Index / 联合索引）：在多个列上建立的索引。InnoDB 按照索引列的顺序依次排序，查询时必须遵循"最左前缀原则"才能有效利用索引。

最左前缀原则

复合索引 (a, b, c) 等效于建立了三个索引：

(a)
(a, b)
(a, b, c)

但不包含 (b)、(c)、(b, c)。

SQL 示例

场景一：创建复合索引

当前数据状态：创建演示表

CREATE TABLE employees (
    emp_id INT PRIMARY KEY AUTO_INCREMENT,
    emp_name VARCHAR(20),
    dept VARCHAR(20),
    score DECIMAL(5,2)
);

INSERT INTO employees (emp_name, dept, score) VALUES
('大翔', '技术部', 100),
('白歌', '技术部', NULL);

SELECT * FROM employees;

emp_id	emp_name	dept	score
1	大翔	技术部	100
2	白歌	技术部	NULL

操作语句：

-- 在 (dept, emp_name, score) 三列上创建复合索引
CREATE INDEX idx_dept_name_score ON employees(dept, emp_name, score);

SHOW INDEX FROM employees;

操作后数据状态（节选）：

Table	Non_unique	Key_name	Seq_in_index	Column_name
employees	1	idx_dept_name_score	1	dept
employees	1	idx_dept_name_score	2	emp_name
employees	1	idx_dept_name_score	3	score

结果解读：Seq_in_index 显示索引列的顺序：先按 dept 排序，dept 相同再按 emp_name 排序，emp_name 相同再按 score 排序。

场景二：完全匹配最左前缀（最优）

当前数据状态：复合索引 idx_dept_name_score 已创建。

操作语句：

-- 使用索引的全部三列
SELECT * FROM employees WHERE dept = '技术部' AND emp_name = '大翔' AND score = 100;

操作后数据状态：

emp_id	emp_name	dept	score
1	大翔	技术部	100

EXPLAIN SELECT * FROM employees WHERE dept = '技术部' AND emp_name = '大翔' AND score = 100;

+----+-------------+-----------+------+----------------------+----------------------+---------+-------------------+------+-------+
| id | select_type | table     | type | possible_keys        | key                  | key_len | ref               | rows | Extra |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------------------+------+-------+
|  1 | SIMPLE      | employees | ref  | idx_dept_name_score  | idx_dept_name_score  | 93      | const,const,const |    1 | NULL  |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------------------+------+-------+

结果解读：key=idx_dept_name_score 表示使用了复合索引，ref=const,const,const 表示三个列都用了等值匹配，这是最优的索引使用情况。

场景三：匹配最左列（部分使用）

当前数据状态：同上。

操作语句：

-- 只使用索引的 dept 部分
SELECT * FROM employees WHERE dept = '技术部';

操作后数据状态：

emp_id	emp_name	dept	score
1	大翔	技术部	100
2	白歌	技术部	NULL

EXPLAIN SELECT * FROM employees WHERE dept = '技术部';

+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-------+
| id | select_type | table     | type | possible_keys        | key                  | key_len | ref   | rows | Extra |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-------+
|  1 | SIMPLE      | employees | ref  | idx_dept_name_score  | idx_dept_name_score  | 63      | const |    2 | NULL  |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-------+

结果解读：key_len=63 比完全匹配时的 93 小，说明只使用了索引的第一列 dept。虽然只用了部分索引，但仍然有效，比全表扫描快得多。

场景四：匹配前两列（部分使用）

当前数据状态：同上。

操作语句：

-- 使用索引的 dept 和 emp_name 部分
SELECT * FROM employees WHERE dept = '技术部' AND emp_name = '大翔';

操作后数据状态：

emp_id	emp_name	dept	score
1	大翔	技术部	100

EXPLAIN SELECT * FROM employees WHERE dept = '技术部' AND emp_name = '大翔';

+----+-------------+-----------+------+----------------------+----------------------+---------+-------------+------+-------+
| id | select_type | table     | type | possible_keys        | key                  | key_len | ref         | rows | Extra |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------------+------+-------+
|  1 | SIMPLE      | employees | ref  | idx_dept_name_score  | idx_dept_name_score  | 93      | const,const |    1 | NULL  |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------------+------+-------+

结果解读：key_len=93 表示使用了前两列（dept 和 emp_name），ref=const,const 确认了两列都用于等值匹配。

场景五：跳过最左列（索引失效）

当前数据状态：同上。

操作语句：

-- 不使用最左列 dept，索引完全失效
SELECT * FROM employees WHERE emp_name = '大翔';

操作后数据状态：

emp_id	emp_name	dept	score
1	大翔	技术部	100

EXPLAIN SELECT * FROM employees WHERE emp_name = '大翔';

+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | employees | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+

结果解读：type=ALL 表示全表扫描，possible_keys=NULL 表示没有可用索引。因为查询条件跳过了最左列 dept，复合索引 idx_dept_name_score 完全无法使用。

场景六：最左列用范围，右侧列失效

当前数据状态：同上。

操作语句：

-- dept 用范围查询，emp_name 无法使用索引
SELECT * FROM employees WHERE dept LIKE '技%' AND emp_name = '大翔';

操作后数据状态：

emp_id	emp_name	dept	score
1	大翔	技术部	100

EXPLAIN SELECT * FROM employees WHERE dept LIKE '技%' AND emp_name = '大翔';

+----+-------------+-----------+-------+----------------------+----------------------+---------+------+------+-----------------------+
| id | select_type | table     | type  | possible_keys        | key                  | key_len | ref  | rows | Extra                 |
+----+-------------+-----------+-------+----------------------+----------------------+---------+------+------+-----------------------+
|  1 | SIMPLE      | employees | range | idx_dept_name_score  | idx_dept_name_score  | 63      | NULL |    2 | Using index condition |
+----+-------------+-----------+-------+----------------------+----------------------+---------+------+------+-----------------------+

结果解读：type=range 表示使用了范围扫描，key_len=63 说明只使用了 dept 列。emp_name = '大翔' 这个条件只能在索引过滤后的结果中再检查（Using index condition 表示索引下推）。

⚠️ 关键规则：复合索引中，遇到范围查询（>、<、BETWEEN、LIKE）的列，其右侧的列不再走索引。

场景七：最左列存在但中间缺失

当前数据状态：同上。

操作语句：

-- 有 dept，跳过了 emp_name，只用 score
SELECT * FROM employees WHERE dept = '技术部' AND score = 100;

操作后数据状态：

emp_id	emp_name	dept	score
1	大翔	技术部	100

EXPLAIN SELECT * FROM employees WHERE dept = '技术部' AND score = 100;

+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-----------------------+
| id | select_type | table     | type | possible_keys        | key                  | key_len | ref   | rows | Extra                 |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-----------------------+
|  1 | SIMPLE      | employees | ref  | idx_dept_name_score  | idx_dept_name_score  | 63      | const |    2 | Using index condition |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-----------------------+

结果解读：只使用了 dept 列（key_len=63），score 列没有使用索引。因为 emp_name 缺失，索引在 dept 之后就断开了，score 无法被利用。

索引列顺序的设计原则

原则一：等值查询列优先

-- 查询条件：WHERE user_id = ? AND status = ? AND created_at > ?
-- 推荐索引：(user_id, status, created_at)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    user_id INT,
    status VARCHAR(20),
    created_at DATETIME,
    amount DECIMAL(10, 2)
);

INSERT INTO orders VALUES
(1, 100, 'paid', '2024-01-01 10:00:00', 99.00),
(2, 100, 'paid', '2024-01-02 11:00:00', 199.00),
(3, 101, 'pending', '2024-01-03 12:00:00', 59.00),
(4, 100, 'shipped', '2024-01-04 13:00:00', 299.00);

CREATE INDEX idx_user_status_time ON orders(user_id, status, created_at);

-- 最优查询：三列都用上
SELECT * FROM orders WHERE user_id = 100 AND status = 'paid' AND created_at > '2024-01-01';

order_id	user_id	status	created_at	amount
1	100	paid	2024-01-01 10:00:00	99.00
2	100	paid	2024-01-02 11:00:00	199.00

结果解读：user_id 和 status 是等值查询，放在前面；created_at 是范围查询，放在最后。这样三列都能被索引利用。

原则二：高选择性列优先

选择性 = 不同值数量 / 总行数。选择性越高，索引过滤效果越好。

-- 假设 employees 表有 1000 条记录
-- dept 有 10 个不同值，选择性 = 0.01
-- emp_name 有 50 个不同值，选择性 = 0.05

-- 如果查询总是同时查两者，把选择性高的放前面
-- 但最左前缀原则更重要！如果查询总是 WHERE dept = ?，则 dept 必须放最前面

覆盖索引与复合索引

-- 索引 (dept, emp_name, score)
SELECT dept, emp_name, score FROM employees WHERE dept = '技术部';

操作后数据状态：

dept	emp_name	score
技术部	大翔	100
技术部	白歌	NULL

EXPLAIN SELECT dept, emp_name, score FROM employees WHERE dept = '技术部';

+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-------------+
| id | select_type | table     | type | possible_keys        | key                  | key_len | ref   | rows | Extra       |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | employees | ref  | idx_dept_name_score  | idx_dept_name_score  | 63      | const |    2 | Using index |
+----+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-------------+

结果解读：Extra=Using index 表示使用了覆盖索引。查询的所有列（dept, emp_name, score）都在 idx_dept_name_score 索引中，无需回表查聚簇索引。复合索引极大地增加了覆盖索引的可能性。

常见误区

误区	正解
复合索引包含的列越多越好	索引列越多，索引越大，维护成本越高。应只为查询条件、排序、覆盖 SELECT 列设计索引。
`WHERE a=1 OR b=2` 能用 (a,b) 索引	MySQL 5.7 的索引合并可能同时使用两个单列索引，但复合索引 `(a,b)` 对此查询无帮助。
改变列顺序没关系	顺序决定最左前缀，顺序错误导致索引完全失效。

面试考点

Q：什么是复合索引的最左前缀原则？

复合索引 (a,b,c) 查询时必须包含最左列 a 的条件，才能使用索引。条件 a=1 AND b=2 用 (a,b) 部分；b=2 完全不用索引。遇到范围查询后，右侧列不再走索引。

Q：如何设计一个复合索引？

等值查询列放在前面；2. 排序列放中间；3. 范围查询列放最后；4. 考虑覆盖索引，把 SELECT 中的列也纳入索引（权衡维护成本）。

Q：WHERE a=1 AND b>2 AND c=3，索引 (a,b,c) 能用几列？

用 a 和 b 两列。b>2 是范围查询，c 无法使用索引。若要 c 也走索引，需调整查询或索引为 (a,c,b)（如果 c 是等值查询）。

小结

复合索引在多个列上建立，遵循最左前缀原则
等值查询列放前面，范围查询列放最后
索引列顺序错误会导致索引完全失效
复合索引更容易实现覆盖索引，避免回表

下一章引子：索引加速查询，而外键则确保表与表之间的数据一致性。