XPath Path Expressions

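Chapter goal: master the core syntax of XPath path expressions, including absolute vs. relative paths, wildcards, attribute selection, and the 13 axes that define the directions of navigation.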

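Definition and purpose

An XPath path expression is a navigation language: it describes how to start from some node, move in a given direction, and select the nodes that satisfy a condition.

The syntax resembles file system paths but is more powerful. A file system path can only move up or down the hierarchy, whereas XPath defines 13 axes, each a distinct direction of navigation: up to ancestors, down to descendants, sideways to siblings, and forward or backward in document order.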

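Core principle: the directions of the 13 axes

Figure note: the 13 XPath axes together cover every direction in which you can navigate from the context node. child:: is the default axis and is usually omitted, // abbreviates /descendant-or-self::node()/, and @ abbreviates attribute::.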
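To make the 13 directions concrete, the sketch below evaluates every axis from one context node and counts the nodes it reaches. The tiny document and the names `AXES` and `counts` are invented for illustration; the namespace axis is included only for completeness, and its count depends on how the XPath engine reports in-scope namespaces.

```python
from lxml import etree

# Hypothetical document, invented for this sketch.
root = etree.fromstring('<a><b><c/></b><d x="1"><e/><f/></d><g/></a>')
context = root.find("d")  # every axis below is evaluated from <d>

AXES = [
    "self", "child", "parent", "ancestor", "ancestor-or-self",
    "descendant", "descendant-or-self", "following", "following-sibling",
    "preceding", "preceding-sibling", "attribute", "namespace",
]

# count(axis::*) reports how many nodes of the axis's principal node type
# (elements, attributes, or namespace nodes) the axis reaches from <d>.
counts = {axis: int(context.xpath(f"count({axis}::*)")) for axis in AXES}

for axis, n in counts.items():
    print(f"{axis:20} {n}")
```

Note that preceding:: reaches `<b>` and `<c>` but not `<a>`: ancestors are excluded from the preceding axis, just as descendants are excluded from following::.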

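Syntax and structure essentials

Path expression quick reference

Expression	Meaning	Example
`/`	Start at the root node (leading slash); between steps, move to direct children	`/bookstore/book`
`//`	Match at any depth below the starting point	`//title`
`.`	The current (context) node	`./title`
`..`	The parent node	`../book`
`*`	Any element	`/bookstore/*`
`@`	An attribute	`//book/@category`
`@*`	Any attribute	`//book/@*`
`node()`	Any node	`//book/node()`
`text()`	Text nodes	`//title/text()`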
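The expressions in the table can be tried out with lxml. This is a minimal sketch; the bookstore document is a made-up sample, not data from this chapter.

```python
from lxml import etree

# Hypothetical sample document, invented for this illustration.
root = etree.fromstring(
    "<bookstore>"
    '<book category="web"><title>XPath Basics</title></book>'
    '<book category="data"><title>XML Schema</title></book>'
    "<magazine><title>Markup Monthly</title></magazine>"
    "</bookstore>"
)

books = root.xpath("/bookstore/book")            # absolute path from the root
all_titles = root.xpath("//title/text()")        # text of every <title>, any depth
child_tags = [e.tag for e in root.xpath("/bookstore/*")]  # any child element
categories = root.xpath("//book/@category")      # attribute values as strings

first_book = books[0]
own_title = first_book.xpath("./title/text()")   # relative to the context node
via_parent = first_book.xpath("../book")         # up to the parent, back down

print(len(books), all_titles, child_tags, categories, own_title, len(via_parent))
```

`../book` returns both books, including the context node itself, because it selects every `book` child of the parent.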

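Common axis syntax

Axis	Abbreviation	Meaning	Example
`child::`	(default)	Children of the context node	`child::book` = `book`
`attribute::`	`@`	Attributes	`attribute::category` = `@category`
`parent::`	`..`	Parent node	`parent::node()` = `..`
`descendant-or-self::`	`//`	The context node and all its descendants	`/descendant-or-self::node()/child::book` = `//book`
`self::`	`.`	The context node itself	`self::node()` = `.`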
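A quick way to convince yourself that the abbreviations are pure shorthand is to evaluate both spellings and compare the results. The sketch below does this with lxml on an invented document; `describe` and `PAIRS` are helper names introduced here for illustration.

```python
from lxml import etree

# Hypothetical sample document, invented for this illustration.
root = etree.fromstring(
    "<bookstore>"
    '<book category="web"><title>XPath Basics</title></book>'
    '<shelf><book category="data"><title>XML Schema</title></book></shelf>'
    "</bookstore>"
)
tree = root.getroottree()


def describe(results):
    """Turn an XPath result list into plain strings that can be compared."""
    return [tree.getpath(r) if etree.iselement(r) else str(r) for r in results]


# (abbreviated form, unabbreviated form), all evaluated from <bookstore>
PAIRS = [
    ("book", "child::book"),
    ("book/@category", "child::book/attribute::category"),
    ("//book", "/descendant-or-self::node()/child::book"),
    ("//title/..", "//title/parent::node()"),
    (".", "self::node()"),
]

same = {
    short: describe(root.xpath(short)) == describe(root.xpath(full))
    for short, full in PAIRS
}
for short, ok in same.items():
    print(f"{short:16} equivalent: {ok}")
```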

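Complete example: Huang Li locates test data with axes

Scenario

Huang Li, a test engineer at Feixiang Technology, needs to validate a complex XML test fixture. She has to find nodes that stand in a particular relationship to a given node, for example all <warning> siblings that follow a particular <error> node.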

XML data

<test-report>
  <suite name="XML validation">
    <case id="TC001" status="pass">element declaration</case>
    <case id="TC002" status="fail">
      <error>missing attribute</error>
    </case>
    <case id="TC003" status="pass">namespaces</case>
    <case id="TC004" status="fail">
      <error>type mismatch</error>
      <warning>consider using XSD</warning>
    </case>
    <case id="TC005" status="pass">entity references</case>
  </suite>
</test-report>

XPath axis operations

from lxml import etree

tree = etree.parse("test_report.xml")

# child:: is the default axis: select the <case> children of <suite>
cases = tree.xpath("/test-report/suite/case")
print(f"Total cases: {len(cases)}")  # 5

# descendant:: finds every <error> element anywhere below <suite>
errors = tree.xpath("//suite/descendant::error")
print(f"Errors: {len(errors)}")  # 2

# following-sibling:: finds the sibling <case> elements after TC002
after_tc002 = tree.xpath(
    "//case[@id='TC002']/following-sibling::case"
)
print(f"Cases after TC002: {[c.get('id') for c in after_tc002]}")
# ['TC003', 'TC004', 'TC005']

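# ancestor:: finds the ancestors of the first <error> in the document.
# The parentheses matter: //error[1] would match every <error> that is the
# first <error> child of its parent, which here is both of them.
ancestors = tree.xpath("(//error)[1]/ancestor::*")
print(f"Ancestor elements: {[a.tag for a in ancestors]}")
# ['test-report', 'suite', 'case']  (lxml returns nodes in document order)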

# preceding-sibling:: finds the sibling <case> elements before TC004
before = tree.xpath("//case[@id='TC004']/preceding-sibling::case")
print(f"Cases before TC004: {[c.get('id') for c in before]}")
# ['TC001', 'TC002', 'TC003']

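Result

With a handful of XPath expressions, Huang Li covers ancestor, sibling, and descendant lookups. Written as hand-rolled DOM loops, each of these queries could easily take a dozen or so lines of code.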
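The lookup described in the scenario, the <warning> siblings that follow an <error>, can be sketched the same way. The snippet embeds a trimmed copy of the report (only the failing cases, with English text) so that it runs on its own; `warnings_by_case` is a name introduced here for illustration.

```python
from lxml import etree

# Trimmed copy of the test report: only the two failing cases.
root = etree.fromstring(
    "<test-report>"
    '<suite name="XML validation">'
    '<case id="TC002" status="fail"><error>missing attribute</error></case>'
    '<case id="TC004" status="fail">'
    "<error>type mismatch</error>"
    "<warning>consider using XSD</warning>"
    "</case>"
    "</suite>"
    "</test-report>"
)

warnings_by_case = {}
for error in root.xpath("//error"):
    # string(../@id) reads the id attribute of the enclosing <case>
    case_id = str(error.xpath("string(../@id)"))
    # following-sibling:: looks only at later children of the same parent
    found = error.xpath("following-sibling::warning/text()")
    warnings_by_case[case_id] = [str(w) for w in found]

print(warnings_by_case)
# {'TC002': [], 'TC004': ['consider using XSD']}
```

Evaluating the axis relative to each `error` element keeps the query local to that case, so a warning in one case is never attributed to an error in another.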

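Common pitfalls

Pitfall 1: the performance trap of `//`

<!-- ❌ scans every node starting from the document root -->
//book

<!-- ✅ with a context node available, use a relative path -->
child::book

A leading // walks every descendant of the document root, which is expensive on large documents. When you already have a suitable context node, prefer a relative path that visits only the nodes you need.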
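A related surprise in lxml is that a leading `//` ignores the element you call `xpath()` on and still searches the whole document. The sketch below, on an invented library document, contrasts it with `.//` and `child::`.

```python
from lxml import etree

# Hypothetical sample document, invented for this illustration.
root = etree.fromstring(
    "<library>"
    '<shelf id="A"><book>One</book><box><book>Two</book></box></shelf>'
    '<shelf id="B"><book>Three</book></shelf>'
    "</library>"
)
shelf_a = root.find("shelf")  # the first <shelf>, id="A"

# Leading // is anchored at the document root, not at shelf_a.
whole_document = shelf_a.xpath("//book/text()")
# .// searches only the descendants of the context node.
below_shelf = shelf_a.xpath(".//book/text()")
# child:: looks at direct children only, the cheapest of the three.
direct_children = shelf_a.xpath("child::book/text()")

print(whole_document)   # ['One', 'Two', 'Three']
print(below_shelf)      # ['One', 'Two']
print(direct_children)  # ['One']
```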

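Pitfall 2: where `/` and `//` start

/bookstore/book → starts at the root and descends exactly one level per step
//book → ignores depth and selects every book in the document
/bookstore//book → starts at bookstore and selects every book among its descendants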
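The difference shows up as soon as a `book` is nested more than one level deep. The following sketch uses an invented document with one top-level book and two inside a `section` element (a name made up for this example).

```python
from lxml import etree

# Hypothetical sample document, invented for this illustration.
root = etree.fromstring(
    "<bookstore>"
    "<book>A</book>"
    "<section><book>B</book><book>C</book></section>"
    "</bookstore>"
)

step_by_step = root.xpath("/bookstore/book/text()")       # direct children only
any_depth = root.xpath("//book/text()")                   # every book in the document
under_root = root.xpath("/bookstore//book/text()")        # every book below bookstore
scoped = root.xpath("/bookstore/section//book/text()")    # every book below section

print(step_by_step)  # ['A']
print(any_depth)     # ['A', 'B', 'C']
print(under_root)    # ['A', 'B', 'C']
print(scoped)        # ['B', 'C']
```

Here `//book` and `/bookstore//book` agree only because `bookstore` is the root element; with a different root they would search different subtrees.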

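Interview questions

Question	Key points of a model answer
Which XPath axes are commonly used, and what does each do?	child (children), parent (parent), ancestor (ancestors), descendant (descendants), following-sibling (later siblings), preceding-sibling (earlier siblings), attribute (attributes)
What is the essential difference between `//` and `/`?	A leading `/` anchors the path at the document root, and between steps it moves exactly one level down; `//` abbreviates /descendant-or-self::node()/ and matches nodes at any depth
What does `@` do?	`@` abbreviates the attribute:: axis and selects attribute nodes, for example `@category` or `//book/@category`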