Text Diff
Compare two texts side by side, with differences highlighted line by line.
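What Is a Text Diff?
A text diff (short for "difference") is the result of comparing two pieces of text to identify which lines were added, removed, or left unchanged. The concept comes from the Unix diff utility, which first shipped in 1974 as part of Fifth Edition Unix. Today, text diffs are central to version control systems such as Git. Git stores each commit as a snapshot of the tracked files and computes diffs between snapshots on demand, while its packfiles use delta compression to avoid keeping a full copy of every version.
A diff algorithm works by finding the longest common subsequence (LCS) of the two line sequences. Lines that belong to the LCS are marked unchanged; lines that exist in the original text but not in the LCS are marked removed; lines that exist in the modified text but not in the LCS are marked added. The result is a minimal set of changes that turns one text into the other.
Diff output comes in several formats. Unified diff (the default format of git diff) prefixes removed lines with a minus sign and added lines with a plus sign. Side-by-side diff lays the two texts out in parallel columns. This tool compares line by line and color-codes the result: green for additions, red for deletions, and a neutral color for unchanged lines. Unchanged lines are shown without a prefix by default and can be hidden so that only the changes remain in view.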
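The prefix convention and the "hide unchanged lines" option described above can be sketched in a few lines of Python. This is only an illustration: the function name `render_diff`, the `show_unchanged` parameter, and the `'equal'` / `'add'` / `'remove'` labels are assumptions made for this sketch, not the tool's actual implementation.

```python
def render_diff(ops, show_unchanged=True):
    """Render (kind, text) pairs as prefixed lines.

    kind is 'equal', 'add', or 'remove' (hypothetical labels for this sketch).
    """
    prefixes = {"equal": " ", "add": "+", "remove": "-"}
    lines = []
    for kind, text in ops:
        # Optionally drop unchanged lines so only the changes remain.
        if kind == "equal" and not show_unchanged:
            continue
        lines.append(prefixes[kind] + text)
    return lines


ops = [
    ("equal", "alpha"),
    ("remove", "beta"),
    ("add", "beta changed"),
    ("equal", "gamma"),
    ("add", "delta"),
]

print("\n".join(render_diff(ops, show_unchanged=False)))
# -beta
# +beta changed
# +delta
```

A color-coded view would presumably map the same three kinds to colors instead of prefix characters.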
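Why Use an Online Text Diff Tool?
Comparing text in a terminal means installing a diff tool and working out its command-line options. A browser-based diff tool removes both hurdles.
Text Diff Use Cases
Diff Output Formats Compared
Diff tools produce output in several formats. The table below lists the most common ones, the tool or source that generates each, and a short description.
| Format | Tool / Source | Description |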
|---|---|---|
| Unified diff | diff -u / git diff | Prefixes lines with + / - / space; includes @@ hunk headers |
| Side-by-side | diff -y / sdiff | Two columns, changed lines aligned horizontally |
| Context diff | diff -c | Shows changed lines with surrounding context, marked with ! / + / - |
| HTML diff | Python difflib | Color-coded HTML table with inline change highlights |
| JSON Patch | RFC 6902 | Array of add/remove/replace operations on a JSON document |
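To see how two of the formats in the table differ on the same input, the sketch below produces both a unified diff and a context diff with Python's standard `difflib` module. The file names `a.txt` and `b.txt` are placeholders chosen for this example.

```python
import difflib

old = ["alpha", "beta", "gamma"]
new = ["alpha", "beta changed", "gamma", "delta"]

# Unified format: one hunk, lines prefixed with ' ', '-' or '+'.
unified = list(difflib.unified_diff(
    old, new, fromfile="a.txt", tofile="b.txt", lineterm=""))

# Context format: separate "from" and "to" sections, with '! ' marking
# lines that changed and '+ ' / '- ' marking pure additions and deletions.
context = list(difflib.context_diff(
    old, new, fromfile="a.txt", tofile="b.txt", lineterm=""))

print("\n".join(unified))
print()
print("\n".join(context))
```

Both calls describe the same change; only the layout and the markers differ.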
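How Line-by-Line Diff Works: The LCS Algorithm
Most line-by-line diff tools, including this one, are built on the longest common subsequence (LCS). The LCS is the largest set of lines that appear in both texts in the same relative order; the lines do not need to be contiguous. Lines outside the LCS are the actual differences.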
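To make the "same relative order, not necessarily contiguous" point concrete, here is a minimal Python sketch that returns the LCS itself rather than a full diff. The function name `lcs_lines` is made up for this illustration.

```python
def lcs_lines(a, b):
    """Return one longest common subsequence of two lists of lines."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back through the table to collect the common lines.
    common = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            common.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    common.reverse()
    return common


old = ["alpha", "beta", "gamma"]
new = ["alpha", "beta changed", "gamma", "delta"]
print(lcs_lines(old, new))
# ['alpha', 'gamma']
```

Here `alpha` and `gamma` form the LCS even though another line sits between them in both texts; everything else (`beta`, `beta changed`, `delta`) is reported as a difference.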
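The standard LCS algorithm uses dynamic programming and takes O(m × n) time, where m and n are the line counts of the two texts. For large files, optimized variants such as Myers' diff algorithm (Git's default) do better: Myers' algorithm runs in O((m + n) × d) time in the worst case and O(n + d²) expected time, where d is the size of the shortest edit script, that is, the number of added plus removed lines. It is very fast when most lines are identical.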
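The core idea of Myers' algorithm can be sketched as follows: for each edit count d = 0, 1, 2, ..., track the furthest point reachable on every diagonal k = x - y, and stop as soon as the end of both texts is reached. The sketch below computes only the edit distance d, not the edit script. It is a simplified version of the basic greedy algorithm (the function name `myers_distance` is chosen for this example) and is not Git's actual implementation.

```python
def myers_distance(a, b):
    """Size of the shortest edit script (insertions + deletions) from a to b."""
    n, m = len(a), len(b)
    # v[k] = furthest x reached so far on diagonal k (where k = x - y)
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Choose whether to arrive by moving down (insertion)
            # or by moving right (deletion).
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake": consume matching lines at no cost.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m


old = ["alpha", "beta", "gamma"]
new = ["alpha", "beta changed", "gamma", "delta"]
print(myers_distance(old, new))
# 3
```

The outer loop runs d + 1 times, so when two texts differ by only a few lines the search ends after a few iterations regardless of how long the files are.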
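Code Examples
The examples below show line-by-line text comparison in JavaScript, Python, Go, and on the command line. The Python, Go, and command-line examples print unified-diff-style output; the JavaScript example returns the same changes as an array of typed entries.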
// Line-by-line diff using the LCS algorithm
function diffLines(a, b) {
  const linesA = a.split('\n')
  const linesB = b.split('\n')

  // Build LCS table
  const m = linesA.length, n = linesB.length
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0))
  for (let i = 1; i <= m; i++)
    for (let j = 1; j <= n; j++)
      dp[i][j] = linesA[i-1] === linesB[j-1]
        ? dp[i-1][j-1] + 1
        : Math.max(dp[i-1][j], dp[i][j-1])

  // Backtrack to produce diff
  const result = []
  let i = m, j = n
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && linesA[i-1] === linesB[j-1]) {
      result.unshift({ type: 'equal', text: linesA[i-1] }); i--; j--
    } else if (j > 0 && (i === 0 || dp[i][j-1] >= dp[i-1][j])) {
      result.unshift({ type: 'add', text: linesB[j-1] }); j--
    } else {
      result.unshift({ type: 'remove', text: linesA[i-1] }); i--
    }
  }
  return result
}

const diff = diffLines("alpha\nbeta\ngamma", "alpha\nbeta changed\ngamma\ndelta")
// → [
//   { type: 'equal', text: 'alpha' },
//   { type: 'remove', text: 'beta' },
//   { type: 'add', text: 'beta changed' },
//   { type: 'equal', text: 'gamma' },
//   { type: 'add', text: 'delta' }
// ]

import difflib
text_a = """alpha
beta
gamma""".splitlines()
text_b = """alpha
beta changed
gamma
delta""".splitlines()

# Unified diff (same format as git diff)
for line in difflib.unified_diff(text_a, text_b, fromfile='a.txt', tofile='b.txt', lineterm=''):
    print(line)
# --- a.txt
# +++ b.txt
# @@ -1,3 +1,4 @@
#  alpha
# -beta
# +beta changed
#  gamma
# +delta

# HTML side-by-side diff
d = difflib.HtmlDiff()
html = d.make_file(text_a, text_b, fromdesc='Original', todesc='Modified')

package main
import (
	"fmt"
	"strings"
)

// Minimal LCS-based line diff
func diffLines(a, b string) {
	la := strings.Split(a, "\n")
	lb := strings.Split(b, "\n")
	m, n := len(la), len(lb)

	// Build the LCS length table
	dp := make([][]int, m+1)
	for i := range dp {
		dp[i] = make([]int, n+1)
	}
	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			if la[i-1] == lb[j-1] {
				dp[i][j] = dp[i-1][j-1] + 1
			} else if dp[i-1][j] >= dp[i][j-1] {
				dp[i][j] = dp[i-1][j]
			} else {
				dp[i][j] = dp[i][j-1]
			}
		}
	}

	// Backtrack from the end, prepending each line with its prefix
	var result []string
	i, j := m, n
	for i > 0 || j > 0 {
		if i > 0 && j > 0 && la[i-1] == lb[j-1] {
			result = append([]string{" " + la[i-1]}, result...)
			i--
			j--
		} else if j > 0 && (i == 0 || dp[i][j-1] >= dp[i-1][j]) {
			result = append([]string{"+" + lb[j-1]}, result...)
			j--
		} else {
			result = append([]string{"-" + la[i-1]}, result...)
			i--
		}
	}
	for _, line := range result {
		fmt.Println(line)
	}
}

func main() {
	diffLines("alpha\nbeta\ngamma", "alpha\nbeta changed\ngamma\ndelta")
}

// Output:
//  alpha
// -beta
// +beta changed
//  gamma
// +delta

# Compare two files with unified diff (3 lines of context)
diff -u original.txt modified.txt

# Git diff between working tree and last commit
git diff HEAD -- file.txt

# Git diff between two branches
git diff main..feature -- src/

# Side-by-side diff in the terminal
diff -y --width=120 original.txt modified.txt

# Color-coded diff (requires colordiff)
diff -u original.txt modified.txt | colordiff