什么是正则表达式？

正则表达式（regex 或 regexp）是一种用于定义搜索模式的字符序列。正则表达式测试器让你编写一个模式，将其应用于示例文本，并实时高亮显示所有匹配项。这一概念可追溯至20世纪50年代数学家 Stephen Kleene 对正则语言的研究，Ken Thompson 于1968年将第一个正则表达式引擎内置于 QED 文本编辑器中。

正则引擎从左到右读取模式，逐步消耗输入字符以尝试匹配。当部分匹配失败时，引擎会回溯并尝试模式中的其他路径。某些引擎（如 Go 使用的 RE2）通过将模式转换为确定性有限自动机（DFA）完全避免回溯，以牺牲反向引用等特性为代价，保证线性时间匹配。

正则表达式语法并没有统一标准。PCRE（Perl Compatible Regular Expressions）是最常见的方言，PHP、Python 的 re 模块以及 JavaScript 均有支持，但存在细微差异。POSIX 定义了一种更受限的语法，用于 grep 和 sed。在不同语言间移植模式时，这些差异至关重要：在 JavaScript 中有效的前瞻断言，在 Go 的 RE2 引擎中可能完全无法编译。

为什么使用在线正则表达式测试器？

在代码文件中编写正则表达式，每次调整模式都需要保存、运行并检查输出。基于浏览器的正则测试器将这一反馈循环缩短为零：输入即见结果。

⚡

实时匹配高亮

每次按键都会即时更新匹配结果。你可以看到文本的哪些部分匹配、哪些捕获组被填充，以及每个匹配的起止位置，无需编译-运行-调试的循环。

🔒

本地处理，保护隐私

模式匹配完全在浏览器中使用 JavaScript RegExp 引擎执行，文本和模式均不会发送至服务器。在测试包含敏感信息的日志文件、用户数据或 API 响应时，这一点尤为重要。

🔍

可视化匹配检查

匹配项以内联方式高亮显示，并展示其位置和捕获组的值。直观地查看匹配项，有助于快速发现量词中的差一错误或缺失的锚点。

🌐

无需登录或安装

在任何现代浏览器的设备上均可使用，无需账号、扩展程序或 IDE 插件。打开页面，粘贴模式和文本，即可开始测试。

正则表达式测试器的使用场景

前端输入验证

在将模式嵌入 HTML5 pattern 属性或 JavaScript 验证逻辑之前，构建并验证电子邮件、电话号码或信用卡字段的正则表达式。

后端日志解析

编写从应用日志中提取时间戳、错误码或 IP 地址的正则表达式，并使用真实日志样本测试，确认模式能够捕获正确的分组。

DevOps 与基础设施

调试 Nginx location 块、Apache 重写规则或 Prometheus 告警规则中使用的正则表达式。服务器配置中的错误模式可能导致路由中断或告警完全失效。

QA 与测试自动化

验证响应体或 HTML 输出是否符合端到端测试断言中预期的模式。在将正则表达式提交到测试套件之前，先在此处进行验证。

数据提取流水线

为从非结构化文本中提取结构化字段设计原型模式，例如抓取商品价格、解析 CSV 边界情况，或从邮件头中提取元数据。

学习正则表达式

在示例字符串上实验元字符、量词和分组。即时的视觉反馈让正则表达式语法比单纯阅读文档更易于掌握。

正则表达式语法速查

下表涵盖最常用的正则表达式标记，适用于 JavaScript、Python、Go、PHP 及大多数兼容 PCRE 的引擎。特定语言的扩展（如 Python 的条件模式或 JavaScript 使用 \k 语法的命名分组）将在代码示例部分说明。

模式	名称	说明
.	Any character	Matches any single character except newline (unless s flag is set)
\d	Digit	Matches [0-9]
\w	Word character	Matches [a-zA-Z0-9_]
\s	Whitespace	Matches space, tab, newline, carriage return, form feed
\b	Word boundary	Matches the position between a word character and a non-word character
^	Start of string/line	Matches the start of the input; with m flag, matches start of each line
$	End of string/line	Matches the end of the input; with m flag, matches end of each line
*	Zero or more	Matches the preceding token 0 or more times (greedy)
+	One or more	Matches the preceding token 1 or more times (greedy)
?	Optional	Matches the preceding token 0 or 1 time
{n,m}	Quantifier range	Matches the preceding token between n and m times
()	Capturing group	Groups tokens and captures the matched text for back-references
(?:)	Non-capturing group	Groups tokens without capturing the matched text
(?=)	Positive lookahead	Matches a position followed by the given pattern, without consuming it
(?<=)	Positive lookbehind	Matches a position preceded by the given pattern, without consuming it
[abc]	Character class	Matches any one of the characters inside the brackets
[^abc]	Negated class	Matches any character not inside the brackets
\|	Alternation	Matches the expression before or after the pipe

正则表达式标志详解

标志（也称修饰符）用于改变引擎处理模式的方式。在 JavaScript 中，标志附加在结束斜杠之后：/pattern/gi。在 Python 中，作为第二个参数传入：re.findall(pattern, text, re.IGNORECASE | re.MULTILINE)。并非所有标志在每种语言中都可用。

标志	名称	行为
g	Global	Find all matches, not just the first one
i	Case-insensitive	Letters match both uppercase and lowercase
m	Multiline	^ and $ match start/end of each line, not just the whole string
s	Dot-all	. matches newline characters as well
u	Unicode	Treat the pattern and subject as a Unicode string; enables \u{FFFF} syntax
y	Sticky	Matches only from the lastIndex position in the target string

代码示例

以下是 JavaScript、Python、Go 和命令行中可运行的正则表达式示例，每个示例展示模式构建、匹配提取及输出结果。

JavaScript

// Match all email addresses in a string
const text = 'Contact us at support@example.com or sales@example.com'
const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g

const matches = text.matchAll(emailRegex)
for (const match of matches) {
  console.log(match[0], 'at index', match.index)
}
// → "support@example.com" at index 14
// → "sales@example.com" at index 37

// Named capture groups (ES2018+)
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = '2026-03-30'.match(dateRegex)
console.log(result.groups)
// → { year: "2026", month: "03", day: "30" }

// Replace with a callback
'hello world'.replace(/\b\w/g, c => c.toUpperCase())
// → "Hello World"

Python

import re

# Find all IPv4 addresses
text = 'Server 192.168.1.1 responded, fallback to 10.0.0.255'
pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'

matches = re.findall(pattern, text)
print(matches)  # → ['192.168.1.1', '10.0.0.255']

# Named groups and match objects
date_pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
m = re.search(date_pattern, 'Released on 2026-03-30')
if m:
    print(m.group('year'))   # → '2026'
    print(m.group('month'))  # → '03'

# Compile for repeated use (faster in loops)
compiled = re.compile(r'\b[A-Z][a-z]+\b')
words = compiled.findall('Hello World Foo bar')
print(words)  # → ['Hello', 'World', 'Foo']

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Find all matches
	re := regexp.MustCompile(`\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b`)
	text := "Contact support@example.com or sales@example.com"
	matches := re.FindAllString(text, -1)
	fmt.Println(matches)
	// → [support@example.com sales@example.com]

	// Named capture groups
	dateRe := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
	match := dateRe.FindStringSubmatch("2026-03-30")
	for i, name := range dateRe.SubexpNames() {
		if name != "" {
			fmt.Printf("%s: %s\n", name, match[i])
		}
	}
	// → year: 2026
	// → month: 03
	// → day: 30

	// Replace with a function
	result := re.ReplaceAllStringFunc(text, func(s string) string {
		return "[REDACTED]"
	})
	fmt.Println(result)
	// → Contact [REDACTED] or [REDACTED]
}

CLI (grep / sed)

# Find lines matching an IP address pattern
grep -E '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' access.log

# Extract email addresses from a file
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt

# Replace dates from YYYY-MM-DD to DD/MM/YYYY using sed
echo "2026-03-30" | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/'
# → 30/03/2026

# Count matches per file in a directory
grep -rcE 'TODO|FIXME|HACK' src/
# → src/main.js:3
# → src/utils.js:1

常见问题

正则表达式与通配符模式有什么区别？

通配符模式（如 *.txt 或 src/**/*.js）是一种简化的通配符语法，用于在 shell 和构建工具中匹配文件路径。它支持 *（任意字符）、?（单个字符）和 []（字符类），但不支持量词、分组、前瞻断言和分支。正则表达式的表达能力远更强大，可用于任意文本，而不仅限于文件路径。通配符模式 *.json 大致等价于正则表达式 ^.*\.json$。

如何在正则表达式中匹配字面点号或方括号？

在字符前加反斜杠：\. 匹配字面句点，\[ 匹配字面方括号。在字符类 [] 内部，点号本身就是字面字符，无需转义。一个常见错误是写 192.168.1.1 而不转义点号，这会匹配 192x168y1z1，因为 . 表示“任意字符”。

我的测试数据会被发送到服务器吗？

不会。本工具完全在浏览器中使用 JavaScript RegExp 引擎执行正则匹配，不会向任何服务器发送你的文本或模式。你可以在使用工具时打开浏览器开发者工具的网络面板进行确认。

为什么我的正则表达式在 JavaScript 中有效，但在 Python 中失败？

JavaScript 和 Python 使用不同的正则引擎，特性集略有差异。JavaScript 支持 \d、前瞻断言 (?=) 和 ES2018 起支持的后顾断言 (?<=)，但不支持条件模式、原子组或独占量词。Python 的 re 模块不支持 \p{} Unicode 属性类（请使用第三方 regex 模块）。始终在目标语言的引擎中测试，或查阅该语言的正则表达式文档。

什么是灾难性回溯？

当模式中存在嵌套量词，产生指数级匹配路径时，就会发生灾难性回溯。经典示例是将 (a+)+ 应用于一串 a 后跟不匹配字符的字符串，引擎会尝试所有可能的方式在内外分组间分配这些 a，然后才宣告失败。解决方法是使用原子组 (?>)、独占量词 a++，或重写模式以避免歧义性重复。

可以用正则表达式解析 HTML 吗？

正则表达式可以从 HTML 片段中提取简单的值，例如从单个 <a> 标签中提取 href 属性。对于完整的 HTML 解析，应使用专用解析器（JavaScript 中的 DOMParser、Python 中的 BeautifulSoup，或 Go 中的 html/template）。HTML 是上下文无关文法，而正则表达式处理的是正则文法。嵌套标签、可选属性和自闭合元素会产生正则表达式无法可靠匹配的模式。

贪婪量词和懒惰量词有什么区别？

贪婪量词（* 或 +）尽可能多地匹配字符，若后续模式失败则回溯。懒惰量词（*? 或 +?）尽可能少地匹配字符，仅在必要时才扩展。对于输入 一二，贪婪模式 .* 匹配从第一个 到最后一个 的整个字符串，而懒惰模式 .*? 则分别匹配 一 和 二。

Regex Tester