Regex Tester
Test regular expressions online and see every match highlighted in real time.
What Is a Regular Expression?
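A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. A regex tester lets you write a pattern, apply it to sample text, and see every match highlighted in real time. The concept dates back to mathematician Stephen Kleene's work on regular languages in the 1950s. In 1968, Ken Thompson built one of the first regex engines into the QED text editor.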
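A regex tester's core loop is to find every match and mark it in the text. That loop can be sketched with Python's standard re module. This is a minimal sketch, not how this tool is implemented. The helper names match_spans and highlight, and the [[...]] markers, are hypothetical.

```python
import re


def match_spans(pattern: str, text: str, flags: int = 0) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every non-overlapping match, left to right."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text, flags)]


def highlight(pattern: str, text: str, flags: int = 0) -> str:
    """Wrap every match in hypothetical [[...]] markers, the way a tester highlights it."""
    return re.sub(pattern, lambda m: f"[[{m.group(0)}]]", text, flags=flags)


sample = "order 66 shipped in 3 days"
print(match_spans(r"\d+", sample))  # → [(6, 8), (20, 21)]
print(highlight(r"\d+", sample))    # → order [[66]] shipped in [[3]] days
```

A browser-based tester would render the same spans as visual highlights instead of text markers.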
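A regex engine reads the pattern from left to right and consumes input characters as it tries to find a match. When a partial match fails, a backtracking engine steps back and tries another path through the pattern. Some engines avoid backtracking entirely by compiling the pattern into a finite automaton, which guarantees matching time linear in the input length. Examples include RE2 and Go's regexp package, which implements RE2 syntax. The trade-off is that these engines give up features such as back-references.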
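To make backtracking concrete, below is a toy backtracking matcher in the spirit of Rob Pike's well-known matcher. It handles only a tiny subset of the syntax: literal characters, `.`, `*`, `^` and `$`. It is an illustration only, and the function names are hypothetical. Unlike Pike's original, this `*` is greedy. It first consumes as many characters as it can, then gives them back one at a time until the rest of the pattern matches. That give-back loop in match_star is the backtracking step.

```python
def match(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    if pattern.startswith("^"):
        return match_here(pattern[1:], text)
    # Try each starting position from left to right, including the empty suffix.
    return any(match_here(pattern, text[start:]) for start in range(len(text) + 1))


def match_here(pattern: str, text: str) -> bool:
    """Return True if pattern matches at the beginning of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False


def match_star(c: str, rest: str, text: str) -> bool:
    """Greedy c*: consume as many c as possible, then back off one at a time."""
    i = 0
    while i < len(text) and (c == "." or text[i] == c):
        i += 1
    while i >= 0:
        if match_here(rest, text[i:]):
            return True
        i -= 1  # backtrack: give one character back and retry the rest
    return False


print(match("a*a", "aaa"))         # → True (a* must give back one "a")
print(match("^ab*c$", "abbbd"))    # → False
```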
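There is no single standard regex syntax. The most common dialect is Perl-style syntax, as implemented by PCRE (Perl Compatible Regular Expressions). PHP's preg functions use PCRE directly. Python's re module and JavaScript ship their own engines, with closely related but subtly different syntax. POSIX defines a more restricted syntax (basic and extended regular expressions), which grep and sed use. These differences matter when porting patterns between languages. For example, a lookahead that works in JavaScript fails to compile in Go, because Go's RE2-based engine does not support lookaround.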
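The lookahead portability issue shows up clearly from the Python side. The sketch below runs a lookahead with Python's re, then applies a hypothetical helper, uses_lookaround. The helper is deliberately naive: it flags lookaround syntax before a pattern is ported to Go. It is a plain text scan, so it can misfire on escaped parentheses, and it is no substitute for compiling the pattern in the target engine.

```python
import re

# Python's re supports lookahead: match digits followed by " USD" without consuming " USD".
amounts = re.findall(r"\d+(?= USD)", "100 USD and 200 EUR")
print(amounts)  # → ['100']

# Hypothetical helper: naively scans for (?= (?! (?<= (?<! in a pattern string.
# Go's regexp rejects these constructs when the pattern is compiled.
LOOKAROUND = re.compile(r"\(\?<?[=!]")


def uses_lookaround(pattern: str) -> bool:
    return LOOKAROUND.search(pattern) is not None


print(uses_lookaround(r"\d+(?= USD)"))            # → True
print(uses_lookaround(r"(?P<year>\d{4})-\d{2}"))  # → False
```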
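Why Use an Online Regex Tester?
Writing a regex in a code file means saving, running, and checking the output every time you tweak the pattern. A browser-based regex tester shortens that feedback loop to almost nothing: the results update as you type.
Regex Tester Use Cases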
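Regex Syntax Cheat Sheet
The table below covers the most commonly used regex tokens. They work in JavaScript, Python, Go, PHP, and most PCRE-compatible engines. The exception is lookaround: Go's RE2-based engine does not support lookahead or lookbehind. Language-specific extensions are not listed, such as Python's conditional patterns or JavaScript's named back-references with the \k<name> syntax.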
| Pattern | Name | Description |
|---|---|---|
| . | Any character | Matches any single character except newline (unless s flag is set) |
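| \d | Digit | Matches [0-9] (Python 3 also matches other Unicode decimal digits by default) |
| \w | Word character | Matches [a-zA-Z0-9_] (Python 3 also matches Unicode letters and digits by default) |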
| \s | Whitespace | Matches space, tab, newline, carriage return, form feed |
| \b | Word boundary | Matches the position between a word character and a non-word character, or the start or end of the input next to a word character |
| ^ | Start of string/line | Matches the start of the input; with m flag, matches start of each line |
| $ | End of string/line | Matches the end of the input; with m flag, matches end of each line |
| * | Zero or more | Matches the preceding token 0 or more times (greedy) |
| + | One or more | Matches the preceding token 1 or more times (greedy) |
| ? | Optional | Matches the preceding token 0 or 1 time; placed after another quantifier (*?, +?), it makes that quantifier lazy |
| {n,m} | Quantifier range | Matches the preceding token between n and m times |
| () | Capturing group | Groups tokens and captures the matched text for back-references |
| (?:) | Non-capturing group | Groups tokens without capturing the matched text |
| (?=) | Positive lookahead | Matches a position followed by the given pattern, without consuming it (not supported in Go) |
| (?<=) | Positive lookbehind | Matches a position preceded by the given pattern, without consuming it (not supported in Go; Python requires a fixed-width pattern) |
| [abc] | Character class | Matches any one of the characters inside the brackets |
| [^abc] | Negated class | Matches any character not inside the brackets |
| \| | Alternation | Matches the expression before or after the pipe |
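Several tokens from the table can be tried side by side in Python's re module. The sample strings below are made up for illustration, and each result is noted in a comment.

```python
import re

text = "price: $42, tax: $7; <b>bold</b> and <i>italic</i>"

# (?<=\$) positive lookbehind: digits must follow "$", but "$" is not part of the match.
prices = re.findall(r"(?<=\$)\d+", text)  # → ['42', '7']

# Greedy .* runs to the last ">" it can reach; lazy .*? stops at the first one.
greedy = re.findall(r"<.*>", text)   # → ['<b>bold</b> and <i>italic</i>']
lazy = re.findall(r"<.*?>", text)    # → ['<b>', '</b>', '<i>', '</i>']

# (?:...) groups without capturing, so findall returns the whole match;
# with a capturing (...) group, findall returns only the group's last repetition.
non_capturing = re.findall(r"(?:ab)+", "ababab abab")  # → ['ababab', 'abab']
capturing = re.findall(r"(ab)+", "ababab abab")        # → ['ab', 'ab']

# \b matches "cat" as a whole word, not inside "concatenate".
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")  # → ['cat', 'cat']

print(prices, greedy, lazy, non_capturing, capturing, whole_words)
```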
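Regex Flags Explained
Flags (also called modifiers) change how the engine processes a pattern. In JavaScript, flags follow the closing slash: /pattern/gi. In Python, flags are passed as the flags argument (the third argument of re.findall) and combined with |: re.findall(pattern, text, re.IGNORECASE | re.MULTILINE). Not every flag is available in every language. Python, for instance, has no g flag, because findall and finditer already return all matches.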
| Flag | Name | Behavior |
|---|---|---|
| g | Global | Find all matches, not just the first one |
| i | Case-insensitive | Letters match both uppercase and lowercase |
| m | Multiline | ^ and $ match start/end of each line, not just the whole string |
| s | Dot-all | . matches newline characters as well |
| u | Unicode | Treat the pattern and subject as a Unicode string; enables \u{FFFF} syntax |
| y | Sticky | Matches only from the lastIndex position in the target string |
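Python spells three of the flags above as constants in the re module:

- re.IGNORECASE for i
- re.MULTILINE for m
- re.DOTALL for s

Python has no y flag. The closest analogue is a compiled pattern's match(string, pos), which only tries position pos. A minimal sketch using a made-up log string:

```python
import re

log = "ERROR disk full\nwarning: cpu hot\nError: timeout"

# No flags: case-sensitive, and ^ matches only at the very start of the string.
no_flags = re.findall(r"^error", log)  # → []
# re.IGNORECASE (like "i") plus re.MULTILINE (like "m"): ^ matches at every line start.
case_multi = re.findall(r"^error", log, re.IGNORECASE | re.MULTILINE)  # → ['ERROR', 'Error']

# re.DOTALL (like "s"): "." also matches "\n", so one match can span several lines.
dot_default = re.search(r"disk.*timeout", log)         # → None
dot_all = re.search(r"disk.*timeout", log, re.DOTALL)  # matches "disk full\n...timeout"

# Sticky-style matching: Pattern.match(string, pos) tries only index pos.
word = re.compile(r"\w+")
sticky_hit = word.match("foo bar", 4)   # matches "bar" starting at index 4
sticky_miss = word.match("foo bar", 3)  # → None, index 3 is a space
```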
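Code Examples
Below are runnable regex examples in JavaScript, Python, Go, and on the command line. Each example shows how to build a pattern, extract the matches, and what the output looks like.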
```javascript
// Match all email addresses in a string
const text = 'Contact us at support@example.com or sales@example.com'
const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g
// matchAll() requires the g flag and yields one match object per hit
const matches = text.matchAll(emailRegex)
for (const match of matches) {
  console.log(match[0], 'at index', match.index)
}
// → support@example.com at index 14
// → sales@example.com at index 37

// Named capture groups (ES2018+)
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = '2026-03-30'.match(dateRegex)
console.log(result.groups)
// → { year: "2026", month: "03", day: "30" }

// Replace with a callback: uppercase the first letter of each word
console.log('hello world'.replace(/\b\w/g, c => c.toUpperCase()))
// → Hello World
```

```python
import re

# Find all IPv4 addresses
text = 'Server 192.168.1.1 responded, fallback to 10.0.0.255'
pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
matches = re.findall(pattern, text)
print(matches)  # → ['192.168.1.1', '10.0.0.255']

# Named groups and match objects
date_pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
m = re.search(date_pattern, 'Released on 2026-03-30')
if m:
    print(m.group('year'))   # → 2026
    print(m.group('month'))  # → 03

# Compile once for repeated use (skips the pattern-cache lookup on every call in a loop)
compiled = re.compile(r'\b[A-Z][a-z]+\b')
words = compiled.findall('Hello World Foo bar')
print(words)  # → ['Hello', 'World', 'Foo']
```

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Find all matches; -1 means no limit on the number of matches
	re := regexp.MustCompile(`\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b`)
	text := "Contact support@example.com or sales@example.com"
	matches := re.FindAllString(text, -1)
	fmt.Println(matches)
	// → [support@example.com sales@example.com]

	// Named capture groups: SubexpNames()[i] names the i-th submatch
	dateRe := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
	match := dateRe.FindStringSubmatch("2026-03-30")
	for i, name := range dateRe.SubexpNames() {
		if name != "" {
			fmt.Printf("%s: %s\n", name, match[i])
		}
	}
	// → year: 2026
	// → month: 03
	// → day: 30

	// Replace each match with the value returned by a function
	result := re.ReplaceAllStringFunc(text, func(s string) string {
		return "[REDACTED]"
	})
	fmt.Println(result)
	// → Contact [REDACTED] or [REDACTED]
}
```

```bash
# Find lines matching an IP address pattern (\b in -E patterns is a GNU grep extension)
grep -E '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' access.log

# Extract email addresses from a file (-o prints only the matched part of each line)
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt

# Replace dates from YYYY-MM-DD to DD/MM/YYYY using sed
echo "2026-03-30" | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/'
# → 30/03/2026

# Count matching lines (not individual matches) per file in a directory
grep -rcE 'TODO|FIXME|HACK' src/
# → src/main.js:3
# → src/utils.js:1
```
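Python also supports syntax that the cheat sheet leaves out, including named back-references and the conditional patterns mentioned earlier. The sketch below shows both:

- (?P=name) finds doubled words. JavaScript writes the same back-reference as \k<name>.
- (?(1)yes|no) requires a closing ">" only when the opening "<" was matched. This conditional example follows the one in Python's re documentation.

The sample strings are made up for illustration.

```python
import re

# Named back-reference: (?P=word) must repeat exactly what the "word" group captured.
doubled = re.findall(r"\b(?P<word>\w+)\s+(?P=word)\b", "this is is a test test.")
print(doubled)  # → ['is', 'test']

# Conditional: if group 1 ("<") matched, require ">"; otherwise require end of string.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")
for candidate in ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]:
    print(candidate, bool(email.match(candidate)))
# → <user@host.com> True
# → user@host.com True
# → <user@host.com False
# → user@host.com> False
```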