What Is a Regular Expression?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. A regex tester lets you write a pattern, run it against sample text, and see the matches highlighted in real time. The concept goes back to mathematician Stephen Kleene's work on regular languages in the 1950s, and Ken Thompson built one of the first regex engines into the QED text editor in 1968.

A backtracking regex engine reads the pattern from left to right, consuming characters from the input as it tries to match. When a partial match fails, the engine backtracks and tries another path through the pattern. Some engines (such as RE2, and Go's regexp package, which follows the same design) avoid backtracking entirely by compiling the pattern into finite automata. This guarantees matching in time linear in the input length, but rules out features such as back-references and lookaround.
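To make the backtracking behavior concrete, here is a minimal sketch of a backtracking matcher in Python, loosely modeled on the tiny matcher Rob Pike wrote for Kernighan and Pike's The Practice of Programming. It is an illustration, not how production engines are built: it supports only literal characters, `.`, `c*` (made greedy here), `^` and `$`, and the function names are hypothetical.

```python
def match(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text.

    Supports literal characters, '.', 'c*' (greedy), '^' and '$'.
    """
    if pattern.startswith("^"):
        return match_here(pattern[1:], text)
    # Unanchored: try every starting position, left to right.
    for start in range(len(text) + 1):
        if match_here(pattern, text[start:]):
            return True
    return False


def match_here(pattern: str, text: str) -> bool:
    """Try to match pattern at the very beginning of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and (pattern[0] == "." or pattern[0] == text[0]):
        return match_here(pattern[1:], text[1:])
    return False


def match_star(c: str, rest: str, text: str) -> bool:
    """Greedy c*: consume as many c as possible, then backtrack."""
    i = 0
    while i < len(text) and (c == "." or text[i] == c):
        i += 1
    # Give characters back one at a time until the rest of the pattern matches.
    while i >= 0:
        if match_here(rest, text[i:]):
            return True
        i -= 1
    return False


print(match("^a.*c$", "abbbc"))  # → True  (.* first grabs "bbbc", then gives back "c")
print(match("a*b", "xaaab"))     # → True  (fails at index 0, succeeds from index 1)
print(match("^ab*$", "abbx"))    # → False (every backtracking step fails on "x")
```

The inner `while i >= 0` loop is the backtracking: each failed attempt to match the rest of the pattern hands one character back from the greedy star.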

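Regex syntax is only loosely standardized. The Perl-style syntax popularized by PCRE (Perl Compatible Regular Expressions) is the most common: PHP's preg_* functions use PCRE directly, while Python's re module and JavaScript implement their own engines with very similar syntax and minor differences. POSIX defines a more limited syntax (basic and extended regular expressions) used by grep and sed. These differences matter when you move a pattern between languages: a lookahead that works in JavaScript will not even compile in Go, whose RE2-style engine does not support lookaround.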

Why Use an Online Regex Tester?

Writing a regex in a code file means saving, running, and checking the output every time you tweak the pattern. A browser-based regex tester shortens that feedback loop to nothing: type, and see the result immediately.

Real-Time Match Highlighting

Every keystroke updates the matches immediately. You see which parts of the text match, which capture groups are populated, and where each match starts and ends. No compile-run-debug cycle.

Privacy-First Processing

Pattern matching runs in your browser using the JavaScript RegExp engine. No text or pattern is sent to a server. This matters when you test against log files, customer data, or API responses that contain sensitive information.

Visual Match Inspection

Matches are highlighted inline, with the position and value of each capture group shown alongside. Seeing matches visually makes it easy to spot off-by-one errors in quantifiers or a missing anchor.

No Login or Installation

Works on any device with a modern browser. No account, no extension, no IDE plugin: open the page, paste your pattern and text, and start testing.

Regex Tester Use Cases

Frontend Form Validation

Build and test patterns for email, phone number, or credit card fields before embedding them in an HTML5 pattern attribute or in JavaScript validation logic.
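One pitfall worth testing for: the HTML5 pattern attribute requires the entire field value to match (the browser anchors the pattern implicitly), while many testers and APIs report a match anywhere in the text. The Python sketch below, assuming a hypothetical US-style phone format and a hypothetical helper `is_valid_field`, uses `re.fullmatch` to reproduce that whole-value behavior; browsers evaluate the attribute with JavaScript syntax, which is identical to Python's for a pattern this simple.

```python
import re

# Hypothetical US-style phone pattern, e.g. 555-123-4567.
PHONE = r"\d{3}-\d{3}-\d{4}"


def is_valid_field(value: str, pattern: str) -> bool:
    # Accept the value only if the WHOLE string matches, like the pattern attribute.
    # re.ASCII restricts \d to [0-9], which is closer to JavaScript's behavior.
    return re.fullmatch(pattern, value, re.ASCII) is not None


# re.search finds the number inside extra text, so it would accept bad input.
print(bool(re.search(PHONE, "call 555-123-4567 now", re.ASCII)))  # → True
# fullmatch requires the entire value to match.
print(is_valid_field("call 555-123-4567 now", PHONE))  # → False
print(is_valid_field("555-123-4567", PHONE))           # → True
```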

Backend Log Analysis

Write regex patterns that extract timestamps, error codes, or IP addresses from application logs. Test them against real log samples to confirm that each pattern captures the right groups.
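As a sketch of that workflow, the Python snippet below pulls the timestamp, status code, and client IP out of every ERROR line using named groups. The log format and sample lines are hypothetical; adapt the pattern to your application's real format after checking it against actual samples.

```python
import re

# Hypothetical log format: "<ISO timestamp> <LEVEL> <status> client=<IPv4> <message>"
LOG = """\
2026-03-30T12:00:01Z INFO 200 client=10.0.0.7 GET /health
2026-03-30T12:00:05Z ERROR 502 client=192.168.1.20 upstream timeout
2026-03-30T12:00:09Z ERROR 500 client=10.0.0.8 db connection reset
"""

LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<status>\d{3}) "
    r"client=(?P<ip>(?:\d{1,3}\.){3}\d{1,3})",
    re.MULTILINE,  # ^ anchors at the start of every line
)

# groupdict() turns each match into {"ts": ..., "level": ..., "status": ..., "ip": ...}
errors = [m.groupdict() for m in LINE_RE.finditer(LOG) if m.group("level") == "ERROR"]

for e in errors:
    print(e["ts"], e["status"], e["ip"])
# → 2026-03-30T12:00:05Z 502 192.168.1.20
# → 2026-03-30T12:00:09Z 500 10.0.0.8
```

The IPv4 group accepts out-of-range octets such as 999; that is usually acceptable for log extraction but not for validation.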

DevOps and Infrastructure

Debug the regexes used in Nginx location blocks, Apache rewrite rules, or Prometheus alerting rules. A faulty pattern in a server configuration can break routing or cause alerts to be missed.

QA and Automated Testing

Verify that response bodies or HTML output match the expected patterns in end-to-end test assertions. Test the regex here before adding it to your test suite.
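A minimal Python sketch of regex-based assertions on a response body. The body, the helper `assert_body_shape`, and its three checks are hypothetical; inside a `unittest.TestCase`, `self.assertRegex(text, pattern)` does the same job with a built-in failure message. For JSON specifically, parsing with the `json` module and checking fields is usually sturdier; regex checks fit HTML or plain-text output best.

```python
import re

# Hypothetical response body from an API under test.
BODY = '{"id": 42, "status": "ok", "created": "2026-03-30T12:00:00Z"}'


def assert_body_shape(body: str) -> None:
    """Raise AssertionError naming the first check whose pattern is not found."""
    checks = {
        "numeric id": r'"id":\s*\d+',
        "status ok": r'"status":\s*"ok"',
        "ISO timestamp": r'"created":\s*"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"',
    }
    for name, pattern in checks.items():
        assert re.search(pattern, body), f"{name}: /{pattern}/ did not match"


assert_body_shape(BODY)  # passes silently
```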

Data Extraction Pipelines

Prototype patterns that pull structured fields out of unstructured text: extracting product prices, parsing CSV with edge cases, or pulling metadata from email headers.
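Two of those extraction tasks, sketched in Python on hypothetical sample text. The price pattern assumes US-style formatting ($1,249.00). The header pattern assumes one unfolded `Name: value` pair per line; real email headers can wrap across lines, which Python's `email` package handles properly.

```python
import re

# Hypothetical product text: extract prices such as $19.99 or $1,249.00.
listing = "Widget $19.99, Gadget $1,249.00, Gizmo $5"
prices = re.findall(r"\$(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)", listing)
print(prices)  # → ['19.99', '1,249.00', '5']

# Hypothetical unfolded header block: one "Name: value" pair per line.
headers = "From: alice@example.com\nSubject: Quarterly report\nX-Priority: 1"
# With two capturing groups, findall returns (name, value) tuples.
fields = dict(re.findall(r"^([\w-]+): (.*)$", headers, re.MULTILINE))
print(fields["Subject"])  # → Quarterly report
```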

Learning Regular Expressions

Experiment with metacharacters, quantifiers, and groups against sample text. Instant visual feedback makes regex syntax easier to learn than reading documentation alone.

Regex Syntax Reference

The table below covers the most commonly used regex tokens. They work in JavaScript, Python, Go, PHP, and other PCRE-compatible engines, with one notable exception: Go's regexp package supports neither lookahead nor lookbehind. Language-specific extensions, such as Python's conditional patterns (?(1)yes|no) or JavaScript's named back-references \k<name>, are not covered in this table.

Pattern	Name	Description
.	Any character	Matches any single character except newline (unless s flag is set)
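\d	Digit	Matches [0-9] (in Python 3 str patterns, also other Unicode decimal digits)
\w	Word character	Matches [a-zA-Z0-9_] (in Python 3 str patterns, also Unicode letters and digits)
\s	Whitespace	Matches space, tab, newline, carriage return, form feed
\b	Word boundary	Matches the position between a word character and a non-word character, or between a word character and the start or end of the input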
^	Start of string/line	Matches the start of the input; with m flag, matches start of each line
$	End of string/line	Matches the end of the input; with m flag, matches end of each line
*	Zero or more	Matches the preceding token 0 or more times (greedy)
+	One or more	Matches the preceding token 1 or more times (greedy)
?	Optional	Matches the preceding token 0 or 1 time
{n,m}	Quantifier range	Matches the preceding token between n and m times
()	Capturing group	Groups tokens and captures the matched text for back-references
(?:)	Non-capturing group	Groups tokens without capturing the matched text
(?=)	Positive lookahead	Matches a position followed by the given pattern, without consuming it
(?<=)	Positive lookbehind	Matches a position preceded by the given pattern, without consuming it
[abc]	Character class	Matches any one of the characters inside the brackets
[^abc]	Negated class	Matches any character not inside the brackets
|	Alternation	Matches the expression before or after the pipe
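A quick Python tour of several tokens from the table above, one call per token. Because these patterns use no capturing groups, `re.findall` returns the whole match; with a capturing group it would return only the group's text. The lookbehind line would not compile in Go.

```python
import re

print(re.findall(r"colou?r", "color colour"))               # ?  → ['color', 'colour']
print(re.findall(r"gr[ae]y", "gray grey groy"))             # [] → ['gray', 'grey']
print(re.findall(r"\bcat\b", "cat concat cats cat."))       # \b → ['cat', 'cat']
print(re.findall(r"(?:ab)+", "ababx ab"))                   # (?:) + → ['abab', 'ab']
print(re.findall(r"cat|dog", "hotdog catalog"))             # |  → ['dog', 'cat']
print(re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD"))  # (?=) → ['10', '30']
print(re.findall(r"(?<=\$)\d+", "cost $15, tax $3"))        # (?<=) → ['15', '3']
```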

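Regex Flags Explained

Flags (also called modifiers) change how the engine processes a pattern. In JavaScript you append flags after the closing slash: /pattern/gi. In Python you pass them as the flags argument, which is the third positional argument of re.findall: re.findall(pattern, text, re.IGNORECASE | re.MULTILINE). Not every flag exists in every language: Python has no g or y flag (findall() and finditer() return all matches), and Go sets flags inline, for example (?i) or (?s). The table below lists the JavaScript flags.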

Flag	Name	Behavior
g	Global	Find all matches, not just the first one
i	Case-insensitive	Letters match both uppercase and lowercase
m	Multiline	^ and $ match start/end of each line, not just the whole string
s	Dot-all	. matches newline characters as well
u	Unicode	Treat the pattern as a sequence of Unicode code points; enables \u{FFFF}-style code point escapes and \p{...} property escapes
y	Sticky	Matches only from the lastIndex position in the target string
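Python spells the same behaviors differently. This sketch on hypothetical text shows `re.IGNORECASE | re.MULTILINE`, `re.DOTALL`, and an inline `(?i)` flag. Python has no sticky flag, so `Pattern.match(string, pos)`, which only tries a match starting exactly at `pos`, is offered here as the closest analogue rather than an exact equivalent.

```python
import re

log = "Error: disk full\nwarning: low memory\nERROR: timeout"

# i + m: case-insensitive, and ^/$ apply to each line.
# findall() plays the role of JavaScript's g flag.
print(re.findall(r"^error: (.*)$", log, re.IGNORECASE | re.MULTILINE))
# → ['disk full', 'timeout']

# s: without DOTALL, . does not cross newlines.
block = "BEGIN\nbody\nEND"
print(re.search(r"BEGIN(.*)END", block))                          # → None
print(repr(re.search(r"BEGIN(.*)END", block, re.DOTALL).group(1)))  # → '\nbody\n'

# Inline flag at the start of the pattern.
print(re.findall(r"(?i)WARNING", log))  # → ['warning']

# Sticky-like matching: only try a match that starts exactly at pos.
digits = re.compile(r"\d+")
print(digits.match("ab12", 2).group())  # → 12
print(digits.match("ab12", 0))          # → None
```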

Code Examples

Working regex examples in JavaScript, Python, Go, and on the command line. Each example shows how to build the pattern, extract the matches, and what the output looks like.

JavaScript

// Match all email addresses in a string
const text = 'Contact us at support@example.com or sales@example.com'
const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g

const matches = text.matchAll(emailRegex)
for (const match of matches) {
  console.log(match[0], 'at index', match.index)
}
// → support@example.com at index 14
// → sales@example.com at index 37

// Named capture groups (ES2018+)
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = '2026-03-30'.match(dateRegex)
console.log(result.groups)
// → { year: "2026", month: "03", day: "30" }

// Replace with a callback
'hello world'.replace(/\b\w/g, c => c.toUpperCase())
// → "Hello World"

Python

import re

# Find all IPv4 addresses
text = 'Server 192.168.1.1 responded, fallback to 10.0.0.255'
pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'

matches = re.findall(pattern, text)
print(matches)  # → ['192.168.1.1', '10.0.0.255']

# Named groups and match objects
date_pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
m = re.search(date_pattern, 'Released on 2026-03-30')
if m:
    print(m.group('year'))   # → '2026'
    print(m.group('month'))  # → '03'

# Compile once and reuse the pattern object (re also caches recently used patterns)
compiled = re.compile(r'\b[A-Z][a-z]+\b')
words = compiled.findall('Hello World Foo bar')
print(words)  # → ['Hello', 'World', 'Foo']

Go

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Find all matches
	re := regexp.MustCompile(`\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b`)
	text := "Contact support@example.com or sales@example.com"
	matches := re.FindAllString(text, -1)
	fmt.Println(matches)
	// → [support@example.com sales@example.com]

	// Named capture groups
	dateRe := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
	match := dateRe.FindStringSubmatch("2026-03-30")
	for i, name := range dateRe.SubexpNames() {
		if name != "" {
			fmt.Printf("%s: %s\n", name, match[i])
		}
	}
	// → year: 2026
	// → month: 03
	// → day: 30

	// Replace with a function
	result := re.ReplaceAllStringFunc(text, func(s string) string {
		return "[REDACTED]"
	})
	fmt.Println(result)
	// → Contact [REDACTED] or [REDACTED]
}

CLI (grep / sed)

# Find lines matching an IP address pattern
grep -E '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' access.log

# Extract email addresses from a file
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt

# Replace dates from YYYY-MM-DD to DD/MM/YYYY using sed
echo "2026-03-30" | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/'
# → 30/03/2026

# Count matching lines (not individual matches) per file, recursively
grep -rcE 'TODO|FIXME|HACK' src/
# → src/main.js:3
# → src/utils.js:1

Frequently Asked Questions

What is the difference between a regex and a glob pattern?

Glob patterns (such as *.txt or src/**/*.js) are a simple wildcard syntax used to match file paths in shells and build tools. They support * (any run of characters), ? (exactly one character) and [] (a character class), but lack quantifiers, groups, lookahead, and alternation. Regex is far more expressive and works on arbitrary text, not just file paths. The glob *.json means roughly the same as the regex ^.*\.json$.
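Python's standard library can make that correspondence visible: `fnmatch.translate` converts a shell-style glob into a regex string. Its exact output differs between Python versions, so this sketch compares behavior rather than text. Note that `fnmatch`'s `*` also matches `/`, unlike a shell or `glob.glob`, where `*` stops at directory separators.

```python
import fnmatch
import re

# Regex generated from the glob, next to a hand-written equivalent.
glob_re = re.compile(fnmatch.translate("*.json"))
hand_re = re.compile(r"^.*\.json$")

for name in ["config.json", "config.json.bak", "data.JSON", "nested/dir/a.json"]:
    print(f"{name:20} glob={bool(glob_re.match(name))} regex={bool(hand_re.match(name))}")
```

For these four names both columns agree: only config.json and nested/dir/a.json match, and the match is case-sensitive.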

How do I match a literal dot or bracket in a regex?

Put a backslash before the character: \. matches a literal dot and \[ matches a literal opening square bracket. Inside a character class [], a dot is already literal and does not need escaping. A common mistake is writing 192.168.1.1 without escaping the dots, which also matches 192x168y1z1 because . means "any character".
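When the literal text comes from user input or configuration, escaping by hand is error-prone. In Python, `re.escape` backslash-escapes every regex metacharacter; this sketch reproduces the IP address mistake described above and the fix.

```python
import re

ip = "192.168.1.1"

# Unescaped: each . is a metacharacter, so unrelated text matches.
print(bool(re.fullmatch(ip, "192x168y1z1")))             # → True
# re.escape turns the literal into a safe pattern.
print(re.escape(ip))                                      # → 192\.168\.1\.1
print(bool(re.fullmatch(re.escape(ip), "192x168y1z1")))  # → False
# Inside a character class, . is already literal.
print(re.findall(r"[.]", "a.b.c"))                        # → ['.', '.']
```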

Is my test data sent to a server?

No. This tool runs all regex matching in your browser using the JavaScript RegExp engine. No network request is made with your text or pattern; you can confirm this by opening the Network tab in your browser's DevTools while using the tool.

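Why does my regex work in JavaScript but fail in Python?

JavaScript and Python use different regex engines with slightly different feature sets. Both support \d and lookahead (?=). JavaScript added lookbehind (?<=) in ES2018 and allows it to be variable-width, whereas Python's re rejects any lookbehind that is not fixed-width. JavaScript does not support conditional patterns, atomic groups, or possessive quantifiers (Python's re has supported the last two since Python 3.11). Python's re module does not support Unicode property classes such as \p{L} (use the third-party regex module instead), while JavaScript supports them with the u flag. Always test in your target language's engine or check its regex documentation.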
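The lookbehind difference is a frequent cause of "works in JavaScript, fails in Python". This sketch shows Python rejecting a variable-width lookbehind that JavaScript (ES2018+) accepts, plus one common workaround: match the prefix normally and capture only the part you need. The sample text is hypothetical, and the exact error message depends on the Python version.

```python
import re

# Variable-width lookbehind, e.g. /(?<=\$\d+)\.\d\d/, compiles in JavaScript,
# but Python's re requires each lookbehind to have a fixed width.
try:
    re.compile(r"(?<=\$\d+)\.\d\d")
except re.error as exc:
    print("re.error:", exc)

# Workaround: consume the prefix and capture the part you want.
m = re.search(r"\$\d+(\.\d\d)", "total $123.45")
print(m.group(1))  # → .45
```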

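What causes catastrophic backtracking in a regex?

Catastrophic backtracking happens when a pattern contains nested quantifiers that create an exponential number of ways to match the same text. The classic example is ^(a+)+$ applied to a string of a's followed by a character that does not match: before failing, the engine tries every possible way to split the a's between the inner and outer groups. Fix it with an atomic group (?>...) or a possessive quantifier such as a++ (supported by PCRE and by Python's re since 3.11, but not by JavaScript), or rewrite the pattern to remove the ambiguous repetition: ^(a+)+$ matches exactly the same strings as ^a+$.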
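A Python sketch of the problem and the fixes. It checks that the vulnerable and rewritten patterns accept the same short strings, then times the vulnerable one only on small failing inputs (n up to 18), because larger n can take seconds or far longer. The atomic-group variant runs only on Python 3.11 or later.

```python
import re
import sys
import time

VULNERABLE = re.compile(r"^(a+)+$")  # nested quantifiers: many ways to split the a's
SAFE = re.compile(r"^a+$")           # same strings, only one way to match them

# On short inputs both patterns accept exactly the same strings.
for n in range(1, 12):
    for s in ("a" * n, "a" * n + "b"):
        assert bool(VULNERABLE.match(s)) == bool(SAFE.match(s))

# On a failing input, the vulnerable pattern's run time grows exponentially with n.
for n in (14, 16, 18):
    start = time.perf_counter()
    VULNERABLE.match("a" * n + "b")
    print(f"n={n}: {time.perf_counter() - start:.4f}s")

# The rewrite fails fast even on a long hostile input.
print(SAFE.match("a" * 10_000 + "b"))  # → None

if sys.version_info >= (3, 11):
    # Atomic group: once (?>a+) has matched, it never gives characters back.
    ATOMIC = re.compile(r"^(?>a+)+$")
    print(ATOMIC.match("a" * 10_000 + "b"))  # → None
```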

Can I use regex to parse HTML?

Regex can pull simple values out of an HTML fragment, such as the href of a single <a> tag. For full HTML parsing, use a proper parser (DOMParser in JavaScript, BeautifulSoup in Python, or the golang.org/x/net/html package in Go). HTML is not a regular language (its nesting needs at least a context-free grammar), while regular expressions describe regular languages: nested tags, optional attributes, and self-closing elements produce structures that a regex cannot match reliably.
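A small illustration using Python's standard-library `html.parser`: a naive href regex misses both links in this hypothetical fragment (one tag is uppercase, the other uses single quotes with another attribute first), while a parser subclass collects both. `LinkCollector` is a hypothetical name; for real-world HTML, BeautifulSoup or another full parser is more forgiving.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags using Python's stdlib HTML parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []  # href strings in document order

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names; attrs is a list of (name, value).
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


html = """<p>See <A HREF="/docs">docs</A> and
<a class='nav' href='/faq'>the <b>FAQ</b></a>.</p>"""

print(re.findall(r'<a href="([^"]*)"', html))  # → []

collector = LinkCollector()
collector.feed(html)
collector.close()
print(collector.hrefs)  # → ['/docs', '/faq']
```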

What is the difference between greedy and lazy quantifiers?

A greedy quantifier (* or +) matches as many characters as possible, then backtracks if the rest of the pattern fails. A lazy quantifier (*? or +?) matches as few characters as possible and expands only when needed. For the input <b>one</b><b>two</b>, the greedy pattern <b>.*</b> matches the entire string from the first <b> to the last </b>, while the lazy pattern <b>.*?</b> matches <b>one</b> and <b>two</b> separately.
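The same comparison in Python, on the input `<b>one</b><b>two</b>`. The last line shows a common alternative to a lazy quantifier: a negated character class such as `[^"]*`, which cannot run past the closing delimiter and so needs no backtracking to find it.

```python
import re

html = "<b>one</b><b>two</b>"

print(re.findall(r"<b>.*</b>", html))   # greedy → ['<b>one</b><b>two</b>']
print(re.findall(r"<b>.*?</b>", html))  # lazy   → ['<b>one</b>', '<b>two</b>']

# Negated class: stops at the first closing quote without being lazy.
print(re.findall(r'"[^"]*"', 'say "hi" and "bye"'))  # → ['"hi"', '"bye"']
```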
