Regular Expressions Mastery
Master pattern matching with regex. Learn syntax, metacharacters, groups, and practical applications for text processing.
What Are Regular Expressions?
Regular expressions (regex) are powerful patterns used to match, search, and manipulate text. They provide a concise way to describe complex string patterns.
Regex is supported in virtually every programming language including JavaScript, Python, Java, PHP, and more. Learning regex is an essential skill for any developer.
A single regex pattern can replace dozens of lines of string manipulation code. Typical uses include data validation, log parsing, search and replace, and text extraction.
Basic Syntax
In JavaScript, Perl, and Ruby, a regex literal is written between forward slashes, optionally followed by flags. This guide uses that notation throughout:
/pattern/flags
Examples:
/hello/ Matches "hello" in any string
/hello/i Case-insensitive match
/hello/g Global match (find all occurrences)
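Languages without regex literals pass the pattern as a string instead. As a rough Python equivalent of the three examples above (the sample text is made up for illustration), flags become arguments and "global" becomes a choice of function:

```python
import re

text = "Hello there, hello again"

# /hello/ : first match only, case-sensitive
first = re.search(r"hello", text)

# /hello/i : the flag is passed as an argument
first_ci = re.search(r"hello", text, re.IGNORECASE)

# /hello/gi : Python has no "g" flag; findall returns every match
all_ci = re.findall(r"hello", text, re.IGNORECASE)

print(first.start())     # 13
print(first_ci.group())  # Hello
print(all_ci)            # ['Hello', 'hello']
```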
Literal Matching
The simplest regex matches exact text:
/cat/ matches "cat" in "The cat sat on the mat"
/123/ matches "123" in "Order #12345"
/hello/ matches "hello" in "Say hello world"
Metacharacters
Special characters that have meaning in regex patterns:
. (Dot) - Any Single Character (except a line break, unless the s flag is set)
/c.t/ matches "cat", "cot", "cut", "c9t", "c t"
/a.b/ matches "aab", "a1b", "a-b", "a b"
^ (Caret) - Start of String
/^Hello/ matches "Hello world"
does NOT match "Say Hello"
$ (Dollar) - End of String
/world$/ matches "Hello world"
does NOT match "world peace"
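The dot and the two anchors can be checked quickly in Python. This is a small sketch with made-up sample strings; note that the dot does not match a newline by default:

```python
import re

# Dot needs exactly one character between "c" and "t", and skips newlines
candidates = ["cat", "c9t", "c t", "ct", "c\nt"]
dot_hits = [s for s in candidates if re.search(r"c.t", s)]

# ^ anchors the match at the start, $ at the end
starts = bool(re.search(r"^Hello", "Hello world"))
not_start = bool(re.search(r"^Hello", "Say Hello"))
ends = bool(re.search(r"world$", "Hello world"))
not_end = bool(re.search(r"world$", "world peace"))

print(dot_hits)  # ['cat', 'c9t', 'c t']
```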
Character Classes [ ]
Match any single character from a set:
/[aeiou]/ matches any vowel
/[0-9]/ matches any digit (0 through 9)
/[A-Z]/ matches any uppercase letter
/[a-zA-Z]/ matches any letter
/[a-zA-Z0-9]/ matches any alphanumeric character
Negated Character Classes [^ ]
/[^0-9]/ matches any character that is NOT a digit
/[^aeiou]/ matches any character that is NOT a vowel
/[^a-z]/ matches any character that is NOT a lowercase letter (a-z)
Predefined Character Classes
\d matches any digit [0-9]
\D matches any non-digit [^0-9]
\w matches any word character [a-zA-Z0-9_]
\W matches any non-word character
\s matches any whitespace (space, tab, newline)
\S matches any non-whitespace character
/\d{3}-\d{3}-\d{4}/
Matches: "555-123-4567"
Explanation: 3 digits, hyphen, 3 digits, hyphen, 4 digits
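The shorthand classes are handy for extraction and cleanup as well as matching. A brief Python sketch with invented sample text (in Python 3, \d on text strings also accepts non-ASCII digits unless re.ASCII is passed):

```python
import re

phone = re.compile(r"\d{3}-\d{3}-\d{4}")

# Find every phone-shaped substring
found = phone.findall("Call 555-123-4567 or 555-987-6543 today")

# \D removes everything that is not a digit
digits_only = re.sub(r"\D", "", "555-123-4567")

# \s+ splits on any run of spaces, tabs, or newlines
words = re.split(r"\s+", "one  two\tthree\nfour")

print(found)        # ['555-123-4567', '555-987-6543']
print(digits_only)  # 5551234567
print(words)        # ['one', 'two', 'three', 'four']
```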
Quantifiers
Specify how many times a pattern element should repeat:
Basic Quantifiers
* Zero or more /ab*c/ matches "ac", "abc", "abbc", "abbbc"
+ One or more /ab+c/ matches "abc", "abbc" (NOT "ac")
? Zero or one /colou?r/ matches "color" and "colour"
{n} Exactly n /a{3}/ matches "aaa"
{n,} n or more /a{2,}/ matches "aa", "aaa", "aaaa"...
{n,m} Between n and m /a{2,4}/ matches "aa", "aaa", "aaaa"
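To see the quantifiers side by side, the sketch below filters candidate strings with Python's re.fullmatch, which requires the whole string to fit the pattern. The helper name accepted is illustrative only:

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = accepted(r"ab*c", ["ac", "abc", "abbbc", "abd"])
plus = accepted(r"ab+c", ["ac", "abc", "abbc"])
optional = accepted(r"colou?r", ["color", "colour", "colouur"])
ranged = accepted(r"a{2,4}", ["a", "aa", "aaaa", "aaaaa"])

print(star)      # ['ac', 'abc', 'abbbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['color', 'colour']
print(ranged)    # ['aa', 'aaaa']
```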
Greedy vs Lazy Matching
By default, quantifiers are greedy - they match as much as possible:
Pattern: /".+"/
Text: "Hello" and "World"
Result: Matches entire '"Hello" and "World"' (greedy)
Add ? after a quantifier to make it lazy, so that it matches as little as possible:
Pattern: /".+?"/
Text: "Hello" and "World"
Result: Matches '"Hello"' and '"World"' separately (lazy)
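The same comparison in Python, using the text from the example above:

```python
import re

text = '"Hello" and "World"'

# Greedy: .+ runs to the last quote in the text
greedy = re.findall(r'".+"', text)

# Lazy: .+? stops at the first closing quote it can reach
lazy = re.findall(r'".+?"', text)

print(greedy)  # ['"Hello" and "World"']
print(lazy)    # ['"Hello"', '"World"']
```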
Avoid nested quantifiers like (a+)+ or (.*)*. On input that almost matches, a backtracking engine may try exponentially many ways to split the text (catastrophic backtracking), which can freeze your application.
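In many cases the nesting adds nothing and can simply be removed. The sketch below, using Python's backtracking re module and deliberately short inputs, suggests that a flat a+ accepts the same strings as (a+)+ without the ambiguity:

```python
import re

nested = re.compile(r"^(a+)+$")  # risky: many ways to split a run of "a"s
flat = re.compile(r"^a+$")       # same set of strings, only one way to match

# Inputs are kept short on purpose; long near-misses such as
# "a" * 40 + "!" are exactly what makes the nested form slow.
samples = ["a", "aaaa", "aaaa!", ""]
nested_results = [bool(nested.match(s)) for s in samples]
flat_results = [bool(flat.match(s)) for s in samples]

print(nested_results)  # [True, True, False, False]
print(flat_results)    # [True, True, False, False]
```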
Groups & Capturing
Capturing Groups ( )
Parentheses create groups that capture matched text for later use:
Pattern: /(\d{3})-(\d{3})-(\d{4})/
Input: "555-123-4567"
Captures:
Group 0 (full match): "555-123-4567"
Group 1: "555" (area code)
Group 2: "123" (prefix)
Group 3: "4567" (line number)
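In Python, the captures listed above are available on the match object. A short sketch, with the surrounding sentence invented for illustration:

```python
import re

m = re.search(r"(\d{3})-(\d{3})-(\d{4})", "Call 555-123-4567 now")

full = m.group(0)               # group 0 is always the full match
area, prefix, line = m.groups() # groups 1..3 as a tuple

print(full)                # 555-123-4567
print(area, prefix, line)  # 555 123 4567
```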
Non-Capturing Groups (?: )
Group patterns without capturing. This keeps group numbering clean and avoids the small cost of storing text you do not need:
/(?:Mr|Mrs|Ms)\.\s+(\w+)/
Matches: "Mr. Smith", "Mrs. Jones", "Ms. Davis"
Only captures the name, not the title
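The effect on captured results is easy to see with Python's re.findall, which returns only the captured groups when a pattern has any. The sample sentence is invented:

```python
import re

text = "Mr. Smith met Mrs. Jones and Ms. Davis"

# Non-capturing title: only the name is returned
names = re.findall(r"(?:Mr|Mrs|Ms)\.\s+(\w+)", text)

# Capturing title: every result now carries both groups
pairs = re.findall(r"(Mr|Mrs|Ms)\.\s+(\w+)", text)

print(names)  # ['Smith', 'Jones', 'Davis']
print(pairs)  # [('Mr', 'Smith'), ('Mrs', 'Jones'), ('Ms', 'Davis')]
```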
Alternation ( | )
Match one pattern OR another:
/cat|dog/ matches "cat" or "dog"
/gr(a|e)y/ matches "gray" or "grey"
/(red|blue|green) car/ matches color + " car"
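Alternation has the lowest precedence of any regex operator, which is why the parentheses in gr(a|e)y matter. A small Python sketch with made-up inputs:

```python
import re

# Unanchored alternation also finds "cat" inside "catalog"
pets = re.findall(r"cat|dog", "cat, dog, catalog")

# Parentheses limit the alternation to a single letter
spellings = [w for w in ["gray", "grey", "groy"] if re.fullmatch(r"gr(a|e)y", w)]

# Without them the pattern means "gra" OR "ey"
unscoped = re.fullmatch(r"gra|ey", "grey")

colour = re.search(r"(red|blue|green) car", "a blue car").group(1)

print(pets)       # ['cat', 'dog', 'cat']
print(spellings)  # ['gray', 'grey']
print(unscoped)   # None
print(colour)     # blue
```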
Backreferences
Reference previously captured groups:
/(['"]).*?\1/ matches quoted strings (same quote type)
\1 refers to first captured group
Matches: "hello" or 'world'
No match: "hello' (mismatched quotes)
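The quoted-string pattern above behaves the same way in Python. In this sketch the pattern sits in a raw triple-quoted string so that both quote characters can appear unescaped:

```python
import re

quoted = re.compile(r"""(['"]).*?\1""")

# Each match must close with the same quote character that opened it
hits = [m.group(0) for m in quoted.finditer("say \"hello\" or 'world'")]

# An opening " followed only by ' never finds its closing partner
mismatch = quoted.search("\"hello'")

print(hits)      # ['"hello"', "'world'"]
print(mismatch)  # None
```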
Lookahead & Lookbehind
Assert what comes before or after a position without including it in the match:
(?=...) Positive lookahead - must be followed by
(?!...) Negative lookahead - must NOT be followed by
(?<=...) Positive lookbehind - must be preceded by
(?<!...) Negative lookbehind - must NOT be preceded by
Examples:
/\d+(?=\s*dollars)/ matches "100" in "100 dollars"
/\d+(?!\d|\s*cents)/ matches numbers NOT followed by "cents" (without the \d guard, backtracking would match "10" in "100 cents")
/(?<=\$)\d+/ matches "50" in "$50"
/(?<![$\d])\d+/ matches numbers NOT preceded by "$" (without the \d guard, the match would start at "0" in "$50")
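Negative lookarounds next to \d+ deserve extra care, because the engine is free to match only part of a number. The Python sketch below compares unguarded patterns with variants that also forbid an adjacent digit. The sample strings are invented:

```python
import re

# Positive lookarounds behave as expected
dollars = re.findall(r"\d+(?=\s*dollars)", "100 dollars and 20 cents")
after_sign = re.findall(r"(?<=\$)\d+", "$50 or 50")

# Unguarded negative lookahead: \d+ gives back a digit and matches "10"
naive_ahead = re.findall(r"\d+(?!\s*cents)", "100 cents")
guarded_ahead = re.findall(r"\d+(?!\d|\s*cents)", "100 cents")

# Unguarded negative lookbehind: the match simply starts one digit later
naive_behind = re.findall(r"(?<!\$)\d+", "$50")
guarded_behind = re.findall(r"(?<![$\d])\d+", "$50")

print(dollars, after_sign)           # ['100'] ['50']
print(naive_ahead, guarded_ahead)    # ['10'] []
print(naive_behind, guarded_behind)  # ['0'] []
```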
Practical Examples
Email Validation
/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
Breakdown:
^ Start of string
[a-zA-Z0-9._%+-]+ Username (letters, numbers, special chars)
@ Literal @ symbol
[a-zA-Z0-9.-]+ Domain name
\. Literal dot (escaped)
[a-zA-Z]{2,} TLD (minimum 2 letters)
$ End of string
Matches: addresses such as user@example.com or first.last@mail.example.org
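A minimal Python wrapper around this pattern might look like the following. The function name looks_like_email and the sample addresses are illustrative, and the last check shows one of the malformed inputs the pattern still accepts:

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value):
    """True if value has the user@domain.tld shape described above."""
    # fullmatch also rejects a trailing newline, which $ alone allows in Python
    return EMAIL.fullmatch(value) is not None

print(looks_like_email("user@example.com"))        # True
print(looks_like_email("no-at-sign.example.com"))  # False
print(looks_like_email("user@localhost"))          # False (no TLD)
print(looks_like_email("user@example..com"))       # True, although malformed
```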
URL Validation
/^(https?:\/\/)?([\w.-]+)\.([a-z]{2,})(\/\S*)?$/i
Breakdown:
(https?:\/\/)? Optional http:// or https://
([\w.-]+) Domain name
\.([a-z]{2,}) TLD (.com, .org, etc.)
(\/\S*)? Optional path
Matches: example.com, https://www.site.org/page
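In Python the forward slashes need no escaping, and the four groups come back as a tuple, with None for optional parts that were absent. A sketch using the two sample URLs above:

```python
import re

URL = re.compile(r"^(https?://)?([\w.-]+)\.([a-z]{2,})(/\S*)?$", re.IGNORECASE)

full = URL.match("https://www.site.org/page").groups()
bare = URL.match("example.com").groups()
invalid = URL.match("not a url")

print(full)     # ('https://', 'www.site', 'org', '/page')
print(bare)     # (None, 'example', 'com', None)
print(invalid)  # None
```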
Password Strength
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/
Requirements (using lookaheads):
(?=.*[a-z]) At least one lowercase
(?=.*[A-Z]) At least one uppercase
(?=.*\d) At least one digit
(?=.*[@$!%*?&]) At least one special character
{8,} Minimum 8 characters total
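A single pass/fail pattern cannot tell the user which rule failed. One possible approach, sketched in Python with hypothetical names (CHECKS, failed_checks), is to keep the combined pattern for the final decision and test each character class separately for feedback:

```python
import re

STRONG = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# One simple pattern per requirement, in reporting order
CHECKS = {
    "lowercase": r"[a-z]",
    "uppercase": r"[A-Z]",
    "digit": r"\d",
    "special": r"[@$!%*?&]",
}

def failed_checks(password):
    """Return the names of the character-class rules the password misses."""
    return [name for name, pat in CHECKS.items() if not re.search(pat, password)]

print(bool(STRONG.match("Passw0rd!")))  # True
print(bool(STRONG.match("Passw0rd#")))  # False: "#" is outside the allowed set
print(failed_checks("password"))        # ['uppercase', 'digit', 'special']
```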
Credit Card Number
/^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$/
Matches:
1234567890123456
1234 5678 9012 3456
1234-5678-9012-3456
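After a format check like this, the separators are usually stripped before further processing. A short Python sketch, where normalize_card is an illustrative name; note that the pattern checks shape only and also accepts mixed separators:

```python
import re

CARD = re.compile(r"^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$")

def normalize_card(value):
    """Return the 16 digits without separators, or None if the shape is wrong."""
    if CARD.fullmatch(value) is None:
        return None
    return re.sub(r"[- ]", "", value)

print(normalize_card("1234 5678 9012 3456"))  # 1234567890123456
print(normalize_card("1234-5678 9012-3456"))  # 1234567890123456 (mixed separators pass)
print(normalize_card("1234-5678-9012-345"))   # None
```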
Date Extraction (MM/DD/YYYY)
/\b(0?[1-9]|1[0-2])\/(0?[1-9]|[12]\d|3[01])\/(\d{4})\b/
Matches: 1/15/2026, 01/15/2026, 12/31/2025
Groups capture: month, day, year separately
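The pattern checks the shape of a date, not the calendar, so it accepts 2/31/2026. One way to close that gap, sketched in Python with an illustrative helper name and invented sample text, is to hand the captured groups to datetime.date:

```python
import re
from datetime import date

DATE = re.compile(r"\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/(\d{4})\b")

def extract_dates(text):
    """Return the real calendar dates found in text as datetime.date objects."""
    found = []
    for m in DATE.finditer(text):
        month, day, year = (int(part) for part in m.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            # Shape was fine, but the calendar has no such day
            continue
    return found

dates = extract_dates("Due 1/15/2026, renewed 12/31/2025, typo 2/31/2026.")
print(dates)  # [datetime.date(2026, 1, 15), datetime.date(2025, 12, 31)]
```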
IP Address
/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
Matches: 192.168.1.1, 10.0.0.255
Note: This does not check that each octet is in the range 0-255, so 999.1.1.1 also matches
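The range check is simpler to do in code than inside the pattern. A minimal Python sketch, with is_valid_ipv4 as an illustrative name, uses the regex for the shape and plain integer comparison for the range:

```python
import re

IPV4 = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")

def is_valid_ipv4(value):
    """True if value is four dot-separated numbers, each between 0 and 255."""
    m = IPV4.fullmatch(value)
    if m is None:
        return False
    return all(int(octet) <= 255 for octet in m.groups())

print(is_valid_ipv4("192.168.1.1"))  # True
print(is_valid_ipv4("999.1.1.1"))    # False: right shape, octet out of range
print(is_valid_ipv4("1.2.3"))        # False: only three octets
```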
Task: Swap first and last names
Find: /(\w+)\s+(\w+)/
Replace: $2, $1
Input: "John Doe"
Output: "Doe, John"
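Replacement syntax varies by language. The $2, $1 form above is JavaScript's; in Python's re.sub the same swap is written with backslash references:

```python
import re

# \2 and \1 refer to the second and first captured groups
swapped = re.sub(r"(\w+)\s+(\w+)", r"\2, \1", "John Doe")

print(swapped)  # Doe, John
```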
Common Flags
g Global - find all matches, not just the first
i Case-insensitive - ignore uppercase/lowercase
m Multiline - ^ and $ match line boundaries
s Dotall - dot (.) matches newlines too
u Unicode - enable full Unicode support
y Sticky - match only from lastIndex position
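The letters above are JavaScript's. Python exposes i, m, and s as re.IGNORECASE, re.MULTILINE, and re.DOTALL, combined with |, while g has no flag counterpart (findall and finditer cover it). A short sketch with an invented log snippet:

```python
import re

log = "INFO start\nerror disk full\nINFO done"

# m: ^ matches at the start of every line, not just the string
line_starts = re.findall(r"^\w+", log, re.MULTILINE)
string_start = re.findall(r"^\w+", log)

# i and m together
errors = re.findall(r"^ERROR.*$", log, re.MULTILINE | re.IGNORECASE)

# s: the dot may cross newlines
dot_default = re.search(r"start.*done", log)
dot_all = re.search(r"start.*done", log, re.DOTALL)

print(line_starts)   # ['INFO', 'error', 'INFO']
print(string_start)  # ['INFO']
print(errors)        # ['error disk full']
print(dot_default)   # None
```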
Usage in Different Languages
// JavaScript
const regex = /pattern/gi;
const matches = text.match(regex);
const result = text.replace(regex, 'replacement');
# Python
import re
match = re.search(r'pattern', text)
all_matches = re.findall(r'pattern', text)
result = re.sub(r'pattern', 'replacement', text)
// Java
Pattern p = Pattern.compile("pattern", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
while (m.find()) { /* process match */ }
# Ruby
text.scan(/pattern/i) { |match| puts match }
text.gsub(/pattern/, 'replacement')
Best Practices
- Start simple: Build complex patterns incrementally, testing each step
- Test thoroughly: Use online tools like regex101.com to debug
- Comment patterns: Complex regex should be documented
- Escape metacharacters: Use \. \* \? \+ for literals
- Use raw strings: Python r'...' avoids double-escaping
- Be specific: Prefer \d+ over .+ when possible
- Avoid backtracking: Test patterns with long input strings
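Two of these practices have direct support in Python: re.VERBOSE allows whitespace and comments inside a pattern, and re.escape escapes metacharacters in text that should be matched literally. A brief sketch with invented sample strings:

```python
import re

# Commented pattern: whitespace and "# ..." are ignored in verbose mode
PHONE = re.compile(r"""
    (\d{3})   # area code
    -
    (\d{3})   # prefix
    -
    (\d{4})   # line number
""", re.VERBOSE)

parts = PHONE.search("555-123-4567").groups()

# Unescaped, the dot in "3.14" matches any character
loose = re.search("3.14", "3x14") is not None

# re.escape makes every metacharacter literal
literal = re.escape("3.14 (approx.)")
exact = re.search(literal, "pi is 3.14 (approx.)") is not None
rejected = re.search(literal, "pi is 3x14 (approx.)") is None

print(parts)                   # ('555', '123', '4567')
print(loose, exact, rejected)  # True True True
```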
regex101.com - Best online tester with detailed explanations
regexr.com - Visual regex builder and reference
debuggex.com - Visual regex debugger with railroad diagrams
When NOT to Use Regex
- Parsing HTML/XML: Use proper parsers like BeautifulSoup or lxml
- Complex nested structures: JSON, code - use dedicated parsers
- Simple string checks: startsWith(), includes() are faster
- Email validation: Consider using built-in validators for production
Regex is incredibly powerful but has limitations. For complex data structures, use dedicated parsers. For simple operations, built-in string methods are often faster and more readable.
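As a small illustration in Python, where the counterparts of startsWith() and includes() are str.startswith and the in operator, simple checks and nested data need no regex at all. The filename and JSON text are made up:

```python
import json

filename = "report_2026.csv"

# Plain string methods cover prefix, suffix, and substring checks
is_report = filename.startswith("report_")
is_csv = filename.endswith(".csv")
mentions_year = "2026" in filename

# Nested structures go to a real parser
raw = '{"user": {"name": "Ada", "tags": ["admin", "dev"]}}'
data = json.loads(raw)
first_tag = data["user"]["tags"][0]

print(is_report, is_csv, mentions_year)  # True True True
print(first_tag)                         # admin
```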