Kordu Tools

Regex Cheat Sheet: A Complete Guide to Regular Expressions

Kordu Team · 2026-03-31

Key Takeaways

  • Character classes (\d, \w, \s) and quantifiers (+, *, ?, {n}) cover 80% of real-world regex use cases.
  • Lookaheads, lookbehinds, and named groups handle the other 20%, but you will not need them daily.
  • Regex is available in every major language, text editor, and CLI tool. The core syntax carries over everywhere, though flavours differ on details such as named groups and lookbehinds.
  • Always test against real data before deploying. Edge cases in regex are relentless.

Test Your Patterns

Try any pattern from this guide against your own text. Matches and capture groups highlight in real time.

Character Classes

Character classes match a single character from a defined set. These are the foundation.

| Pattern | Matches | Example |
| --- | --- | --- |
| . | Any character except newline | a.c matches 'abc', 'a1c', 'a-c' |
| \d | Any digit (0-9) | \d\d matches '42', '99' |
| \D | Any non-digit | \D+ matches 'hello' |
| \w | Word character (letter, digit, underscore) | \w+ matches 'hello_123' |
| \W | Non-word character | \W matches '@', ' ', '-' |
| \s | Whitespace (space, tab, newline) | \s+ matches ' ' |
| \S | Non-whitespace | \S+ matches 'hello' |
| [abc] | Any of a, b, or c | [aeiou] matches vowels |
| [^abc] | Any character except a, b, c | [^0-9] matches non-digits |
| [a-z] | Any character in range a to z | [A-Za-z] matches any letter |

The shorthands (\d, \w, \s) and their negations cover most needs. Square-bracket character classes handle everything else.
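
As a quick illustration of the classes above, here is a minimal Python sketch using the standard `re` module. The sample strings are made up for the example. Note that Python's `\d` and `\w` also match non-ASCII digits and letters unless you pass `re.ASCII`.

```python
import re

text = "Order 42 shipped on 2026-03-31"  # hypothetical sample text

# \d+ collects runs of digits.
digits = re.findall(r"\d+", text)         # ['42', '2026', '03', '31']

# \w+ collects runs of letters, digits, and underscores.
words = re.findall(r"\w+", text)          # ['Order', '42', 'shipped', 'on', '2026', '03', '31']

# A bracketed class matches one character from the listed set.
vowels = re.findall(r"[aeiou]", "regex")  # ['e', 'e']

# A leading ^ inside the brackets negates the set.
non_digits = re.findall(r"[^0-9]+", "abc123def")  # ['abc', 'def']
```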

Quantifiers

Quantifiers control how many times the preceding element may repeat.

  • * — zero or more. ab*c matches ac, abc, abbc.
  • + — one or more. ab+c matches abc, abbc, but not ac.
  • ? — zero or one (optional). colou?r matches color and colour.
  • {3} — exactly 3 times. \d{3} matches exactly three digits.
  • {2,5} — between 2 and 5 times.
  • {3,} — 3 or more times.
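
The quantifiers above can be checked with a small Python sketch. The `matches` helper is hypothetical, written only for this example. It uses `re.fullmatch` so that each candidate must match the pattern in its entirety.

```python
import re

def matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

star = matches(r"ab*c", ["ac", "abc", "abbc"])                 # ['ac', 'abc', 'abbc']
plus = matches(r"ab+c", ["ac", "abc", "abbc"])                 # ['abc', 'abbc']
optional = matches(r"colou?r", ["color", "colour", "colouur"])  # ['color', 'colour']
exact = matches(r"\d{3}", ["12", "123", "1234"])               # ['123']
bounded = matches(r"\d{2,5}", ["1", "12", "12345", "123456"])  # ['12', '12345']
open_ended = matches(r"\d{3,}", ["12", "123", "123456"])       # ['123', '123456']
```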

Greedy vs lazy

By default, quantifiers are greedy: they match as much as possible. Adding ? after a quantifier (*?, +?, ??, {2,5}?) makes it lazy, so it matches as little as possible. This matters enormously for parsing quoted strings or HTML.

Greedy: ".*"  applied to  he said "hello" and "goodbye"
Matches: "hello" and "goodbye"  (first quote to last quote)

Lazy:   ".*?" applied to  he said "hello" and "goodbye"
Matches: "hello" then "goodbye"  (shortest possible matches)
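
The same comparison can be reproduced in Python. This is a minimal sketch using the sentence from the example above. The third pattern, a negated character class, is an addition not taken from the example: it is a common alternative that gives the same result here without relying on laziness.

```python
import re

text = 'he said "hello" and "goodbye"'

# Greedy: .* runs to the last quote in the string.
greedy = re.findall(r'".*"', text)      # ['"hello" and "goodbye"']

# Lazy: .*? stops at the nearest closing quote.
lazy = re.findall(r'".*?"', text)       # ['"hello"', '"goodbye"']

# Alternative: match anything that is not a quote between the quotes.
negated = re.findall(r'"[^"]*"', text)  # ['"hello"', '"goodbye"']
```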

Anchors

  • ^ — start of string (or start of line in multiline mode).
  • $ — end of string (or end of line in multiline mode).
  • \b — word boundary. \bcat\b matches cat but not caterpillar or concatenate.

Word boundaries prevent partial matches

Searching for log without boundaries also matches inside blog, catalog, and logarithm, not just the word log. Use \blog\b to match only the standalone word. Word boundaries are one of the most useful and underused regex features.
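
To see the difference, the following Python sketch records where each pattern matches in a made-up sample string. The start offsets show that the unbounded pattern fires inside the longer words.

```python
import re

text = "blog catalog logarithm log"  # hypothetical sample text

# Without boundaries, "log" is found inside every word.
loose_starts = [m.start() for m in re.finditer(r"log", text)]       # [1, 9, 13, 23]

# With \b on both sides, only the standalone word matches.
strict_starts = [m.start() for m in re.finditer(r"\blog\b", text)]  # [23]
```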

Groups and Alternation

Capturing groups

Parentheses () extract matched substrings. (\d{4})-(\d{2})-(\d{2}) applied to 2026-03-31 captures 2026, 03, 31.
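
In Python, the captured substrings are available through the match object. A minimal sketch, using the date from the example above:

```python
import re

m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2026-03-31")

# groups() returns the captures in order; group(0) is the whole match.
year, month, day = m.groups()  # '2026', '03', '31'
whole = m.group(0)             # '2026-03-31'
```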

Named groups

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) lets you reference matches by name instead of index. Far more readable in complex patterns.
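
One flavour difference to be aware of: Python's `re` module spells named groups as `(?P<name>...)` rather than `(?<name>...)`. The sketch below is the Python form of the pattern above, and shows two ways to read the named captures.

```python
import re

# Python syntax: (?P<name>...) instead of (?<name>...).
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = DATE_RE.fullmatch("2026-03-31")

parts = m.groupdict()  # {'year': '2026', 'month': '03', 'day': '31'}
year = m["year"]       # '2026', indexing by group name
```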

Non-capturing groups

(?:...) groups elements without capturing. Useful when you need grouping for alternation or quantifiers but do not need the match. (?:https?|ftp):// groups protocol options without wasting a capture slot.

Alternation

| means “or”. cat|dog matches either. Use with groups: (cat|dog) food matches cat food or dog food.
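
A short Python sketch of non-capturing groups and alternation together. The sample strings are invented for the example. The last line shows why the group matters: without it, the alternation splits the whole pattern.

```python
import re

# The protocol is grouped but not captured, so the only capture is the rest of the URL.
m = re.match(r"(?:https?|ftp)://(\S+)", "ftp://example.com/file.txt")
captures = m.groups()  # ('example.com/file.txt',)

# Grouped alternation: either animal, followed by " food".
grouped = re.findall(r"(?:cat|dog) food", "cat food, dog food, bird food")
# ['cat food', 'dog food']

# Ungrouped: the pattern means "cat" OR "dog food".
ungrouped = re.findall(r"cat|dog food", "cat food, dog food")
# ['cat', 'dog food']
```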

Lookaheads and Lookbehinds

Lookaheads and lookbehinds are zero-width assertions: they check what comes before or after the current position without consuming characters.

  • (?=...) — positive lookahead. \d+(?= dollars) matches 100 in 100 dollars but not 100 euros.
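  • (?!...) — negative lookahead. \d+\b(?! dollars) matches 100 in 100 euros but nothing in 100 dollars. The \b matters: without it, \d+ backtracks and matches 10 in 100 dollars, because 10 is followed by 0, not by " dollars".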
  • (?<=...) — positive lookbehind. (?<=\$)\d+ matches 50 in $50.
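  • (?<!...) — negative lookbehind. (?<![$\d])\d+ matches 50 in 50 items but nothing in $50. Excluding digits as well as $ matters: plain (?<!\$)\d+ still matches the 0 in $50, because that 0 is preceded by 5, not by $.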
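
Negative lookarounds interact with backtracking in ways that are easy to miss, so it is worth running them. The Python sketch below uses the dollar examples from the list and compares a naive negative assertion with a stricter variant. The stricter patterns are one possible fix, not the only one.

```python
import re

# Positive lookahead: digits only when " dollars" follows.
pos_ahead = re.findall(r"\d+(?= dollars)", "100 dollars, 100 euros")  # ['100']

# Naive negative lookahead: \d+ backtracks to "10", which is followed by "0".
naive_ahead = re.search(r"\d+(?! dollars)", "100 dollars").group()    # '10'
# Adding \b forces the digits to end at a word boundary first.
strict_ahead = re.search(r"\d+\b(?! dollars)", "100 dollars")         # None

# Positive lookbehind: digits only when "$" precedes them.
pos_behind = re.findall(r"(?<=\$)\d+", "$50 for 50 items")            # ['50']

# Naive negative lookbehind: the "0" in "$50" is preceded by "5", not "$".
naive_behind = re.search(r"(?<!\$)\d+", "$50").group()                # '0'
# Excluding digits too stops the match from starting mid-number.
strict_behind = re.search(r"(?<![$\d])\d+", "$50")                    # None
```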

Lookbehind compatibility

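Lookbehinds work in JavaScript (ES2018+), Python, Java, C#, and most modern backtracking engines, though some restrict what a lookbehind may contain (Python requires a fixed-width pattern). They are not supported in some older environments, and engines built for linear-time matching, such as Go's regexp package and Rust's regex crate, support neither lookaheads nor lookbehinds. If you need broad compatibility, restructure using lookaheads or capturing groups.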

10 Practical Patterns

Paste any of these into the Regex Tester to see them work.

1. Email address (simplified)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Covers the vast majority of real email addresses. The full RFC 5322 spec is absurdly complex and not worth implementing in regex.

2. URL

https?:\/\/[^\s/$.?#].[^\s]*

Matches HTTP and HTTPS URLs. For strict validation, use your language’s URL parser.
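
A rough Python sketch of that division of labour: the regex finds candidate URLs in free text, and the standard library's `urllib.parse.urlsplit` takes them apart. The sample text is invented. Note that the pattern happily includes the sentence's trailing full stop in the second match, which is the kind of detail a regex alone will not sort out.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s/$.?#].[^\s]*")

text = "Docs at https://example.com/guide?page=2 and http://localhost:8080/health."

# Step 1: the regex extracts candidates.
candidates = URL_RE.findall(text)
# ['https://example.com/guide?page=2', 'http://localhost:8080/health.']

# Step 2: the URL parser splits each candidate into components.
hosts = [urlsplit(u).hostname for u in candidates]  # ['example.com', 'localhost']
last_path = urlsplit(candidates[-1]).path           # '/health.' (trailing dot included)
```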

3. UK phone number

^(?:0|\+44)\d{9,10}$

Matches numbers that start with 0 or +44 followed by 9 or 10 digits, written without spaces or punctuation.

4. Date (YYYY-MM-DD)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Validates format, restricts months to 01-12, days to 01-31. Does not check whether February 30th exists — use a date library for that.
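
One way to combine the two checks in Python is sketched below: the regex confirms the format, then `datetime.date` confirms that the day exists. The function name `parse_iso_date` is hypothetical.

```python
import re
from datetime import date
from typing import Optional

DATE_RE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(text: str) -> Optional[date]:
    """Return a date if text is a real YYYY-MM-DD calendar date, else None."""
    m = DATE_RE.fullmatch(text)        # step 1: format check with regex
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    try:
        return date(year, month, day)  # step 2: calendar check in code
    except ValueError:
        return None

ok = parse_iso_date("2026-03-31")         # date(2026, 3, 31)
feb_30 = parse_iso_date("2026-02-30")     # None: right format, no such day
bad_month = parse_iso_date("2026-13-01")  # None: rejected by the regex
```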

5. IPv4 address

^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$

Validates that each octet is in the range 0-255.
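
A quick Python check of the pattern against a few made-up inputs. The helper name `is_ipv4` is hypothetical. One thing the sketch surfaces: octets with leading zeros, such as 001, are accepted, which may or may not be what you want.

```python
import re

IPV4_RE = re.compile(
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
)

def is_ipv4(text: str) -> bool:
    """True if the whole string is four dot-separated octets in 0-255."""
    return IPV4_RE.fullmatch(text) is not None

valid = is_ipv4("192.168.1.1")              # True
too_big = is_ipv4("256.1.1.1")              # False: 256 is out of range
too_short = is_ipv4("1.2.3")                # False: only three octets
leading_zeros = is_ipv4("001.002.003.004")  # True: leading zeros pass
```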

6. Hex colour code

^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$

Matches #fff and #1a2b3c.

7. Strong password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

At least 8 characters with one lowercase, one uppercase, one digit, one special character. Four positive lookaheads check each requirement independently.
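
The Python sketch below runs the pattern against a few invented passwords. The helper name `is_strong` is hypothetical. The final character class also acts as an allow-list, so a password containing any character outside it (for example #) is rejected even when it meets all four requirements.

```python
import re

STRONG_RE = re.compile(
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}"
)

def is_strong(password: str) -> bool:
    """True if the whole password satisfies the pattern."""
    return STRONG_RE.fullmatch(password) is not None

good = is_strong("Passw0rd!")         # True
no_upper = is_strong("password1!")    # False: no uppercase letter
too_short = is_strong("Pass1!")       # False: fewer than 8 characters
outside_set = is_strong("Passw0rd#")  # False: '#' is not in the allowed set
```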

8. HTML tag

<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>

Matches opening and closing tags. \1 backreference ensures they match. Fine for quick extraction — do not use regex to parse HTML in production.

9. Trailing whitespace

[ \t]+$

Useful for cleaning up code files. Enable multiline mode so that $ matches at the end of each line rather than only at the end of the text.
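
In Python, multiline mode is the `re.MULTILINE` flag. The sketch below strips trailing spaces and tabs from a small made-up snippet, and shows what happens when the flag is left off.

```python
import re

source = "def f():   \n    return 1\t\n"  # hypothetical file contents

# With re.MULTILINE, $ matches at the end of every line.
cleaned = re.sub(r"[ \t]+$", "", source, flags=re.MULTILINE)
# 'def f():\n    return 1\n'

# Without the flag, only the last line is cleaned.
last_only = re.sub(r"[ \t]+$", "", source)
# 'def f():   \n    return 1\n'
```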

10. Duplicate words

\b(\w+)\s+\1\b

Catches repeated words like “the the” or “is is”. The \1 backreference matches whatever the first group captured.
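
A minimal Python sketch that finds and then collapses duplicates in an invented sentence. The trailing \b is what stops "the theory" from counting as a repeat.

```python
import re

DUPLICATE_RE = re.compile(r"\b(\w+)\s+\1\b")

text = "This is is the the final draft"  # hypothetical sample text

# findall returns the captured word for each duplicate pair.
dupes = DUPLICATE_RE.findall(text)    # ['is', 'the']

# Replacing each pair with the captured word removes the repeat.
fixed = DUPLICATE_RE.sub(r"\1", text)  # 'This is the final draft'

# A word followed by a longer word with the same prefix is not a duplicate.
false_alarm = DUPLICATE_RE.search("the theory")  # None
```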

Common Mistakes

Forgetting to escape special characters. . matches any character, not a literal dot. Use \. for a period. The same applies to (, ), [, ], {, }, +, *, ?, ^, $, |, and the backslash itself.
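
When the text to match comes from a variable, escaping by hand is error-prone. Python's `re.escape` does it for you. A small sketch with made-up strings:

```python
import re

literal = "3.50 (approx)"  # hypothetical text that should be matched literally

# Unescaped, the dot matches any character.
accidental = re.search(r"3.50", "3x50") is not None           # True

# re.escape backslash-escapes every character with a special meaning.
pattern = re.escape(literal)
exact = re.fullmatch(pattern, "3.50 (approx)") is not None     # True
rejected = re.fullmatch(pattern, "3x50 (approx)") is not None  # False
```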

Greedy when you need lazy. ".*" matches from the first quote to the last quote in the entire string. Use ".*?" for the nearest closing quote.

Anchoring only one end. ^\d+ checks that the string starts with digits but says nothing about what follows. ^\d+$ ensures the entire string is digits.
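
The difference is easy to demonstrate in Python. The sketch also shows a Python-specific detail: `$` matches just before a trailing newline, so `re.fullmatch` is the stricter choice when the whole string must match.

```python
import re

# Anchored at the start only: trailing junk is ignored.
start_only = re.search(r"^\d+", "123abc") is not None       # True

# Anchored at both ends: the whole string must be digits.
both_ends = re.search(r"^\d+$", "123abc") is not None       # False

# In Python, $ also matches before a final newline.
newline_slips = re.search(r"^\d+$", "123\n") is not None    # True

# fullmatch requires the entire string, newline included, to match.
strict = re.fullmatch(r"\d+", "123\n") is not None          # False
```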

Over-engineering validation. Regex matches format. It does not validate semantics. Match the pattern with regex, then validate the logic in code.

Catastrophic backtracking. Nested quantifiers like (a+)+ can cause exponential backtracking on almost-matching inputs in backtracking engines, which is enough to freeze your application. Avoid quantifying a group whose contents are already quantified.
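
The fix is usually to flatten the pattern. The Python sketch below compares (a+)+b with the equivalent a+b: both accept the same strings, but they take very different amounts of work to reject a run of a's with no b. The helper `seconds_to_reject` and the input length of 18 are choices made for this example. The length is kept small on purpose, because the nested pattern's running time grows roughly exponentially with it.

```python
import re
import time

nested = re.compile(r"(a+)+b")  # quantified group that contains a quantifier
flat = re.compile(r"a+b")       # accepts the same strings, no nesting

# Both patterns agree on what matches.
samples = ["ab", "aaab", "b", "aaa"]
agree = all(
    bool(nested.fullmatch(s)) == bool(flat.fullmatch(s)) for s in samples
)

def seconds_to_reject(pattern: re.Pattern, n: int) -> float:
    """Time how long the pattern takes to reject n 'a's with no trailing 'b'."""
    subject = "a" * n
    start = time.perf_counter()
    result = pattern.fullmatch(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # neither pattern should match this input
    return elapsed

nested_time = seconds_to_reject(nested, 18)
flat_time = seconds_to_reject(flat, 18)
print(f"nested: {nested_time:.4f}s, flat: {flat_time:.6f}s")
```

Try raising the length by one or two at a time to watch the nested pattern slow down. Do not jump straight to a large value.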

Keep Going

Start with the Regex Tester and experiment against real data. Build confidence with character classes and quantifiers before tackling lookaheads. And when a pattern grows beyond two lines — stop, and write a proper parser instead.