What Is Regex? A Plain-English Guide to Regular Expressions
A regular expression (shortened to regex or regexp) is a sequence of characters that defines a search pattern. You write a compact expression — sometimes just a few characters, sometimes a longer formula — and the regex engine scans a piece of text and tells you which parts match. Most programming languages, text editors, and command-line tools support regex natively, making it one of the most portable and reusable skills a developer can have.
The simplest possible definition
Think of regex as a wildcard search on steroids. When you press Ctrl+F in a document editor, you can type "cat" and find every occurrence of that exact text. Regex lets you express far more flexible patterns: "find any word that starts with a capital letter," "find any sequence that looks like an email address," or "find any line that contains a number followed by a slash."
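How you write a pattern depends on the language. JavaScript, like Perl and Ruby, has a literal syntax that places the pattern between forward slashes; Python, Java, and Go take the pattern as an ordinary string. For example, in JavaScript: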
/\d{3}-\d{4}/
This pattern matches three digits, a hyphen, then four digits: the format of a US phone number suffix like 555-1234. The \d means "any digit," and the {3} means "exactly three of them."
The engine scans the target string from left to right, trying the pattern at each position, and returns the position (and, optionally, the captured text) of the first match, or of every match when you ask for all of them, for example with JavaScript's g flag.
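In JavaScript, String.prototype.match() without the g flag returns the first match together with its index, and matchAll() (which requires the g flag) yields every match in order. A minimal sketch; the sample sentence is made up for illustration:

```javascript
// Three digits, a hyphen, then four digits.
const callText = "Call 555-1234 or 555-9876 after 5pm.";

// Without the g flag, match() returns the first match and where it starts.
const first = callText.match(/\d{3}-\d{4}/);
console.log(first[0], first.index); // 555-1234 5

// matchAll() needs the g flag and yields every match, left to right.
const all = [...callText.matchAll(/\d{3}-\d{4}/g)].map((m) => [m[0], m.index]);
console.log(all); // two matches: 555-1234 at index 5 and 555-9876 at index 17
```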
The core building blocks of regex syntax
Regex syntax looks dense at first, but it is built from a small set of repeating ideas. Once you recognize these building blocks, most patterns become readable.
Literal characters
Any ordinary letter or digit in a regex matches itself exactly. The pattern cat matches the string "cat" anywhere in the text. This is the baseline — everything else is a modifier on top of literal matching.
Character classes
Square brackets define a set of characters, any one of which may match at that position. [aeiou] matches any single vowel. A dash inside brackets creates a range: [a-z] matches any lowercase letter, [0-9] matches any digit. A caret at the start negates the class: [^aeiou] matches any character that is not a vowel.
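Several shorthand classes cover common cases. The equivalents below are the ASCII versions that JavaScript uses; some engines, such as Python's for str patterns, also let \d and \w match non-ASCII digits and letters: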
| Shorthand | Meaning | Equivalent class |
|---|---|---|
| \d | Any digit | [0-9] |
| \D | Any non-digit | [^0-9] |
| \w | Word character | [a-zA-Z0-9_] |
| \W | Non-word character | [^a-zA-Z0-9_] |
| \s | Whitespace | space, tab, newline… |
| \S | Non-whitespace | anything but whitespace |
| . | Any character (except newline) | almost everything |
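A short JavaScript sketch of bracket classes, a negated class, and two of the shorthands; the sample string is an arbitrary example, and the split() call borrows the + quantifier covered next:

```javascript
const order = "Order 66 shipped";

// [aeiou]: any one lowercase vowel (the capital O is not in the set).
const vowels = order.match(/[aeiou]/g).join("");
// [0-9] and its shorthand \d: any one digit.
const digits = order.match(/[0-9]/g).join("");
const shorthandDigits = order.match(/\d/g).join("");
// [^aeiou ]: any character that is neither a lowercase vowel nor a space.
const notVowelOrSpace = order.match(/[^aeiou ]/g).join("");
// \s+: one or more whitespace characters, handy for splitting into words.
const words = order.split(/\s+/);

console.log(vowels);          // eie
console.log(digits);          // 66
console.log(shorthandDigits); // 66
console.log(notVowelOrSpace); // Ordr66shppd
console.log(words.length);    // 3
```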
Quantifiers
A quantifier placed after a character or group specifies how many times it must appear. The four you will use most:
- *: zero or more times
- +: one or more times
- ?: zero or one time (makes something optional)
- {n,m}: between n and m times (e.g., {2,4} means "two, three, or four")
By default, quantifiers are greedy: they match as much as possible. Adding a ? after the quantifier (e.g., +?) makes it lazy, matching as little as possible. The difference matters when the text offers several places where a match could end: a greedy quantifier extends to the last one that still lets the overall match succeed, while a lazy quantifier stops at the first.
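To see the two behaviors side by side, here is a small JavaScript sketch that extracts quoted text from a made-up sentence containing two quoted words:

```javascript
const quote = 'She said "hi" and then "bye".';

// Greedy: .+ grabs as much as it can, then backs off to the LAST closing quote.
const greedy = quote.match(/".+"/)[0];
// Lazy: .+? grows one character at a time and stops at the FIRST closing quote.
const lazy = quote.match(/".+?"/)[0];

console.log(greedy); // "hi" and then "bye"
console.log(lazy);   // "hi"
```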
Anchors
Anchors do not match characters; they match positions. ^ matches the start of the string and $ matches the end (or the start and end of each line when the multiline flag is on). \b matches a word boundary: the transition between a word character and a non-word character, or the start or end of the string next to a word character. So \bcat\b matches "cat" as a standalone word, but not the "cat" inside "concatenate."
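A quick JavaScript check of both kinds of anchor, using RegExp.prototype.test(), which simply reports whether the pattern matches anywhere; the test strings are arbitrary examples:

```javascript
// \b on both sides: "cat" must be a whole word.
const wholeCat = /\bcat\b/;
const inSentence = wholeCat.test("the cat sat");   // true
const inLongerWord = wholeCat.test("concatenate"); // false

// ^ and $ pin the pattern to the start and end of the whole string.
const exactlyFourDigits = /^\d{4}$/;
const bareYear = exactlyFourDigits.test("2026");        // true
const yearInText = exactlyFourDigits.test("Year 2026"); // false

console.log(inSentence, inLongerWord, bareYear, yearInText); // true false true false
```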
Groups and capture
Parentheses group part of a pattern and, by default, capture the matched text into a numbered slot. The pattern (\d{4})-(\d{2})-(\d{2}) matches a date like 2026-07-04 and captures the year in group 1, the month in group 2, and the day in group 3. You can then reference those captures in a replacement string: $1, $2, and $3 in JavaScript, or \1, \2, and \3 in Python's re.sub().
Named groups make patterns more readable: (?<year>\d{4}) works the same as a numbered group but lets you reference the capture by name instead of by position ($<year> in a JavaScript replacement string). Python spells the same group (?P<year>\d{4}).
If you need grouping without capturing, use (?:...). This keeps the pattern organized without consuming a capture slot.
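The three kinds of group, sketched in JavaScript on an invented sentence. Named captures appear on the match's groups property, and String.prototype.replace() understands both $n and $<name> in its replacement string:

```javascript
const release = "Released on 2026-07-04.";

// Numbered groups: 1 = year, 2 = month, 3 = day.
const numbered = release.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(numbered[1], numbered[2], numbered[3]); // 2026 07 04

// Named groups are exposed on the match's groups object.
const named = release.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(named.groups.year); // 2026

// $n and $<name> pull captures into a replacement string.
const dayFirst = release.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
const dotted = release.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  "$<day>.$<month>.$<year>",
);
console.log(dayFirst); // Released on 04/07/2026.
console.log(dotted);   // Released on 04.07.2026.

// (?:...) groups "ha" for the + quantifier but captures nothing,
// so the match array holds only the full match.
const laugh = "hahaha!".match(/(?:ha)+!/);
console.log(laugh[0], laugh.length); // hahaha! 1
```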
Flags change how the engine behaves
Most regex engines accept optional flags that modify matching behavior globally. Common ones:
- g (global): find all matches, not just the first one
- i (case-insensitive): treat uppercase and lowercase letters as equivalent
- m (multiline): make ^ and $ match the start and end of each line, not just the whole string
- s (dotAll): make the . metacharacter match newline characters too
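In JavaScript, flags follow the closing slash: /pattern/gi. In Python, flags such as re.IGNORECASE, re.MULTILINE, and re.DOTALL are passed as the flags argument to re.compile() (or to functions like re.search()), and there is no g flag: re.findall() and re.finditer() return every match.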
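Here is how each flag changes the result in JavaScript, on a two-line sample string chosen for illustration:

```javascript
const lines = "First line\nsecond LINE";

// i + g: case-insensitive, and return every match rather than the first.
const lineWords = lines.match(/line/gi); // "line" and "LINE"

// m: ^ matches at the start of each line, not only at the start of the string.
const lineStartsMulti = lines.match(/^\w+/gm).join(","); // First,second
const lineStartsPlain = lines.match(/^\w+/g).join(",");  // First

// s (dotAll): lets . match the newline between the two lines.
const withoutS = /line.second/.test(lines); // false
const withS = /line.second/s.test(lines);   // true

console.log(lineWords.length, lineStartsMulti, lineStartsPlain, withoutS, withS);
```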
A worked example: validating and extracting data
Suppose you have a log file and need to pull out every IPv4 address. A simplified but functional pattern is:
\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b
Breaking it down:
- \b: word boundary, so we do not match mid-number
- (\d{1,3}): one to three digits, captured into a group
- \.: a literal dot (the backslash escapes the metacharacter meaning of .)
Applied with the global flag to the string "Connected from 192.168.1.10 and 10.0.0.1", this returns two matches: 192.168.1.10 and 10.0.0.1. Each match also exposes four capture groups holding the individual octets.
This pattern does not validate that each octet is in the range 0–255 — a full validation would need a more complex expression or a follow-up check in code. That is a common practical trade-off: regex is excellent at recognizing shape and structure, but logic that requires arithmetic is usually easier to handle in the surrounding code.
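One way to do that follow-up check, sketched in JavaScript: the regex finds candidates and a short loop rejects any octet above 255. extractIPv4 is a hypothetical helper name, and the second sample string is invented for illustration. The check only covers the numeric range; it does not, for example, reject leading zeros.

```javascript
// The simplified pattern from the worked example, with the g flag
// (matchAll() throws a TypeError without it).
const ipPattern = /\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/g;

// Hypothetical helper: keep only addresses whose four octets are 0-255.
function extractIPv4(text) {
  const results = [];
  for (const match of text.matchAll(ipPattern)) {
    // match[1]..match[4] hold the octets captured by the four groups.
    const octets = match.slice(1, 5).map(Number);
    if (octets.every((n) => n <= 255)) {
      results.push(match[0]);
    }
  }
  return results;
}

const fromLog = extractIPv4("Connected from 192.168.1.10 and 10.0.0.1");
const filtered = extractIPv4("Bad address 999.1.1.1, good 8.8.8.8");
console.log(fromLog);  // both addresses: 192.168.1.10 and 10.0.0.1
console.log(filtered); // only 8.8.8.8; 999 fails the range check
```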
Where regex is used in practice
Regex shows up in a wide range of everyday developer tasks:
- Input validation — checking that an email address, phone number, or postal code has the right format before accepting a form submission
- Text search and replace — renaming variables across a codebase in an editor like VS Code, which supports regex find-and-replace
- Log parsing — extracting timestamps, error codes, or IP addresses from unstructured log output
- Data cleaning — stripping unwanted characters, normalizing whitespace, or reformatting dates in a dataset before analysis
- Routing — many web frameworks use regex patterns to match URL paths to handler functions
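- Linting and code analysis — linters such as ESLint mostly analyze a parsed syntax tree, but many of their rule options accept regex patterns, for example to enforce identifier naming conventions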
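Two of these tasks, input validation and data cleaning, sketched in JavaScript. The US ZIP code format (five digits, optionally followed by a hyphen and four more) is an assumed example of a postal-code check, not a complete address validator:

```javascript
// Input validation: anchors force the whole string to match the format.
const zipCode = /^\d{5}(?:-\d{4})?$/;
const shortZip = zipCode.test("94103");      // true
const longZip = zipCode.test("94103-1234");  // true
const tooShort = zipCode.test("9410");       // false

// Data cleaning: trim the ends, then collapse each run of whitespace to one space.
const messy = "  too   many \t spaces\n here ";
const tidy = messy.trim().replace(/\s+/g, " ");

console.log(shortZip, longZip, tooShort); // true true false
console.log(tidy);                        // too many spaces here
```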
Common pitfalls to watch out for
Regex is powerful enough to cause real problems when used carelessly. A few things to keep in mind:
- Catastrophic backtracking. Certain patterns, particularly nested quantifiers like (a+)+, can make a backtracking engine (the kind used by JavaScript, Python, and Java) try an exponentially large number of paths when a match fails. On large inputs, this can hang a program. Test performance on realistic inputs, not just small examples.
- Forgetting to escape metacharacters. Characters like ., *, +, ?, (, ), [, {, ^, $, |, and the backslash itself all have special meaning. To match them literally, prefix each with a backslash: \. matches a real dot, not "any character."
- Relying on regex for full HTML or JSON parsing. Regex handles flat patterns well, but recursive or nested structures (HTML tags, nested JSON) are not regular languages. Use a proper parser instead.
- Overcomplicating the pattern. A regex that is hard to read is hard to maintain. If a pattern exceeds about 60 characters without a clear structure, consider breaking the validation into multiple simpler checks in code.
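When a pattern is built from user-supplied text, every metacharacter in that text needs escaping. A common JavaScript approach is a one-line helper; escapeRegExp is a hypothetical name, and its character list covers the metacharacters named above plus the closing ] and }:

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter.
// $& in the replacement string stands for the character that was matched.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const priceText = "$5.00 (approx.)";
const escaped = escapeRegExp(priceText);
console.log(escaped); // \$5\.00 \(approx\.\)

// Unescaped, "$" becomes an end-of-string anchor, so this can never match.
const naive = new RegExp(priceText).test("Total: $5.00 (approx.)"); // false
// Escaped, the text matches literally, and "." no longer means "any character".
const safe = new RegExp(escaped).test("Total: $5.00 (approx.)");    // true
const strict = new RegExp(escaped).test("Total: $5X00 (approx.)");  // false
```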
Lookaheads and lookbehinds: matching context without consuming it
Lookahead and lookbehind assertions let you match based on what comes before or after a position, without including those surrounding characters in the match itself.
A positive lookahead (?=...) asserts that what follows must match. For example, \w+(?=:) matches a word only when it is immediately followed by a colon — useful for extracting keys from text like name: Alice without capturing the colon.
A negative lookahead (?!...) asserts that what follows must not match. foo(?!bar) matches the "foo" in "foobaz" or "foo." but not the one in "foobar."
Lookbehind works the same way in the reverse direction: (?<=\$)\d+ matches digits that are immediately preceded by a dollar sign, without including the sign in the match.
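All three assertions in JavaScript (lookbehind has been part of the language since ES2018), applied to invented sample strings. In each case the asserted context is checked but left out of the returned match:

```javascript
// Positive lookahead: a word only when a colon follows; the colon is not consumed.
const keys = "name: Alice, role: admin".match(/\w+(?=:)/g);

// Negative lookahead: "foo" (plus the rest of the word) only when "bar" does not follow.
const notBar = "foobar foobaz".match(/foo(?!bar)\w*/g);

// Positive lookbehind: digits preceded by "$", without the "$" itself.
const amounts = "Paid $40 of $125 (3 items)".match(/(?<=\$)\d+/g);

console.log(keys);    // name, role
console.log(notBar);  // foobaz
console.log(amounts); // 40, 125
```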
Test regex patterns right in your browser
Figro's free regex tester gives you real-time match highlighting, capture group details, replace mode, and a built-in cheat sheet — no login, no data sent to any server.
Figro's guides are educational and independent. They are not financial, legal, or investment advice. Some pages include affiliate links; if you purchase through them we may earn a commission at no extra cost to you.