6.7 Regex Backreferences

A concise tutorial followed by a practical bug-hunting guide.


1. What a backreference is

A backreference lets you reuse text that was previously matched by a capture group inside the same regex. It matches the exact text the group captured, not the group's pattern applied a second time. Backreferences turn regexes from pattern recognizers into pattern comparators.

my $s = "hello hello";

if ( $s =~ /(\w+)\s+\1/ ) {
    print "Duplicate word: '$1'\n"; # matches "hello hello"
}
In this example, (\w+) captures a word and \1 requires the next word to be exactly the same word.

A very practical use of a backreference is finding accidentally repeated words in text.

2. Capture order, not visual order

Capture groups are numbered left to right by the position of their opening parenthesis. Non-capturing groups, written (?:...), are skipped.

my $s = "abab";

$s =~ /(a)(b)\1\2/; # matches "abab"
With nested groups the same rule applies: the outer group gets the lower number because its opening parenthesis comes first.
/(a)(b(c))/
# \1 = a
# \2 = bc
# \3 = c
Note: renumbering happens silently if you add or remove parentheses, which is a major source of bugs in large regexes.
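The numbering rule can be checked directly. The following is a minimal sketch; the sample string and the variable names are illustrative and not part of the text above. It matches the nested pattern in list context and prints each numbered capture.

```perl
use strict;
use warnings;

# Groups are numbered by the position of their opening parenthesis.
my $s = "abc";
my @captures = $s =~ /(a)(b(c))/;   # list context returns ($1, $2, $3)

for my $i (0 .. $#captures) {
    printf "\\%d = %s\n", $i + 1, $captures[$i];
}
```

Adding another pair of parentheses at the front of the pattern would shift every number by one, which is the silent renumbering the note warns about.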
3. Modern Backreference Syntax (\g{})
Traditional backreferences use \1, \2, etc. Perl also supports a safer and clearer syntax:

/(\w+)\s+\g{1}/
Why \g{} Is Better
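  • Avoids ambiguity (\10 means group 10 only if the pattern has at least ten groups; otherwise Perl reads it as the octal escape \010)
  • Needed when a literal digit follows the reference (\g{1}0 is group 1 followed by "0", which \10 cannot express)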
  • Works uniformly with named and relative references
Best practice: use \g{} in new code.
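The difference can be seen with a small experiment. This is a sketch under assumed inputs: the string "aa0" and the variable names are made up for illustration. The goal is to match a captured letter, the same letter again, and then a literal 0.

```perl
use strict;
use warnings;

my $s = "aa0";   # illustrative input: a letter, the same letter, then "0"

# \g{1}0 is unambiguous: group 1, followed by a literal "0".
my $braced = $s =~ /(a)\g{1}0/ ? 1 : 0;

# \10 is NOT "group 1 then 0". With fewer than ten groups in the
# pattern, Perl reads it as the octal escape for character 8 (backspace).
my $bare = $s =~ /(a)\10/ ? 1 : 0;

print "with \\g{1}0: $braced\n";
print "with \\10:    $bare\n";
```

Only the braced form matches, because the bare form ends up looking for a backspace character that the string does not contain.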
4. Named Capture Groups and Named Backreferences
Instead of numbering, you can name your captures:

/(?<word>\w+)\s+\g{word}/
Benefits:
  • Self-documenting regexes
  • Resistant to renumbering bugs
  • Easier maintenance
You can also read named captures from Perl code through the %+ hash:

my $date = "2026-02-08";
if ($date =~ /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/) {
    print "$+{d}/$+{m}/$+{y}"; # 08/02/2026
}
Another example:
my $name = "Maliepaard, Reinier";
$name =~ s/^(?<last>[^,]+),\s*(?<first>.+)$/$+{first} $+{last}/;
print $name; # Reinier Maliepaard
Named captures like first and last make it clear what each piece is, and you never have to remember whether $1 was the first name or the last name.
5. Relative Backreferences (\g{-1})
Relative backreferences count backwards from the position of the reference itself: \g{-1} is the most recently opened capture group, \g{-2} the one before it, and so on:

/(\w+)(\s+)\g{-2}/
Here, \g{-2} refers to (\w+).

When to use them
  • When composing regex fragments
  • When capture numbering may change
  • In reusable regex components
Relative references reduce fragility in complex expressions.
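A sketch of how this helps when composing fragments. The fragment name $quoted and the sample inputs are hypothetical; the point is that the same qr// object keeps working no matter how many groups precede it in the final pattern.

```perl
use strict;
use warnings;

# Reusable fragment: a quoted string whose closing quote must equal
# the opening one. \g{-2} means "two groups back from here", so it
# always points at (["']) regardless of where the fragment is embedded.
my $quoted = qr/(["'])(.*?)\g{-2}/;

# Standalone, the fragment's groups are numbered 1 and 2.
my ($q1, $body1) = q{say "hi"} =~ /$quoted/;

# Embedded after another group, they become 2 and 3, yet the relative
# reference still resolves to the quote character.
my ($key, $q2, $body2) = q{name='Perl'} =~ /(\w+)=$quoted/;

print "$body1\n";
print "$key: $body2\n";
```

Had the fragment used \1 instead, the embedded version would compare against (\w+) and fail, because the fragment's quote group is no longer group 1 there.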
6. Backreferences in Substitutions
Backreferences are especially useful in substitutions, for example to collapse doubled words:
my $text = "this this is is Perl";

$text =~ s/\b(\w+)\s+\g{1}\b/$1/g;
print $text;

Result: 'this is Perl'

You can also reuse captured text on the right-hand side of a substitution, where captures are written $1, $2, and so on rather than \1.

my $name = "Maliepaard, Reinier";

$name =~ s/^([^,]+),\s*(.+)$/$2 $1/;

print "$name\n"; # Reinier Maliepaard
Here ([^,]+) captures the last name and (.+) captures the first name or names. In the replacement, $2 $1 reorders them, so "Maliepaard, Reinier" becomes "Reinier Maliepaard".
7. Common Pitfalls

7.1 Ambiguous Backreferences

/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/
\10 # group 10 here, but an octal escape if the pattern had fewer than ten groups
Use \g{10}, which always means group 10.

7.2 Optional Groups May Be Undefined

/(a)?(b)\1/
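If (a) does not take part in the match, group 1 is unset. In Perl a backreference to an unset group never matches, so the whole match fails. Writing (a?) instead lets the group capture an empty string, which \1 can then match.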
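A minimal sketch of the two behaviours side by side. The input "b" and the variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $s = "b";   # illustrative input with no leading "a"

# Group 1 is optional as a whole: when it is skipped it stays unset,
# and a backreference to an unset group cannot match.
my $outside = $s =~ /(a)?(b)\1/ ? "match" : "no match";

# With the quantifier inside, group 1 always participates, possibly
# capturing the empty string, which \1 then matches.
my $inside = $s =~ /(a?)(b)\1/ ? "match" : "no match";

print "(a)?  : $outside\n";
print "(a?)  : $inside\n";
```

Which form is right depends on intent: the first says "if there was an a, it must repeat", but in practice it also rejects input without any a.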

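7.3 Double-Quoted Strings Eat Backslashes

"(\\w+)\\s+\\1" # correct: each \\ yields one backslash
"(\w+)\s+\1"    # wrong: \w and \s lose their backslash, and \1 becomes the character with code 1
When a pattern has to be stored in a string, use single quotes, or better, compile it with qr//.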
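The safe alternatives can be compared in a short sketch. The variable names and the sample string are illustrative assumptions.

```perl
use strict;
use warnings;

my $s = "hello hello";

# Single quotes keep the backslashes, so the string holds the
# intended pattern text.
my $single = '(\w+)\s+\1';

# The double-quoted form needs every backslash doubled to produce
# the same pattern text.
my $double = "(\\w+)\\s+\\1";

# qr// compiles the pattern and avoids string quoting entirely.
my $compiled = qr/(\w+)\s+\1/;

my $same_text   = $single eq $double      ? "yes" : "no";
my $single_hit  = $s =~ /$single/         ? "match" : "no match";
my $compiled_hit = $s =~ /$compiled/      ? "match" : "no match";

print "same pattern text: $same_text\n";
print "single-quoted:     $single_hit\n";
print "qr//:              $compiled_hit\n";
```

qr// has the further advantage that flags such as /i travel with the compiled pattern.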

7.4 Non-capturing groups do not count
my $s = "foo foo";

$s =~ /(?:foo)\s+(foo)/;
print $1; # "foo"

(?:foo) is ignored for numbering.

7.5 Parentheses that appear later do not change \1
my $s = "hello hello world";

$s =~ /(\w+)\s+\1\s+(\w+)/; # $1 = "hello", $2 = "world"

\1 still refers to the first capture, even though another group appears later.


8. Practical Examples

8.1 Detect Repeated Words

/\b(\w+)\s+\g{1}\b/i
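A sketch of the pattern applied to several lines of text. The sample lines and the @found array are hypothetical; a real script would more likely read lines from a file.

```perl
use strict;
use warnings;

# Hypothetical sample text; any list of lines would do.
my @lines = (
    "This is is a test.",
    "Nothing wrong here.",
    "The the quick brown fox.",
);

my @found;
for my $i (0 .. $#lines) {
    # /g in a while loop visits every repeated word on the line;
    # /i lets "The the" count as a repeat.
    while ( $lines[$i] =~ /\b(\w+)\s+\g{1}\b/gi ) {
        push @found, sprintf "line %d: %s", $i + 1, $1;
    }
}

print "$_\n" for @found;
```

The leading \b matters here: without it, the "is" inside "This" followed by " is" would be reported as a repeat.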
8.2 Simple Palindrome Check

/^(\w)(\w)?\g{1}$/

This pattern only recognizes palindromes of two or three word characters, such as "aa" or "aba".
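The pattern can be exercised with a few sample strings. This is a sketch; the word list and the hash name are illustrative assumptions.

```perl
use strict;
use warnings;

# Illustrative inputs for the palindrome pattern from section 8.2.
my @words = qw(aa aba abc abba a);

my %is_short_palindrome;
for my $w (@words) {
    # 1 if the first and last characters agree and the string is
    # two or three word characters long, 0 otherwise.
    $is_short_palindrome{$w} = $w =~ /^(\w)(\w)?\g{1}$/ ? 1 : 0;
    print "$w: $is_short_palindrome{$w}\n";
}
```

"abba" and "a" are real palindromes that the pattern rejects, which shows the limits of this fixed-length approach.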
8.3 Matching paired characters (quotes)
my $s = '"Perl is fun"';

$s =~ /(["'])(.*?)\g{1}/;
print $2; # Perl is fun
Backreferences are also useful to check that opening and closing tags are the same.
my $html = "<b>hello</b>"; # hello between <b> and </b> tags

if ( $html =~ m{<([A-Za-z][A-Za-z0-9]*)>.*?</\1>} ) {
    print "Matched a pair of <$1> tags\n";
}
([A-Za-z][A-Za-z0-9]*) captures the tag name, and </\1> requires the closing tag to use exactly the same name. If the opening tag is "<b>", the closing tag must be "</b>".

Extract text between some tags:
my $html = "<b>Hello</b> world!"; # Hello between <b> and </b> tags
if ($html =~ /<(\w+)>(.*?)<\/\g{1}>/) {
    print "Tag: $1, Content: $2"; # Result: "Tag: b, Content: Hello"
}