
6.7 Regex Backreferences

A concise tutorial followed by a practical bug-hunting guide.

1. What a backreference is

A backreference lets you reuse text that was previously matched by a capture group inside the same regex. It matches the exact text that was captured, not the subpattern that captured it, which turns regexes from pattern recognizers into pattern comparators.

my $s = "hello hello";

if ( $s =~ /(\w+)\s+\1/ ) {
    print "Duplicate word: '$1'\n"; # matches "hello hello"
}

In this example, (\w+) captures a word and \1 requires the next word to be exactly the same word.

A very practical use of a backreference is finding accidentally repeated words in text.

2. Capture order, not visual order

Capturing groups are numbered left to right, ignoring non-capturing groups.

my $s = "abab";

$s =~ /(a)(b)\1\2/; # matches "abab"

With nested groups, the number comes from the position of the opening parenthesis:

/(a)(b(c))/
# \1 = a
# \2 = bc
# \3 = c

Note: group numbers shift silently when you add or remove parentheses, which is a major source of bugs in large regexes.
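The following sketch shows both points: numbering by opening parenthesis, and a reference that changes meaning after a group is added. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Nested groups: numbers follow the opening parentheses.
my @nested = "abc" =~ /(a)(b(c))/;
print join(", ", @nested), "\n";    # a, bc, c

# Original pattern: \1 is the captured word, so "go stop" is rejected.
my $before = "go stop" =~ /(\w+)\s+\1/ ? "match" : "no match";

# A group for leading whitespace is added in front. The word is now
# group 2, but \1 still means group 1 (here an empty string), so the
# pattern accepts text it used to reject.
my $after = "go stop" =~ /(\s*)(\w+)\s+\1/ ? "match" : "no match";

print "before: $before, after: $after\n";    # before: no match, after: match
```

Nothing warns about the change; the only symptom is a different match result.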

3. Modern Backreference Syntax (\g{})

Traditional backreferences use \1, \2, etc. Perl also supports a safer and clearer syntax:


/(\w+)\s+\g{1}/

Why \g{} Is Better

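Avoids ambiguity (\10 means group 10 only if the pattern has at least ten groups; otherwise it is an octal escape)
Required when a literal digit follows the reference (\g{1}0 is group 1 followed by the character 0)
Works uniformly with numbered, named, and relative references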

Best practice: use \g{} in new code.
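A minimal sketch of the ambiguity, assuming a pattern with a single group and a made-up input string:

```perl
use strict;
use warnings;

my $s = "770";

# \g{1}0 is unambiguous: group 1 followed by a literal "0".
my $braced = $s =~ /(7)\g{1}0/ ? "match" : "no match";

# With fewer than ten groups, Perl reads \10 as the octal escape for
# chr(8) (backspace), not as "group 1, then 0".
my $bare = $s =~ /(7)\10/ ? "match" : "no match";

print "braced: $braced, bare: $bare\n";    # braced: match, bare: no match
```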

4. Named Capture Groups and Named Backreferences

Instead of numbering, you can name your captures:


/(?<word>\w+)\s+\g{word}/

Benefits:

Self-documenting regexes
Resistant to renumbering bugs
Easier maintenance

You can also access named captures from Perl:


my $date = "2026-02-08";
if ($date =~ /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/) {
    print "$+{d}/$+{m}/$+{y}"; # 08/02/2026
}

Another example:

my $name = "Maliepaard, Reinier";
$name =~ s/^(?<last>[^,]+),\s*(?<first>.+)$/$+{first} $+{last}/;
print $name; # Reinier Maliepaard

Named captures like first and last make it clear what each piece is, and you never have to remember whether $1 was the first name or the last name.
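To illustrate the resistance to renumbering, here is a small sketch in which an extra group sits in front of the named one. The pattern name $repeat and the test strings are assumptions for the example.

```perl
use strict;
use warnings;

# Group 1 was added later for leading whitespace; the name "word"
# still reaches the right capture without any renumbering.
my $repeat = qr/(\s*)(?<word>\w+)\s+\g{word}/;

my @report;
for my $s ("go go", "go stop") {
    push @report, $s =~ $repeat
        ? "'$s': repeated '$+{word}'"
        : "'$s': no repeat";
}
print "$_\n" for @report;
# 'go go': repeated 'go'
# 'go stop': no repeat
```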

5. Relative Backreferences (\g{-1})

Relative backreferences count backwards from the position of the reference itself: \g{-1} is the most recently opened capture group, \g{-2} the one before it, and so on.


/(\w+)(\s+)\g{-2}/

Here, \g{-1} would be (\s+), so \g{-2} refers to (\w+).

When to use them

When composing regex fragments
When capture numbering may change
In reusable regex components

Relative references reduce fragility in complex expressions.
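As a sketch of the reusable-component case, the fragment below is compiled once and then embedded after another group. The names $quoted and $assignment and the input string are hypothetical.

```perl
use strict;
use warnings;

# A reusable fragment: a quote character, some text, then the same quote.
# \g{-2} counts back two groups from where it appears: (.*?) is -1 and
# the quote group is -2, wherever the fragment ends up.
my $quoted = qr/(["'])(.*?)\g{-2}/;

# Embedding it after another group shifts the absolute numbers
# (the quote becomes group 2), yet the relative reference still works.
my $assignment = qr/(\w+)\s*=\s*$quoted/;

my ($key, $quote, $value);
if (q{name = 'Perl'} =~ $assignment) {
    ($key, $quote, $value) = ($1, $2, $3);
    print "$key -> $value\n";    # name -> Perl
}
```

Had the fragment used \1 instead, the embedded version would compare the closing quote against the key rather than the opening quote.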

6. Backreferences in Substitutions

Backreferences are extremely powerful in substitutions:

my $text = "this this is is Perl";

$text =~ s/\b(\w+)\s+\g{1}\b/$1/g;
print $text;

Result: 'this is Perl'

You can also reuse captured text on the right‑hand side of a substitution.


my $name = "Maliepaard, Reinier";

$name =~ s/^([^,]+),\s*(.+)$/$2 $1/;

print "$name\n"; # Reinier Maliepaard

Here ([^,]+) captures the last name and (.+) captures the first name or names. In the replacement, $2 $1 reorders them, so "Maliepaard, Reinier" becomes "Reinier Maliepaard".
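The two sides of a substitution use different notations: \1 or \g{1} inside the pattern, $1 in the replacement. A short sketch with made-up input strings:

```perl
use strict;
use warnings;

# Replacement side: $1 and $2 hold the captured text.
my $pairs = "x=1 y=2";
(my $swapped = $pairs) =~ s/(\w+)=(\w+)/$2=$1/g;
print "$swapped\n";      # 1=x 2=y

# Pattern side: \g{1} must match the same quote that group 1 captured;
# the replacement then uses $2 for the quoted text.
my $quoted = q{say 'hi' and "bye"};
(my $bracketed = $quoted) =~ s/(["'])(.*?)\g{1}/[$2]/g;
print "$bracketed\n";    # say [hi] and [bye]
```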

7. Common Pitfalls

7.1 Ambiguous Backreferences


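/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10/ # group 10 here, but an octal escape (chr(8)) if the pattern had fewer than ten groups

Use \g{10}, which always means group 10.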

7.2 Optional Groups May Be Undefined


/(a)?(b)\1/

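If (a) doesn't participate in the match, \1 has no text to match, so the backreference fails and the whole match fails with it. Writing (a?) instead makes the group always participate; it captures an empty string when there is no "a", and \1 then matches that empty string.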
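A quick way to see the difference between an optional group and a group with optional content. The variable names and test strings are made up for this sketch.

```perl
use strict;
use warnings;

my $optional_group   = qr/(a)?(b)\1/;   # group 1 may not participate
my $optional_content = qr/(a?)(b)\1/;   # group 1 always participates

my %result;
for my $s ("aba", "b") {
    $result{$s} = [
        ($s =~ $optional_group   ? "match" : "no match"),
        ($s =~ $optional_content ? "match" : "no match"),
    ];
    print "$s: (a)? -> $result{$s}[0], (a?) -> $result{$s}[1]\n";
}
# aba: (a)? -> match, (a?) -> match
# b: (a)? -> no match, (a?) -> match
```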

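7.3 Double-Quoted Strings Eat Backslashes


"(\\w+)\\s+\\1" # correct: the regex engine sees (\w+)\s+\1
"(\w+)\s+\1"    # wrong: the string becomes (w+)s+ followed by chr(1)

When a pattern has to be stored in a string, use single quotes, or compile it with qr// so that no string escaping is involved.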
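The three safe spellings can be compared side by side. This is only a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "hello hello";

# Single quotes keep the backslashes intact.
my $single = '(\w+)\s+\1';

# Doubled backslashes survive double-quote processing.
my $double = "(\\w+)\\s+\\1";

# qr// hands the backslash escapes straight to the regex engine.
my $compiled = qr/(\w+)\s+\1/;

print "same pattern text\n" if $single eq $double;

my $hits = 0;
for my $re ($single, $double, $compiled) {
    if ($text =~ $re) {
        $hits++;
        print "matched '$1'\n";    # matched 'hello' (three times)
    }
}
```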

7.4 Non-capturing groups do not count

my $s = "foo foo";

$s =~ /(?:foo)\s+(foo)/;
print $1; # "foo"

(?:foo) is ignored for numbering.

7.5 Parentheses that appear later do not change \1

my $s = "hello hellothere";

$s =~ /(\w+)\s+\1(\w+)/; # $1 = "hello", $2 = "there"

\1 still refers to the first capture, even though another group appears later.

8. Practical Examples

8.1 Detect Repeated Words


/\b(\w+)\s+\g{1}\b/i
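A minimal sketch of how this pattern might be used to scan several lines and report every repeat. The sample lines and the @found array are assumptions for the example.

```perl
use strict;
use warnings;

my @lines = (
    "Paris in the the spring",
    "This is is a test",
    "Nothing repeated here",
);

my @found;
for my $i (0 .. $#lines) {
    # /g in a while loop visits every repeat on the line; /i ignores case.
    while ($lines[$i] =~ /\b(\w+)\s+\g{1}\b/gi) {
        push @found, sprintf("line %d: '%s'", $i + 1, $1);
    }
}
print "$_\n" for @found;
# line 1: 'the'
# line 2: 'is'
```

The \b anchors keep "This is" from being reported, because the "is" inside "This" does not start at a word boundary.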

8.2 Simple Palindrome Check


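/^(\w)(\w)?\g{1}$/

This only recognizes two- or three-character strings whose first and last characters are the same, such as "aa" or "aba". It does not handle longer palindromes.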
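To make the scope of this pattern concrete, the sketch below runs it against a few made-up strings and compares the result with a plain string reversal, which is not a regex technique and is used here only as a reference answer.

```perl
use strict;
use warnings;

my $short = qr/^(\w)(\w)?\g{1}$/;

my %check;
for my $s (qw(aa aba abc abba)) {
    my $reversed = reverse $s;    # scalar context: reverses the string
    $check{$s} = {
        by_regex   => ($s =~ $short     ? "yes" : "no"),
        by_reverse => ($s eq $reversed  ? "yes" : "no"),
    };
    print "$s: regex=$check{$s}{by_regex} reverse=$check{$s}{by_reverse}\n";
}
# aa: regex=yes reverse=yes
# aba: regex=yes reverse=yes
# abc: regex=no reverse=no
# abba: regex=no reverse=yes
```

The last line shows the limit: "abba" is a palindrome, but the pattern only covers two or three characters.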

8.3 Matching paired characters (quotes)

my $s = '"Perl is fun"';

$s =~ /(["'])(.*?)\g{1}/;
print $2; # Perl is fun

Backreferences are also useful to check that opening and closing tags are the same.

my $html = "<b>hello</b>";

if ( $html =~ m{<([A-Za-z][A-Za-z0-9]*)>.*?</\1>} ) {
    print "Matched a pair of <$1> tags\n";
}

([A-Za-z][A-Za-z0-9]*) captures the tag name, and </\1> requires the closing tag to use exactly the same name. If the opening tag is "<b>", the closing tag must be "</b>".

Extract text between some tags:

my $html = "<b>Hello</b> world!";
if ($html =~ /<(\w+)>(.*?)<\/\g{1}>/) {
    print "Tag: $1, Content: $2"; # Result: "Tag: b, Content: Hello"
}