6.4 Grep

The function grep has much in common with the function map:

  • it applies an expression to each element of an array (or list)
  • it has a hidden loop, and each element in turn is aliased to the special variable $_ (so changing $_ changes the element itself)
  • it returns a list, which can be saved into an array
The most important difference is that map returns the results of the expression, whereas grep returns the original elements for which the expression evaluated to true (in other words: the elements that match the expression).
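To make the difference concrete, here is a small sketch that applies the same test once with map and once with grep. It assumes strict and warnings are enabled, which the examples in this section otherwise leave out:

```perl
use strict;
use warnings;

my @arr = qw( 10 2 100 3 89 53 );

# map: one result per element (here 1 or 0)
my @flags = map { $_ > 50 ? 1 : 0 } @arr;

# grep: only the elements for which the block is true
my @big = grep { $_ > 50 } @arr;

print("map:  @flags\n");   # map:  0 0 1 0 1 1
print("grep: @big\n");     # grep: 100 89 53
```

map yields one value per input element, while grep yields the subset of input elements that passed the test.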

Example 1: filter numbers greater than 50

@arr = qw( 10 2 100 3 89 53 );
@arr_new = grep { $_ > 50 } @arr; 
print("@arr_new"); # output: 100 89 53
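Shorter code, passing the list directly to grep. This only works as long as the block does not change $_, because the elements of a literal list such as qw( ... ) are read-only (Example 3b changes $_ and therefore works on an array):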

print( grep { $_ > 50 } qw( 10 2 100 3 89 53 ) ); # output: 1008953
Use join to separate the values:

print(join (" ", grep { $_ > 50 } qw( 10 2 100 3 89 53 ) )); # output: 100 89 53
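In the next examples I leave join out to keep the code short. Without join, print writes the values without any separator (as in 1008953 above); the output comments show them separated by spaces for readability.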

In scalar context, grep returns the number of elements for which the expression was true instead of the elements themselves (Example 2b).

Example 2a: filter strings shorter than 7 characters

print( grep { length($_) < 7 } qw( sebastian daniel florence kirsten bibian niki ) ); # output: daniel bibian niki
Example 2b:

$n = grep { length($_) < 7 } qw( sebastian daniel florence kirsten bibian niki );
print($n); # output: 3
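Because print gives its arguments list context, printing grep directly shows the matching elements, not their number. A minimal sketch of forcing scalar context with scalar(), and of using the count as a true/false value:

```perl
use strict;
use warnings;

my @names = qw( sebastian daniel florence kirsten bibian niki );

# print() imposes list context, so scalar() is needed to print the count
print(scalar(grep { length($_) < 7 } @names), "\n");   # 3

# In boolean context the count is true when at least one element matched
if (grep { length($_) > 8 } @names) {
    print("at least one name is longer than 8 characters\n");
}

# Assigning to a scalar also gives scalar context
my $short = grep { length($_) < 7 } @names;
```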
Example 3a: filter elements that are numbers

@arr = qw( 1 a 22 b 333 c 4444 d );
print(grep { /^\d+$/ } @arr); # output: 1 22 333 4444 (/\d+/ alone would also accept an element such as 1a)
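Example 3b: strip the digits from the elements that contain them (this changes @arr itself)

@arr = qw( 1a 2b 3c 4d );
print(grep { $_ if(s/\d+//); } @arr); # output: a b c d
print("@arr\n"); # output: a b c d (because $_ is an alias, s/// has changed the elements of @arr)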
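Example 3b works because @arr is a real array. As a sketch of what happens otherwise (assuming Perl 5.14 or later for the /r modifier): changing $_ while grep runs over a literal qw() list dies with a read-only error, and s///r inside map produces the stripped values without touching the original array:

```perl
use strict;
use warnings;

# Changing $_ while grep walks over a literal list dies,
# because the elements of qw( ... ) are read-only constants
my $ok  = eval { my @tmp = grep { s/\d+// } qw( 1a 2b 3c 4d ); 1 };
my $err = $@;
print("error: $err") unless $ok;   # Modification of a read-only value attempted ...

# Non-destructive alternative: s///r returns a changed copy (Perl 5.14+)
my @arr     = qw( 1a 2b 3c 4d );
my @letters = map { s/\d+//r } @arr;
print("@letters\n");   # a b c d
print("@arr\n");       # 1a 2b 3c 4d (unchanged)
```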
Example 4: filter files that were last modified more than 365 days ago (-M returns the age of a file in days, counted from the moment the script started)

@files = glob "*.pl";
@older_than_one_year_files = grep { -M $_ > 365 } @files;
print("@older_than_one_year_files"); # output shows files that are older than 365 days
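To try Example 4 without hunting for old files, the following sketch creates two empty files in a temporary directory (the names old.pl and new.pl are made up for the illustration), back-dates one of them with utime, and combines -f with -M so that only plain files are kept:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: two empty files in a temporary directory
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $name (qw( old.pl new.pl )) {
    my $path = "$dir/$name";
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    close($fh);
    push(@files, $path);
}

# Back-date old.pl: set its access and modification times to 400 days ago
my $then = time() - 400 * 24 * 60 * 60;
utime($then, $then, "$dir/old.pl") or die "utime failed: $!";

# Keep plain files (-f) whose age since last modification (-M) exceeds 365 days
my @older_than_one_year_files = grep { -f $_ && -M $_ > 365 } @files;
print("$_\n") for @older_than_one_year_files;   # only the path ending in old.pl
```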
Example 5a: filter elements that consist of exactly four characters

print( grep { /^....$/ } qw( sebastian daniel florence kirsten bibian niki ) ); # output: niki
Example 5b: filter elements that start with the character s or k

print( grep { /^[sk]/ } qw( sebastian daniel florence kirsten bibian niki ) ); # output: sebastian kirsten
Example 5c: filter elements that end with the character n or i

print( grep { /[ni]$/ } qw( sebastian daniel florence kirsten bibian niki ) ); # output: sebastian kirsten bibian niki
Example 5d: filter elements that start and end with an a (the /i modifier ignores case)

print( grep { /^a.*a$/i } qw( ivori enrique anna edgar ) ); # output: anna
Example 5e: filter elements that start with an e and end with an r

print( grep { /^e.*r$/i } qw( ivori enrique anna edgar ) ); # output: edgar
Example 5f: filter elements that start with an e or an i and end with an r or an i

@res = grep { /^[ei].*[ir]$/i } qw( ivori enrique anna edgar );
print("@res\n"); # output: ivori edgar
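A note on character classes such as [sk]: inside [...] every character stands for itself, including a comma, so a class written as [s,k] matches s, k or a comma. A small sketch, with a made-up element ',comma' to show the difference:

```perl
use strict;
use warnings;

my @names = ( 'sebastian', 'kirsten', ',comma' );   # ',comma' is made up

my @with_comma = grep { /^[s,k]/ } @names;   # class contains s, comma and k
my @without    = grep { /^[sk]/ }  @names;   # class contains only s and k

print("@with_comma\n");   # sebastian kirsten ,comma
print("@without\n");      # sebastian kirsten
```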
Example 6a: filter unique values

@unique = grep { ( $found{$_}++ == 0 ) } qw( 1 2 1 5 5 5 3 4 5 3 2 3 1 );
print("@unique\n"); # output: 1 2 5 3 4 (each value once, in order of first appearance)
The same result, starting again from an empty %found, with:

@unique = grep { ( ++$found{$_} == 1 ) } qw( 1 2 1 5 5 5 3 4 5 3 2 3 1 );
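A common way to write this test is !$seen{$_}++. As a sketch, wrapping it in a sub (the name unique is our own choice, not a built-in) keeps the hash local, so each call starts with empty counts and is not affected by earlier calls:

```perl
use strict;
use warnings;

# Return the distinct values of a list in order of first appearance
sub unique {
    my %seen;                          # a fresh, empty hash on every call
    return grep { !$seen{$_}++ } @_;   # true only the first time a value is seen
}

my @unique = unique(qw( 1 2 1 5 5 5 3 4 5 3 2 3 1 ));
print("@unique\n");   # 1 2 5 3 4
```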
Example 6b: filter the values that occur 3 or more times

%found = (); # start with empty counts
@freq_three_or_more = grep { ( (++$found{$_}) == 3) } qw( 1 2 1 5 5 5 3 4 5 3 2 3 1 );
print "@freq_three_or_more\n\n"; # output: 5 3 1
Frequencies:

foreach $item (sort keys %found) 
{
    print($item, " ", $found{$item}, "\n");
}
Output:
1 3
2 2
3 3
4 1
5 4


The number 5, which occurs 4 times, is included as well, because its count passes 3 on the way to 4. So how do we fetch only the numbers that appear exactly 3 times?

Example 6c: filter the values that occur 3 times exactly

%found = (); # start with empty counts
@freq_three_times = grep { $found{$_} == 3 } grep { ( ++$found{$_} == 3 ) } qw( 1 2 1 5 5 5 3 4 5 3 2 3 1 );
print "@freq_three_times\n\n"; # output: 3 1
So the solution is a nested grep, which is evaluated from right to left. The rightmost grep 'grep { ( ++$found{$_} == 3 ) }' runs first and, as Example 6b shows, returns the elements 5 3 1. When it has finished, %found holds the total count of every value. The leftmost grep 'grep { $found{$_} == 3 }' then processes that list and checks whether $found{5}, $found{3} and $found{1} equal 3. Because $found{5} is 4, only 3 and 1 remain.
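If the nested grep feels too clever, counting and filtering can also be done in two separate steps. A sketch of that alternative; note that it returns each value once, in sorted key order (1 3), rather than in the order in which the counts reached 3:

```perl
use strict;
use warnings;

my @list = qw( 1 2 1 5 5 5 3 4 5 3 2 3 1 );

# Step 1: count how often every value occurs
my %count;
$count{$_}++ for @list;

# Step 2: keep the distinct values whose total count is exactly 3
my @exactly_three = grep { $count{$_} == 3 } sort keys %count;
print("@exactly_three\n");   # 1 3
```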

Example 7: find all integers and add them together

@lines = <DATA>;
foreach $line (@lines) {
  chomp($line);                       # remove the trailing newline (if any)
  $line =~ s/\.$//;                   # remove the trailing dot
  @items = split(/\s+/, $line);       # split on one or more whitespace characters
  @numbers = grep { /^\d+$/ } @items; # keep only the integers
  foreach $number (@numbers) {
    $sum += $number;                  # add all integers together
  }
}
print("Total of all numbers: $sum\n"); # output: Total of all numbers: 45

__DATA__
this is 1 and this is 2 and this is 3.
this is 4 and this is 5 and this is 6.
this is 7 and this is 8 and this is 9.
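A more compact sketch of Example 7, assuming the input lines are held in an array instead of being read from __DATA__: splitting on runs of whitespace and dots removes the trailing dot and newline in one go, and grep keeps only the integers:

```perl
use strict;
use warnings;

my @lines = (
    "this is 1 and this is 2 and this is 3.\n",
    "this is 4 and this is 5 and this is 6.\n",
    "this is 7 and this is 8 and this is 9.\n",
);

my $sum = 0;
foreach my $line (@lines) {
    # Split on runs of whitespace and dots; the trailing empty field
    # produced by ".\n" is dropped by split, so "3.\n" yields a clean "3"
    $sum += $_ for grep { /^\d+$/ } split(/[\s.]+/, $line);
}
print("Total of all numbers: $sum\n");   # Total of all numbers: 45
```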
Example 8: find a string in a file

$file = "test.txt";
open(FH, '<', $file) or die $!;

if (grep { /your_search_string/ } <FH>) {
   print "your_search_string found\n";
}
else {
   print "your_search_string not found\n";
}
close(FH);
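grep reads the whole file and tests every line, even after a match has been found. For large files, a loop that reads line by line and stops at the first match may be preferable. The sketch below is self-contained: it reads from an in-memory filehandle (open on a reference to a string) as a stand-in for test.txt:

```perl
use strict;
use warnings;

# Stand-in for test.txt: an in-memory filehandle reading from a string
my $text = "first line\nsecond line with your_search_string\nthird line\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my $found = 0;
while (my $line = <$fh>) {
    if ($line =~ /your_search_string/) {
        $found = 1;
        last;          # stop reading as soon as the first match is found
    }
}
close($fh);

if ($found) {
    print("your_search_string found\n");
}
else {
    print("your_search_string not found\n");
}
```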