
10. Useful Hash applications

Hashes are especially useful for certain list and array tasks. I'll show you some applications, of which two are the most interesting: counting and validating.

10.1 Frequency of notes of a tune.

Count the frequency of notes in a tune.


@notes = ();
$no_elements = 0;

sub round
{
  my ($value, $places) = @_;
  my $factor = 10**$places;
  return int($value * $factor + 0.5) / $factor; # +0.5 is magical sauce to do rounding instead of truncating
}

while ($line = <DATA>) 
{
  chomp($line); 
  $line =~ s/\h+/ /g;
  @notes= split(/ /, $line);
  $no_elements += scalar(@notes);
   
  foreach $note(@notes) 
  {
    $count{$note}++;
  }
}  

foreach $note (sort keys %count) 
{
    $perc = $count{$note} / $no_elements;
	
    print($note, " ", $count{$note}, " ", round($perc,2), "\n");
}

__DATA__
C E D E F G E
A G F E D E
G A B C B A G
E D C C B C
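
The 'round' helper above rounds by adding 0.5 before truncating with 'int'. As a possible alternative, the built-in 'sprintf' with the format "%.2f" also rounds to two decimals, and it keeps trailing zeros. The sketch below compares both; the two sample fractions are chosen only for illustration.

```perl
use strict;
use warnings;

# same rounding helper as in the script above
sub round {
    my ($value, $places) = @_;
    my $factor = 10**$places;
    return int($value * $factor + 0.5) / $factor;
}

my @results;
foreach my $perc (6 / 26, 13 / 26) {
    # left: the helper's numeric result, right: sprintf's formatted string
    push @results, round($perc, 2) . " vs " . sprintf("%.2f", $perc);
}
print "$_\n" for @results;
# 0.23 vs 0.23
# 0.5 vs 0.50
```

Which form is preferable depends on whether the columns of the report should line up.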

10.2 Frequency of grid's digits.

Count the frequency of each digit in the following grid.

$grid = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 2, 3, 4, 5, 6, 4],
    [4, 5, 6, 4, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [1, 2, 3, 2, 3, 4, 5, 6, 4],
    [4, 5, 6, 4, 3, 4, 5]
];

%digit_freq = ();

foreach $row (@$grid) {
    foreach (@$row) { # process each digit of the grid
        $digit_freq{$_}++;
    }
}

print "Frequency of each digit:\n";
foreach $digit (sort keys %digit_freq) {
    print("$digit: $digit_freq{$digit}\n");
}

10.3 Frequency of grid's third column digits

Count the frequency of each digit in the third column of the following grid.

$grid = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 2, 3, 4, 5, 6, 4],
    [4, 5, 6, 4, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [1, 2, 3, 2, 3, 4, 5, 6, 4],
    [4, 5, 6, 4, 3, 4, 5]
];

%digit_freq = ();

foreach $row (@$grid) {
    $digit_freq{ $row->[2] }++; # index 2 is the third column of the current row
}

print("Frequency of each digit in the third grid column:\n");
foreach $digit (sort keys %digit_freq) {
    print("$digit: $digit_freq{$digit}\n");
}

10.4 Validate input string.


$l_no1 = 1;
$h_no1 = 99;
$l_no2 = 1000;
$h_no2 = 1999;
	
$format = "number whitespace valid_string whitespace number";

# in the hash %valid, all keys are valid alphabetic strings (length 8) with value 1
%valid = (
'IihiANOS' => 1,
'JMinosXX' => 1,
);
 
sub valid_no
{
  my ($no, $lowest, $highest) = @_;
  # a chained comparison ($lowest <= $no <= $highest) only compiles on Perl 5.32 or later
  return ($lowest <= $no && $no <= $highest) ? 1 : 0;
}

print("Test if an input string has the required format: $format\n");

while (<DATA>) 
{
  chomp($_); 
# initial test if the input consists of '1-2 digits whitespace alphabetic_string(length 8) whitespace 1-4 digits'  
  if ($_ =~ /^(\d{1,2})\s{1}([a-zA-Z]{8})\s{1}(\d{1,4})$/) 
  {
# the data matched in the first set of parentheses is assigned to the variable $1 etc.
    $no_1 = $1;
    $str = $2;
    $no_2 = $3;
# exists($valid{$str}) checks if the string is a key of the hash %valid.
    (valid_no($no_1, $l_no1, $h_no1) and exists($valid{$str}) and valid_no($no_2, $l_no2, $h_no2)) ? print("$_ is valid.\n") : print("$_ is not valid.\n");
  }
  else
  {
    print("$_ does not have the required format\n");
  }	   
}

__DATA__
ab 123456 abc
10 eztcUukk 410
9  CaSCvKEo 27
26 IihiANOS 1851
52 JMinosX 612
3  TOCgOwMQ 709

10.5 Remove duplicate members of a list or array.

Recall first that in a 'foreach' loop without a loop variable, the special variable $_ is set to each array member in turn:

@abc = qw (a b c);
foreach (@abc) {
  print ($_ . " ");
}

The result is: a b c

The second thing you must realize is that a hash cannot hold duplicate keys; assigning to an existing key simply overwrites its value:

%abc = (
'a' => '1',
'a' => '1',
'b' => '1',
'b' => '1',
'b' => '1',
'b' => '1',
'c' => '1',
'c' => '1',
'c' => '1'
);

$no_elements = keys (%abc);
print ($no_elements . "\n");

The result is 3

@keys = sort(keys(%abc));
print ("Keys of the hash %abc are: @keys\n");

The result is 'Keys of the hash %abc are: a b c'

So, if you use each array member as a key of a hash (and set the value to e.g. 1), you've solved the problem: the keys of the hash are the unique members.

@abc = qw(b b c c c a a a);

foreach (@abc) {
   $abc_hash{$_} = 1;
}

@unique_array_members = sort(keys %abc_hash);
print ("@unique_array_members\n");

The result is: a b c

A shorter way via a hash-slice (see 1.2.1 Hash-slices):

@abc = qw(a a b b b c c);
%unique = ();
@unique{ @abc } = ();  # creates one key per array member; every value is 'undef'
@unique_array_elements = sort(keys %unique);
print (@unique_array_elements); # prints abc

10.5.1 Remove duplicate members of a list or array via 'map'.

A nice alternative way to fill the hash is the 'map' function:

%abc_hash = map {$_, 1} @abc;

The function 'map' evaluates, by definition, the expression between the curly braces in list context and returns all the results as one flat list. In effect, the hash is assigned the following list:

%abc_hash = (('b', 1), ('b', 1), ('c', 1), ('c', 1), ('c', 1), ('a', 1), ('a', 1), ('a', 1));
foreach $key (sort(keys(%abc_hash))) {
  print ("$key => $abc_hash{$key}\n");
}

The result is:

a => 1
b => 1
c => 1
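
To see the flat list that 'map' produces, it can help to capture it in an array before assigning it to the hash. This is only a sketch; the array name @pairs is an assumption made for the illustration.

```perl
use strict;
use warnings;

my @abc = qw(b b c c c a a a);

# map returns one flat list: key, value, key, value, ...
my @pairs = map { ($_, 1) } @abc;
print scalar(@pairs), "\n";      # 16 (8 keys + 8 values)
print "@pairs[0 .. 5]\n";        # b 1 b 1 c 1

# assigning the list to a hash collapses the repeated keys
my %abc_hash = @pairs;
my @unique   = sort keys %abc_hash;
print "@unique\n";               # a b c
```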

10.5.2 Remove duplicate members of a list or array via 'push' or 'grep'.

You can of course iterate the array with 'foreach' and add a member to a new array only the first time it is found.

To fully understand what the code is doing, let's look at a feature of the increment operator.

If the increment operator "++" is placed after a variable (postfix form), a value is first returned and after that the variable is incremented by one:

$x = undef;
print ($x++ . "\n"); # prints 0; 'undef' has been numified
print ($x); # prints 1

If the increment operator "++" is placed before a variable (prefix form), the variable is incremented before its value is returned:

$j = undef;
print (++$j . "\n"); # prints 1

Now another solution to the problem of removing duplicates from an array:

@abc = qw (a a b c c c c );
@unique = ();
%found = ();
foreach $member ( @abc ) {
    push @unique, $member if (! $found{ $member }++);
}

The first time '$member' equals "a", the hash member '$found{"a"}' does not exist yet, so its value is 'undef'. The postfix '++' returns that old value numified to 0, which is false; '!' (NOT) turns it into true, so '$member' is pushed, i.e. added to the array '@unique'. As a side effect '$found{"a"}' is incremented and now has the value 1. The second time '$member' equals "a", '$found{"a"}++' returns 1, i.e. true; '!' turns it into false, so "a" is not added to the array '@unique', and '$found{"a"}' gets the value 2.
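
The walk-through above can be made visible by storing the value that the postfix '++' returns on each pass. A minimal sketch; the names $before and @trace are assumptions added for the illustration.

```perl
use strict;
use warnings;

my @abc = qw(a a b);
my %found;
my @trace;

foreach my $member (@abc) {
    # postfix ++ returns the old value (0 the first time), then increments
    my $before = $found{$member}++;
    push @trace, "$member: old=$before keep=" . (!$before ? "yes" : "no");
}
print "$_\n" for @trace;
# a: old=0 keep=yes
# a: old=1 keep=no
# b: old=0 keep=yes
```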

This can be coded in a shorter way. Removing duplicates from an array is a form of filtering, and for filtering 'grep' is your friend.

@abc = qw (a a b c c c c );
@unique = ();
%found = ();
@unique = grep { ! $found{ $_ }++ } @abc;
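
Unlike 'sort(keys ...)', the 'grep' form keeps the members in the order of their first occurrence. If you need it in several places, one possible way is to wrap it in a small subroutine; the name 'unique_in_order' is hypothetical, not a built-in.

```perl
use strict;
use warnings;

# hypothetical helper: returns its arguments without duplicates,
# in the order in which each value first appears
sub unique_in_order {
    my %found;
    return grep { !$found{$_}++ } @_;
}

my @unique = unique_in_order(qw(c a c b a c));
print "@unique\n";   # c a b
```

Because %found is declared with 'my' inside the subroutine, every call starts with an empty hash.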

10.6 Check if a list or array contains a value.

Often you'll want to check if a list or array contains a specific element. There are several approaches.

@abc = qw (a a b c c c c );
%is_found = ();
foreach (@abc) { $is_found{ $_ } = 1 }
print ("The character 'd' was not found in the array '@abc'\n") if (! $is_found{ 'd' });
print ("The character 'c' was found in the array '@abc'\n") if ( $is_found{ 'c' });

The 'grep' alternative needs no hash, but it scans the whole array on every lookup, so it is a bit slower when you check many values:

@abc = qw(a a b b b c c);
@matches = grep $_ eq 'a', @abc;
(@matches) ? print "Found!" : print "Not found!";

or shorter:

(grep {/d/} @abc) ? print "Found!" : print "Not found!";

Advantage: in scalar context you get the number of matches:

print (scalar (grep /a/, @abc));
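
One caveat with the regex forms above: a pattern such as /a/ matches every element that contains "a", not only elements equal to "a". The sketch below compares the three tests; the sample array @words is an assumption chosen to show the difference.

```perl
use strict;
use warnings;

my @words = qw(a ab ba c);

my $contains = grep { /a/ } @words;        # elements containing "a"
my $equal    = grep { $_ eq 'a' } @words;  # elements exactly equal to "a"
my $anchored = grep { /^a\z/ } @words;     # same test written as an anchored regex

print "$contains $equal $anchored\n";      # 3 1 1
```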