Image by Quang Nguyen Vinh via pexels.com

Recently I came across the term Associative Array. For whatever reason, I had never heard of this term before, also, I can’t remember hearing this from my bachelor’s course as well. I asked a few of my friends, some of them had not heard about it as well…

I wanted to learn about what and how this term is defined and its usage wrt programming languages. These notes are a compilation of my readings online…

After a quick Google search, we would see that Hash Table is one of the concrete implementations of Associative array ADT. However, there are several interesting references about it online.


The Wikipedia article is quite amazing and a few important points are,

Definition

An associative arraymapsymbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection.

wiki

Operations

  • Insert or put: add a new (key, value) pair to the collection, mapping the key to its new value. Any existing mapping is overwritten. The arguments to this operation are the key and the value.
  • Remove or delete: remove a (key, value) pair from the collection, unmapping a given key from its value. The argument to this operation is the key.
  • Lookupfind, or get: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some lookup functions raise an exception, while others return a default value (zero, null, specific value passed to the constructor, …)
wiki

More generalization seems to support,

  • iteration: loop over mappings. In here order is left to the implementation.
  • multimap: multiple values to be associated with a single key, or duplication of key-value pair allowed
  • bidirectional map: lookup is possible with both key and reverse lookup with value.

Other references

As I tried to check the term’s references in online forums such as Stack Overflow, Quora, and Geeksforgeeks. I actually got confused about, whether an associative array is indexed both on an integer and a key (?) and whether is it supposed to be iterable in the insertion order.

Both of these properties are not required to be present as per the ADT definition. These properties can be part of specialized containers like Ordered-Map or Ordered-Dictionary.

Mainly I got confused because, in some online threads, the question wasn’t phrased correctly, and in some other threads the OP wasn’t being explicit about what they are trying to achieve. And there are some incorrect explanations too.

Let’s note them down simply to save confusion in the future.

This C++ thread seemed spot on,

Arrays are generally indexed by the position of the elements. A simple array – int x[10], has its elements x[0] ... x[9]. The index is an unsigned integral value.

The associative container means that the index can be, well, an arbitrary (not necessarily an unsigned integral) type (in this case, an std::string).

What is the meaning of associative array?

Some Stack overflow threads seem to be mentioned as if it is a multi-key map (?)

The first one in the above confused me. I started thinking should the ADT also require supporting both integer and key while indexing? No, in the SO discussion, they are using a PHP array feature, and asking how the same can be achieved in Java. I have written a separate section about it later as “PHP Arrays”. This was fun to check.

While others seem to be simply referring to populating the map or iterating over both (key-value) pair or simply values.

There are other places that I looked at as well, such as this GeeksforGeeks and Quora.

The GeekforGeeks article (Implementing Associate Array in Java) doesn’t seem to make much sense. As it seems to be highlighting the iterator part of the entry set. There can be some bad information out there too (maybe I need to pursue correction).

To this Quora question, Are associative array and hash table basically the same things?, most of the answers are correctly written, such as this one, very concise, and to the point.

The term “hash table” is talking about the implementation: a specific way to organize your data in memory.

The term “associative array” is talking about the interface: the concept that you use objects as array indices. (This is sometimes called an abstract data structure.)

There are multiple ways to implement an associative array. A hash table is one of them — in fact, it is by far the most common one.

Michal Forišek’s ans

However, some answers are quite confusing and only partially correct

You can implement an associative array using a hash table, but they are not the same, and neither are a subset of the other.

There are associative arrays which are not hash tables. E.g. std::map in C++ is an associative array implemented using a search tree.

There are hash tables that are not associative arrays. std::hash_set in C++ is a hash table, but not an associative array. Bloom filters use hash tables in a non-trivial way, so the hash tables there are not associative arrays.

Andrii Melnykov’s ans

This one confused me quite a bit, so I will dig into this answer further. Let’s just segregate correct and incorrect parts here,

Correct parts

You can implement an associative array using a hash table

There are associative arrays which are not hash tables. E.g. std::map in C++ is an associative array implemented using a search tree.

This is correct. The Associative Array is an ADT whereas a hash table is a concrete implementation. Also, the C++ std::map is an ordered map implemented using Red-Black Tree. We can note one thing here, C++ has another map data structure called std::unordered_map which is implemented using the hash table.

Incorrect parts

but they are not the same, and neither are a subset of the other.

There are hash tables that are not associative arrays. std::hash_set in C++ is a hash table, but not an associative array. Bloom filters use hash tables in a non-trivial way, so the hash tables there are not associative arrays.

This is actually incorrect. Hash tables are a concrete implementation of the operations described in the ADT Associative array. Hash tables happen to give the most optimal amortized run time of O(1) for all three operations of the Associative array; put, delete, and get, compared to other implementations using Search trees.

The other two things mentioned here, don’t make any sense; std:hash_set in C++ and Bloom filters, use hash_table, but not an associative array -> why does the set and bloom filter need to be even brought into discussion here? In fact, bloom filters don’t even use hash tables (Note: hash_table stores the key, bloom filters don’t even store the key).


PHP Arrays

Particularly the way the value was assigned to nested keys in this SO question was interesting ($arr[0]['name'] = 'demo';). There are two put operations in here. On the left side of the assignment operator, the put operation happened implicitly for the nested keys, and then on the right side, the put operation is happening explicitly in the presence of the assignment operator. This is different compared to languages like Java or Python.

Then I tried to check about how the PHP arrays are, it’s actually mind-blown for me,

An array in PHP is actually an ordered map. A map is a type that associates values to keys. This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more. As array values can be other arrays, trees and multidimensional arrays are also possible.

php manual

Back to the associative arrays, I tried the same SO example to inspect how the variable is,

php > $arr = array();
php > 
php > $arr[0]['name'] = 'demo';
php > $arr[0]['fname'] = 'fdemo';
php > $arr[1]['name'] = 'test';
php > $arr[1]['fname'] = 'fname';
php > 
php > var_dump($arr);
array(2) {
  [0]=>
  array(2) {
    ["name"]=>
    string(4) "demo"
    ["fname"]=>
    string(5) "fdemo"
  }
  [1]=>
  array(2) {
    ["name"]=>
    string(4) "test"
    ["fname"]=>
    string(5) "fname"
  }
}

Let’s see if it is flexible by putting more than two keys for the implicit put operation that PHP seems to be doing,

php > $arr2 = array();
php > $arr2['k0']['k00']['k000']['k0000'] = 'v0';
php > var_dump($arr2);
array(1) {
  ["k0"]=>
  array(1) {
    ["k00"]=>
    array(1) {
      ["k000"]=>
      array(1) {
        ["k0000"]=>
        string(2) "v0"
      }
    }
  }
}
php > # we can also access keys at different level
php > var_dump($arr2['k0']['k00']);
array(1) {
  ["k000"]=>
  array(1) {
    ["k0000"]=>
    string(2) "v0"
  }
}

Interesting!!


Key Takeaways

  1. Associative Array is an ADT(abstract data type), while a hash table is a concrete implementation of this ADT. This ADT’s implementations in programming languages are typically called Map (or Hash Map) and Dictionary. In C++, and Java it is called Map *, and in Python as Dictionary. This ADT can also be implemented using Trees (for ordered maps).
  2. PHP Array is quite strange as it is a hash table, and alone covers implements for array, list, stack queue, map, etc. PHP probably incurs a lot of performance penalties in doing so. Something to read about in the future…
  3. PHP associative array usage where assigning a value to a nested key, seems like magic happens on the left side of the assignment operator.

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *