Hash Indexes

As the inherent structure of data and data storage is different for In-Memory data, the previous access methods no longer apply.  New the SQL Server 2014 are Hash indexes for Memory Optimized Tables. 

  These new indexes have a substantially different structure than typical B-Tree Indexes.   The hash index consists of an array of ‘buckets’.  The array is built by applying a deterministic hash function to an index key.  The same index key will always be mapped to the same hash bucket.  As we learned in the Row Composition blog, an in memory data row is composed of two sections, the header and the payload. Within the header section, there are Index pointers that link rows together.  The first entry in a particular bucket will have a pointer to the next entry and so on until there are no more entries.  All index keys that map to the same bucket will be linked together in this manner.

  Let’s take a look at how it all comes together.  The simplest way to understand how Hash indexes work is to simplify the hash function.  I’m borrowing from Kalen Delaney’s white paper here, but for simplicity’s sake, consider a hash index on a last name field and that the hash function simply counts the characters in a last name as a bucket.  The following image assumes an In-memory table with a hash index on a last name column.



















As Records are entered into the table, the hash is calculated and a memory pointer maps the index to the index pointer of the first row in each bucket, as represented by the second column in the rows on the right.  As more records are inserted, they are given pointers to the previous entry.  When retrieving rows using a hash index, the query applies the hash function to the search predicate to find the correct bucket, which maps to first row’s location in memory.  Every subsequent row is retrieved by traversing the row chaining created by the index pointer field in the row’s header. 

In-Memory tables also support nonclustered indexes. These are similar in structure to traditional non clustered indexes but have an updated structure and access method.  We’ll explore these changes in my next blog.