This week’s blog topic will be In Memory Row
Composition. It’s important to establish
a quick basic understanding of how In Memory Data is stored as a primer to
future blog entries.
Before going into the topic of this blog post (the primer
for future posts), let’s establish a quick primer on SQL Server’s traditional
disk based table storage. So a quick primer for the primer!
Data Pages
In traditional disk based tables, table data is stored on 8
KB pages within the database mdf file. At
the very high level, these pages contain header information (metadata about the
page), data rows and a row offset table.
Data rows are written sequentially to these pages. The 8 contiguous pages are stored in extents. This is the basic unit of space managed and
read in SQL Server. Some takeaways here. Rows are stored sequentially in the pages and
because reads are performed in 64kb blocks, similarly keyed data will be
retrieved together.
In Memory Row Composition
In memory data is stored directly in memory. There are no pages or extents. This fundamentally changes the way that data
may be read and stored. In Memory rows are
composed differently than disk based tables to allow optimization for byte
addressable memory. The data itself is
not stored in any particular order, in no necessary proximity to adjacent rows.
Additionally, In Memory employs are row versioning system (a later blog topic)
to eliminate the locking and blocking of data.
In order to overcome locking issues, in memory rows are never
updated. They are only created, read and
destroyed. SQL Server accomplishes all this using metadata for each row.
Each In Memory row is composed of two parts, a Row Header
and a Payload. The Payload is the data, all
the index keys and all the other columns in the table. The Row Header is where it gets
interesting.
As you can see in the image above the row header is divided into at least 5 sections. A Begin Timestamp, End Timestamp,
StatementID, Index Link Count and Index Pointers.
The timestamp columns describe the life of a row. When the row is initially created, the
BeginTS field is populated with the timestamp of the transaction that created
the record. At this point, the EndTS
field is populated with a special infinity value and updated with the timestamp
of the transaction that deletes the record.
Following these timestamp fields is the StatementID
field. This is the unique statement id
that created the row.
The next set of fields are the Index Link Count, the number
of indexes that reference this row, followed by an Index Pointer field for each
index in the table. The index data
structure in conjunction with these Index pointer fields link the data together
in a Memory Optimized table. This is how
data is accessed and order in In Memory tables.
As a result every Memory Optimized tables require at least one index.
This was a quick overview of how In Memory rows are composed. It opens the door to future discussions about In Memory Indexes, isolation levels and row versioning In the upcoming blogs we’ll look into how the row header
helps SQL Server achieve these various enhancements with In Memory tables.