SQL Server Best Practices: Proactively Sizing Database Files

It’s very common to see databases created with the default settings and left as is.  Because workloads vary widely, those defaults are generic and rarely optimal.  One best practice you should always employ is proactively sizing your database files for expected growth and transactional activity.  This isn’t always a straightforward process, since it requires knowledge of the database objects and the transactional activity they’ll see, but getting it right can significantly improve database performance.  Let’s take a look.

For this example, we’re going to simulate a single large transaction and look at its effect on database file sizes and performance.  We’re going to ignore some management and development best practices to really emphasize the importance of proactively sizing your database.  We’ll create a database with the default file size and auto growth settings and insert 100 million records, which forces the database to rely on auto growth to size its data and log files.

USE [master]
GO
IF (SELECT DB_ID('FileSizeDB')) IS NOT NULL
BEGIN
       ALTER DATABASE FileSizeDB
       SET SINGLE_USER WITH ROLLBACK IMMEDIATE
       DROP DATABASE [FileSizeDB]
END
GO
CREATE DATABASE [FileSizeDB]
GO
USE [FileSizeDB]
GO
CREATE TABLE [dbo].[T1](
       [c1] [int] NULL
) ON [PRIMARY]

GO
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);  -- clear accumulated wait statistics
GO
   
Here we’ve created the database and a table with a single integer column, then cleared the server wait stats to get an accurate picture of what our insert will be waiting on.  Wait statistics accumulate from the time the instance starts, so to see what the server waited on during a specific period we can either clear them before running our insert or record before-and-after values.
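If you’d rather not reset the server-wide counters, a minimal sketch of the before-and-after approach looks something like this (the temp table name is just for illustration):

-- Snapshot the wait stats before the workload.
SELECT wait_type, wait_time_ms, waiting_tasks_count
INTO #waits_before
FROM sys.dm_os_wait_stats;

-- ... run the workload here ...

-- Diff the current values against the snapshot to see what accumulated.
SELECT w.wait_type,
       (w.wait_time_ms - b.wait_time_ms) / 1000.0 AS wait_sec_during_test,
       w.waiting_tasks_count - b.waiting_tasks_count AS wait_count_during_test
FROM sys.dm_os_wait_stats AS w
JOIN #waits_before AS b ON b.wait_type = w.wait_type
WHERE w.wait_time_ms - b.wait_time_ms > 0
ORDER BY wait_sec_during_test DESC;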

USE FileSizeDB
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The Tally CTE cross joins 8 sets of 10 rows to generate 10^8 = 100,000,000 numbers.
WITH Tally (n) AS
(
       SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
       FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
       CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
       CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
       CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
       CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n)
       CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f(n)
       CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g(n)
       CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) h(n)
)
INSERT INTO T1
SELECT n
FROM Tally;

This is the insert statement; it generates and inserts 100,000,000 records into table T1.  Since this database is running on my laptop, it’s going to take a while.

Table 'T1'. Scan count 0, logical reads 100160771, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 254219 ms,  elapsed time = 1014029 ms.


(100000000 row(s) affected)

When it finally finishes, we see a total elapsed time of 1,014,029 ms, which is almost 17 minutes.  We can now peek into the wait stats to see what has accumulated since we cleared them:
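The figures below come from querying sys.dm_os_wait_stats; a simplified sketch along these lines produces the same columns (commonly used versions of this query also filter out benign background waits):

-- Summarize accumulated waits, ordered by total wait time.
SELECT wait_type AS WaitType,
       wait_time_ms / 1000.0 AS Wait_Sec,
       waiting_tasks_count AS [Wait Count],
       100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS [Wait Percentage],
       wait_time_ms / (1000.0 * NULLIF(waiting_tasks_count, 0)) AS AvgWait_Sec
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;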

WaitType                        Wait_Sec   Wait Count   Wait Percentage   AvgWait_Sec
PREEMPTIVE_OS_WRITEFILEGATHER   644.86     65           83.16             9.9209


It’s not surprising that our top wait is PREEMPTIVE_OS_WRITEFILEGATHER, which indicates substantial time spent on auto growth events.  Our initial file sizes, based on the defaults, were 3264 KB and 816 KB for the data and log files respectively.  After running this single transaction, the data file grew to 1.22 GB and the log file grew to 30.7 GB!  The total wait time to grow these files was around 10 minutes, the bulk of our entire processing time.
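You can check the resulting file sizes and growth settings yourself with a query like the following (sys.database_files reports sizes in 8 KB pages, so we convert to MB):

USE FileSizeDB;
GO
SELECT name,
       type_desc,
       size * 8 / 1024.0 AS size_mb,   -- size is stored in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' percent'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM sys.database_files;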

So what happens if we pre-size the database?  We’re going to drop and recreate the database, this time proactively sizing the data file to 1.5GB and the log to 35GB.  Then we clear our wait stats and reload the table.  The results of the second test are below.
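Pre-sizing can be done at creation time or afterwards; here’s a minimal sketch using ALTER DATABASE, assuming the default logical file names (FileSizeDB and FileSizeDB_log) that CREATE DATABASE produced:

-- Grow the data file to roughly 1.5 GB and the log file to 35 GB up front,
-- so the insert doesn't have to wait on auto growth.
ALTER DATABASE FileSizeDB
MODIFY FILE (NAME = FileSizeDB, SIZE = 1536MB);
GO
ALTER DATABASE FileSizeDB
MODIFY FILE (NAME = FileSizeDB_log, SIZE = 35GB);
GO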

Table 'T1'. Scan count 0, logical reads 100160771, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 164484 ms,  elapsed time = 221227 ms.

(100000000 row(s) affected)

This finishes with an elapsed time of 3 minutes and 41 seconds, roughly 4.6 times faster than the first run.  Not bad when SQL Server doesn’t have to pause processing to grow files.  This also shows how damaging a practice shrinking database files can be, yet I still see it being performed.  It should also be noted that growth waits on data files can be mitigated by granting the SQL Server service account the ‘Perform volume maintenance tasks’ right in Local Security Policy.  This enables instant file initialization, which skips the zeroing-out step when data files grow.  It’s very important to note that this only helps with data files, not log files, so it is crucial to pre-size your log files and not shrink them.
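If you’re on SQL Server 2016 SP1 or later (or 2012 SP4), you can confirm whether instant file initialization is enabled for the engine service with a quick check against sys.dm_server_services:

-- Returns 'Y' when the service account has the right needed for
-- instant file initialization.
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server (%';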

So just how do you size your database files?  Unfortunately, there’s no easy answer.  You have to know what to expect in terms of growth and in terms of normal and abnormal transactional activity in the database.  Are there large transactions that run frequently?  Very large transactions that run infrequently?  What about index maintenance, such as rebuilding large indexes?  You really need to know the workings of the database.

There are a few guidelines you can use if you really don't know what to expect.  Start by looking at historic growth rates.  If you're not tracking this, look at historic full backup sizes; full backups only back up pages that contain data, so after some time of regular backups you can trend growth.  Size your data files to accommodate a few months of growth.  For transaction logs, look at the largest index in the database: an index rebuild is a single transaction, so your transaction log needs to accommodate at least that index's size.  Start there and monitor your environment.  Set up alerts to tell you when file sizes are close to being maxed out.  You should absolutely enable auto growth on your files, but do not rely on it; it is meant to be an emergency mechanism in case the files fill up before you can grow them manually.  Follow these guidelines and you can definitely reduce wait time from auto growth events.
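If you want a concrete starting point, here’s a rough sketch of both ideas: trending full backup sizes from the msdb backup history, and finding the largest index in the current database.  Treat these as illustrations and adjust the filters to your environment:

-- Trend full backup sizes over time for one database (type = 'D' is a full backup).
SELECT database_name,
       CAST(backup_finish_date AS date) AS backup_date,
       backup_size / 1024 / 1024 AS backup_size_mb
FROM msdb.dbo.backupset
WHERE type = 'D'
  AND database_name = 'FileSizeDB'
ORDER BY backup_finish_date;

-- Rough size of the largest index in the current database, as a starting
-- point for sizing the transaction log around index rebuilds.
SELECT TOP (1)
       OBJECT_NAME(p.object_id) AS table_name,
       i.name AS index_name,
       SUM(p.used_page_count) * 8 / 1024.0 AS index_size_mb
FROM sys.dm_db_partition_stats AS p
JOIN sys.indexes AS i
  ON i.object_id = p.object_id AND i.index_id = p.index_id
GROUP BY p.object_id, i.name
ORDER BY index_size_mb DESC;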

I mentioned earlier that in these examples we’re ignoring certain best practices to show the effect that proactively sizing database files can have on performance.  We’ve ignored one of the first best practices that nearly every new database administrator or accidental DBA learns: proper transaction log maintenance.  We’ve also ignored transaction management.  We’ll explore the best practices for those topics in another blog post.
