It’s very common to see databases created with the default settings and left as is. Because database workloads vary so widely, those default settings are generic and often far from optimal. One best practice you should always employ is proactively sizing your database files for expected growth and transactional activity. This isn’t always a straightforward process, as it requires knowledge of the database objects and transactional activity, but getting it right can really improve database performance. Let’s take a look.
For this example, we’re going to simulate a single large transaction and view its effects on database file sizes and performance. We’re going to ignore some administrative and development best practices in order to emphasize the importance of proactively sizing your database. We’ll create a database with the default file size and auto growth settings, then insert 100 million records. This is going to force the database to rely on auto growth to size the data and log files.
USE [master]
GO

IF (SELECT DB_ID('FileSizeDB')) IS NOT NULL
BEGIN
    ALTER DATABASE FileSizeDB
    SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE [FileSizeDB]
END
GO

CREATE DATABASE [FileSizeDB]
GO

USE [FileSizeDB]
GO

CREATE TABLE [dbo].[T1](
    [c1] [int] NULL
) ON [PRIMARY]
GO

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
GO
Here we’ve created the database and a table with a single integer column, and then cleared the server wait stats to get an accurate picture of what our insert will be waiting on. These waits accumulate from the time the instance starts. To get an accurate idea of what the server is waiting on during a specific period, we can either clear the stats before running our insert or record before and after values.
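For reference, the wait stats results shown later in this post come from a query against sys.dm_os_wait_stats. The exact query isn’t shown here; this is just a minimal sketch, and the column aliases and the choice not to filter out benign waits are my own:

-- Sketch: summarize accumulated waits since they were last cleared.
SELECT
    wait_type                                          AS WaitType,
    wait_time_ms / 1000.0                              AS Wait_Sec,
    waiting_tasks_count                                AS WaitCount,
    100.0 * wait_time_ms / SUM(wait_time_ms) OVER ()   AS WaitPercentage,
    CASE WHEN waiting_tasks_count = 0 THEN 0
         ELSE (wait_time_ms / 1000.0) / waiting_tasks_count
    END                                                AS AvgWait_Sec
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
ORDER BY wait_time_ms DESC;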
USE FileSizeDB
GO

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

WITH Tally (n) AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) h(n)
)
INSERT INTO T1
SELECT n
FROM Tally
This is the insert statement. It uses a tally CTE, ten rows cross joined eight times for 10^8 rows, to insert 100,000,000 records into table T1. As this database is running on my laptop, it’s going to take a while.
Table 'T1'. Scan count 0, logical reads 100160771, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 254219 ms, elapsed time = 1014029 ms.

(100000000 row(s) affected)
When it finally finishes, we see a total elapsed time of 1,014,029 ms. That’s almost 17 minutes. We can now peek into the wait stats to see what’s accumulated since we cleared them:
WaitType                      | Wait_Sec | Wait Count | Wait Percentage | AvgWait_Sec
PREEMPTIVE_OS_WRITEFILEGATHER | 644.86   | 65         | 83.16           | 9.9209
It’s not surprising that our top wait is PREEMPTIVE_OS_WRITEFILEGATHER. This indicates that we have substantial wait times due to auto growth events. Our initial file sizes, based on the defaults, were 3,264 KB for the data file and 816 KB for the log file. After running this single transaction, the data file grew to 1.22 GB and the log file grew to 30.7 GB! The total wait time to grow these files was around 10 minutes, the bulk of our entire processing time.
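If you want to see how far the files have grown and how they’re configured to grow, a quick sketch against sys.database_files looks like this; size is reported in 8 KB pages, so the only math is converting to MB:

-- Sketch: current size and growth settings for the files in the current database.
USE FileSizeDB;
GO
SELECT
    name,
    type_desc,
    size * 8 / 1024.0 AS size_mb,
    CASE WHEN is_percent_growth = 1
         THEN CAST(growth AS varchar(10)) + ' percent'
         ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
    END AS growth_setting
FROM sys.database_files;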
So what happens if we pre-size the database to these sizes? We’re going to drop and recreate the database, this time proactively sizing the data file to 1.5GB and the log to 35GB, then clear our wait stats and reload the table. A sketch of the pre-sized CREATE DATABASE is below, followed by the results of the second test.
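Here’s a minimal sketch of the pre-sized CREATE DATABASE. The file paths, logical file names, and growth increments are assumptions for illustration; the important parts are the SIZE values.

-- Sketch: pre-size the data file to 1.5 GB and the log file to 35 GB.
-- File paths and FILEGROWTH values are assumptions; adjust for your instance.
CREATE DATABASE [FileSizeDB]
ON PRIMARY
(
    NAME = N'FileSizeDB',
    FILENAME = N'C:\SQLData\FileSizeDB.mdf',
    SIZE = 1536MB,
    FILEGROWTH = 256MB
)
LOG ON
(
    NAME = N'FileSizeDB_log',
    FILENAME = N'C:\SQLData\FileSizeDB_log.ldf',
    SIZE = 35GB,
    FILEGROWTH = 1GB
)
GO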
Table 'T1'. Scan count 0, logical reads 100160771, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 164484 ms, elapsed time = 221227 ms.

(100000000 row(s) affected)
This finishes with an elapsed time of 3 minutes and 41 seconds. That’s a pretty extreme improvement, roughly 458% (about 4.6x faster), simply because SQL Server doesn’t have to pause processing to grow files. This also goes to show how bad a practice shrinking database files can be, yet I still see it being performed. It should also be noted that file growth wait time on data files can be mitigated by granting your SQL Server service account the ‘Perform Volume Maintenance Tasks’ policy. This allows instant file initialization, which bypasses the zeroing-out process when files are created or grown. It’s very important to note that this only helps with data files, not log files. Therefore, it is crucial to pre-size your log files and not shrink them.
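As a quick sanity check, on SQL Server 2016 SP1 and later you can see whether instant file initialization is in effect for the engine service with something like the sketch below (older versions require checking the service account’s rights in Local Security Policy instead):

-- Sketch: check instant file initialization for the database engine service.
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server (%';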
So just how do you size your database files? Unfortunately, there's no easy answer here. You have to know what to expect in terms of growth and of normal and abnormal transactional activity in the database. Are there large transactions that run frequently? Are there very large transactions that run infrequently? What about index maintenance, such as rebuilding large indexes? You really need to know the workings of the database.
There are a few guidelines you can use if you really don't know what to expect. Start by looking at historic growth rates. If you're not tracking this, you can look at historic full backup sizes; full backups only back up pages containing data, so after some period of regular backups you can trend growth. Size your data files to accommodate a few months of growth. For transaction logs, look at the largest index in the database: an index rebuild is a single transaction, so your transaction log will need to accommodate that index's size. Start here and monitor your environment. Set up alerts to tell you when file sizes are close to being maxed out. You should absolutely leave auto growth enabled on your files, but do not rely on it; it's meant to be an emergency measure in case your files fill up before you can manually grow them. Follow these guidelines and you can definitely reduce wait times for auto growth events. A couple of starting-point queries are sketched below.
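These are sketches rather than a sizing formula: the first trends full backup sizes from msdb to estimate data file growth, and the second finds the largest indexes by reserved space as a rough lower bound for how much log an index rebuild might need. The database name, TOP count, and backup type filter are assumptions; adjust them for your environment.

-- Sketch 1: trend full backup sizes from msdb to estimate data file growth.
SELECT
    bs.database_name,
    CONVERT(date, bs.backup_finish_date) AS backup_date,
    bs.backup_size / 1024 / 1024         AS backup_size_mb
FROM msdb.dbo.backupset AS bs
WHERE bs.type = 'D'                       -- full backups only
  AND bs.database_name = 'FileSizeDB'
ORDER BY bs.backup_finish_date;

-- Sketch 2: largest indexes in the current database by reserved space.
SELECT TOP (5)
    OBJECT_NAME(ps.object_id)                AS table_name,
    i.name                                   AS index_name,
    SUM(ps.reserved_page_count) * 8 / 1024.0 AS reserved_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
GROUP BY ps.object_id, i.name
ORDER BY reserved_mb DESC;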
I mentioned earlier that during these examples we’re ignoring certain best practices to show the effect that proactively sizing database files can have on performance. We’ve ignored one of the first best practices that nearly every new database administrator or accidental DBA learns: proper transaction log maintenance. We’ve also ignored transaction management. We’ll explore the best practices in those areas in another blog post.
Labels: Database File Size, SQL Server, SQL Server Best Practices