For our purposes, we will start with a storage system that has 30 drives, each 1 TB in size. It does not matter who makes the storage system or the drives; the concepts are the same.
The first thing usually listed about a storage system is the Raw Capacity of the system. This number is a function of how many physical drives the system supports times the capacity of the drives used. It is simply the number of drives times the drive capacity.
So for our example: 30 x 1 TB = 30 TB.
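If it helps to see the arithmetic spelled out, here is a minimal Python sketch of that calculation using the example numbers above (the variable names are just illustrative):

```python
# Raw Capacity = number of physical drives x capacity per drive
drive_count = 30
drive_capacity_tb = 1          # 1 TB drives

raw_capacity_tb = drive_count * drive_capacity_tb
print(raw_capacity_tb)         # 30 (TB)
```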
Now we start subtracting capacity for three main reasons: Hot Spares, RAID Type, and Drive Right Sizing.
A general rule of thumb is that you should have at least 1 Hot Spare per 20 drives. So in this case we would lose 2 drives to Hot Spares, which leaves us able to write data to only 28 drives.
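Here is the same idea as a small sketch, assuming we round the spare count up to a whole drive:

```python
import math

drive_count = 30
hot_spares = math.ceil(drive_count / 20)     # at least 1 Hot Spare per 20 drives -> 2
writable_drives = drive_count - hot_spares   # 28 drives left to hold data
print(hot_spares, writable_drives)           # 2 28
```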
The next thing we need to consider is the RAID Type and RAID Group Size. The RAID Group properties will vary by drive type and drive size, but for this example we will create two 14-drive RAID Groups using RAID 6.
Note: We will use RAID 6 because if a drive failure occurs, rebuilding the RAID Group can take up to 72 hours for SATA drives. The potential of a double fault occurring over that long a period of time is too great a risk to allow the system to be in a fault state that long. A double fault means we lose the data permanently.
So for the example, since RAID 6 dedicates two drives' worth of parity per group, we now lose another 4 drives to RAID considerations, which means we can write data to only 24 drives. So, by just setting the system up we have lost 20% of its capacity.
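A quick sketch of that RAID overhead for this example (two parity drives per group is the RAID 6 property; the variable names are mine):

```python
raid_groups = 2
drives_per_group = 14
parity_per_group = 2                                  # RAID 6 uses two parity drives per group

parity_drives = raid_groups * parity_per_group        # 4 drives lost to parity
data_drives = raid_groups * drives_per_group - parity_drives
print(parity_drives, data_drives)                     # 4 24
```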
Now we come to the concept of Drive Right Sizing.
In order for RAID to work, all the drives in the RAID Group must be the same size down to the byte level. The manufacturers list the drive size in GB, but in reality a drive may actually hold slightly more or less space than its listed capacity. To deal with this drive size discrepancy, Right Sizing comes into play. Depending upon the storage manufacturer, the amount of space lost to Right Sizing is between 6% and 10% of the drive's listed capacity. So for this example let's just choose a middle number of 8%. That means that a 1 TB drive after Right Sizing has 920 GB of actual writable capacity.
So now we have 24 drives at 920 GB each, leaving us with roughly 22 TB of usable capacity. Without even writing data to the storage system, we have lost 27% of the Raw Capacity just by setting the system up.
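Putting the Right Sizing step into the running sketch, assuming 1 TB = 1000 GB and the 8% middle figure chosen above:

```python
listed_capacity_gb = 1000            # a "1 TB" drive, taking 1 TB = 1000 GB
right_sizing_loss = 0.08             # 8%, a middle value between 6% and 10%
data_drives = 24

right_sized_gb = listed_capacity_gb * (1 - right_sizing_loss)    # 920 GB per drive
usable_capacity_tb = data_drives * right_sized_gb / 1000          # ~22 TB
print(right_sized_gb, usable_capacity_tb)                         # 920.0 22.08
```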
Issues with applications writing data to the storage
Now that the storage system is set up, we have to deal with the application and OS issues that arise as they write data to the storage.
If we created one drive from all the storage available after setting the system up and filled it to 100% of capacity, we could in essence store 22 TB of data, but that is not reality.
Best practices say you should not fill a drive much past 80% of its capacity. This is so an application/OS can write new files in contiguous blocks, thus minimizing disk fragmentation, which if allowed to happen can cause major performance issues for the OS, the application, and the storage system.
So in our example, let's be aggressive and say we will fill the drives to 85% of their usable capacity. We are left with an end capacity of 22 TB x 85%, or 18.7 TB of actual writable data, without knowingly causing any performance issues.
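And the fill-ratio step, using the rounded 22 TB figure from above:

```python
usable_capacity_tb = 22              # rounded usable capacity from the previous step
fill_ratio = 0.85                    # aggressive; best practice is closer to 80%
raw_capacity_tb = 30

writable_capacity_tb = usable_capacity_tb * fill_ratio                # ~18.7 TB
print(writable_capacity_tb, writable_capacity_tb / raw_capacity_tb)   # ~18.7, ~0.623
```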
That means that after all the setup considerations and application concerns, we are left with approximately 62.3% of the Raw Capacity as writable for data use.
There are a few other minor things that will affect the end usable capacity by a percent or two in either direction; these will depend upon the storage system manufacturer, but for the most part they will have minimal effect on the overall goal of finding out how much data we can store on a system without doing complex math.
So at the end of the day, if you want a rough approximation of how much usable storage you can write data to, multiply the Raw Capacity by 60% or 65%. You will get a good approximation of the usable capacity. I generally go with the 60% number because it is a more conservative estimate and it takes into account some of the minor factors that are too hard to predict as we slice and dice the storage up.
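As a rough rule-of-thumb helper, something like the following sketch captures that 60-65% estimate (the function name is just illustrative):

```python
def approx_usable_tb(raw_capacity_tb, factor=0.60):
    """Rule of thumb: usable capacity is roughly 60-65% of Raw Capacity."""
    return raw_capacity_tb * factor

print(approx_usable_tb(30))          # 18.0 TB with the conservative 60% factor
print(approx_usable_tb(30, 0.65))    # 19.5 TB with the 65% factor
```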
As you can see, in this example we started with a system that had 30 TB of Raw Capacity, but after setting it up and parsing it out, in reality we can only store approximately 18 TB of data on it and still get acceptable performance from the storage system.
And now you know, maybe more than you cared to know.