For our purposes, we will start with a storage system that has 30 drives, each 1 TB in size. It does not matter who makes the storage system or the drives; the concepts are the same.
The first thing usually listed about a storage system is the Raw Capacity of the system. This number is a function of how many physical drives the system supports times the capacity of the drives used. It is simply the number of drives times the drive capacity.
So for our example: 30 x 1 TB = 30 TB.
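If it helps to see the arithmetic spelled out, here is a minimal Python sketch of that calculation using the example numbers above (the variable names are just illustrative):

```python
# Raw Capacity = number of physical drives x capacity per drive
drive_count = 30
drive_capacity_tb = 1          # 1 TB drives

raw_capacity_tb = drive_count * drive_capacity_tb
print(raw_capacity_tb)         # 30 (TB)
```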
Now we start subtracting capacity for three main reasons: Hot Spares, RAID Type, and Drive Right Sizing.
A general rule of thumb is that you should have at least 1 Hot Spare per 20 drives. So in this case we would lose 2 drives to Hot Spares, which leaves us able to write data to only 28 drives.
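Here is the same idea as a small sketch, assuming we round the spare count up to a whole drive:

```python
import math

drive_count = 30
hot_spares = math.ceil(drive_count / 20)     # at least 1 Hot Spare per 20 drives -> 2
writable_drives = drive_count - hot_spares   # 28 drives left to hold data
print(hot_spares, writable_drives)           # 2 28
```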
The next thing we need to consider is the RAID Type and RAID Group Size. The RAID Group properties will vary by drive type and drive size, but for this example we will create two 14-drive RAID Groups using RAID 6.
Note: We will use RAID 6 because if a drive failure occurs, rebuilding the RAID Group can take up to 72 hours for SATA drives. The potential of a double fault occurring over that long a period of time is too great a risk to allow the system to be in a fault state that long. A double fault means we lose the data permanently.
So for the example, since RAID 6 dedicates two drives' worth of parity per group, we now lose another 4 drives to RAID considerations, which means we can write data to only 24 drives. So, by just setting the system up we have lost 20% of its capacity.
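A quick sketch of that RAID overhead for this example (two parity drives per group is the RAID 6 property; the variable names are mine):

```python
raid_groups = 2
drives_per_group = 14
parity_per_group = 2                                  # RAID 6 uses two parity drives per group

parity_drives = raid_groups * parity_per_group        # 4 drives lost to parity
data_drives = raid_groups * drives_per_group - parity_drives
print(parity_drives, data_drives)                     # 4 24
```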
Now we come to the concept of Drive Right Sizing.
In order for RAID to work, all the drives in the RAID Group must be the same size down to the byte level. The manufacturers list the drive size in GB, but in reality a drive may actually hold slightly more or less space than its listed capacity. To deal with this drive size discrepancy, Right Sizing comes into play. Depending upon the storage manufacturer, the amount of space lost to Right Sizing is between 6% and 10% of the drive's listed capacity. So for this example let's just choose a middle number of 8%. That means that a 1 TB drive after Right Sizing has 920 GB of actual writable capacity.
So now we have 24 drives at 920 GB each, leaving us with roughly 22 TB of usable capacity. Without even writing data to the storage system, we have lost 27% of the Raw Capacity just by setting the system up.
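Putting the Right Sizing step into the running sketch, assuming 1 TB = 1000 GB and the 8% middle figure chosen above:

```python
listed_capacity_gb = 1000            # a "1 TB" drive, taking 1 TB = 1000 GB
right_sizing_loss = 0.08             # 8%, a middle value between 6% and 10%
data_drives = 24

right_sized_gb = listed_capacity_gb * (1 - right_sizing_loss)    # 920 GB per drive
usable_capacity_tb = data_drives * right_sized_gb / 1000          # ~22 TB
print(right_sized_gb, usable_capacity_tb)                         # 920.0 22.08
```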
Issues with applications writing data to the storage
Now that the storage system is set up, we have to deal with the application and OS issues that arise as they write data to the storage.
If we created one drive from all the storage available after setting the system up and filled it to 100% of capacity, we could in essence store 22 TB of data, but that is not reality.
Best practices say you should not fill a drive much past 80% of its capacity. This is so an application/OS can write new files in contiguous blocks, thus minimizing disk fragmentation, which if allowed to happen can cause major performance issues for the OS, the application, and the storage system.
So in our example, let's be aggressive and say we will fill the drives to 85% of their usable capacity. We are left with an end capacity of 22 TB x 85%, or 18.7 TB of actual writable data, without knowingly causing any performance issues.
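And the fill-ratio step, using the rounded 22 TB figure from above:

```python
usable_capacity_tb = 22              # rounded usable capacity from the previous step
fill_ratio = 0.85                    # aggressive; best practice is closer to 80%
raw_capacity_tb = 30

writable_capacity_tb = usable_capacity_tb * fill_ratio                # ~18.7 TB
print(writable_capacity_tb, writable_capacity_tb / raw_capacity_tb)   # ~18.7, ~0.623
```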
That means that after all the setup considerations and application concerns, we are left with approximately 62.3% of the Raw Capacity as writable for data use.
There are a few other minor things that will affect the end usable capacity by a percent or two in either direction; these will depend upon the storage system manufacturer, but for the most part they will have minimal effect on the overall goal of finding out how much data we can store on a system without doing complex math.
So at the end of the day, if you want a rough approximation of how much usable storage you can write data to, multiply the Raw Capacity by 60% or 65%. You will get a good approximation of the usable capacity. I generally go with the 60% number because it is a more conservative estimate and it takes into account some of the minor factors that are too hard to predict as we slice and dice the storage up.
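As a rough rule-of-thumb helper, something like the following sketch captures that 60-65% estimate (the function name is just illustrative):

```python
def approx_usable_tb(raw_capacity_tb, factor=0.60):
    """Rule of thumb: usable capacity is roughly 60-65% of Raw Capacity."""
    return raw_capacity_tb * factor

print(approx_usable_tb(30))          # 18.0 TB with the conservative 60% factor
print(approx_usable_tb(30, 0.65))    # 19.5 TB with the 65% factor
```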
As you can see, in this example we started with a system that had 30 TB of Raw Capacity, but after setting it up and parsing it out, in reality we can only store approximately 18 TB of data on it and still get acceptable performance from the storage system.
And now you know, maybe more than you cared to know.