SSD vs HDD: Choosing the right tool for the job
We've featured solid-state drives (SSDs) in the past on this blog. However, given the recent situation in Thailand and the increasing shortage of hard-disk drives, I thought it would be appropriate to feature a piece helping people who are designing computer systems decide when it is a good idea to use SSDs rather than traditional hard drives (HDDs).*
The first two things that people usually think of when they consider SSDs are that they are both much faster and more expensive than HDDs at the same level of drive capacity. Their high price tag makes SSDs currently under-utilized in the market even though for some applications you'd be saving money by using them. Let's briefly look at a basic example.
Let's assume we're currently running the hardware configuration below:
Configuration 1
- 10x 2U servers
- 24 SAS drives
- $80,000 total cost ($36,000 drives + $40,000 servers + $4,000 rack space)
In the above example, let's assume we decided to go with SAS drives because our applications demand high IOPS but not necessarily much storage space (i.e. we're currently under-utilizing the storage capacity). In such a scenario, it would have been much more cost-effective to design the configuration around SSDs rather than SAS drives. Here is how the improved configuration would look:
Configuration 2
- 4x 1U servers
- 8 SSD drives
- $30,800 total cost ($5,000 drives + $25,000 servers + $800 rack space)
Configuration 2, which uses SSDs instead of HDDs, saves you roughly 62% in purchasing costs over Configuration 1 ($30,800 versus $80,000). It will also save you money in operating costs, because SSDs use less energy than spinning drives and there are fewer servers to power. This simple example illustrates some of the hardware cost and energy savings that SSDs promise, but deciding whether to go with SSDs or HDDs is all about choosing the right tool for the job. Let's look at this challenge in more detail.
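The savings figure can be checked with a few lines of Python. This is only a minimal sketch: the dollar amounts are copied from the two configurations above, while the names config_1, config_2, total_cost, and savings_percent are made up for illustration.

```python
# Purchase costs in USD, copied from Configuration 1 and Configuration 2.
config_1 = {"drives": 36_000, "servers": 40_000, "rack_space": 4_000}
config_2 = {"drives": 5_000, "servers": 25_000, "rack_space": 800}


def total_cost(config):
    """Sum every cost component of a configuration."""
    return sum(config.values())


def savings_percent(old, new):
    """Percentage saved by buying `new` instead of `old`."""
    return 100 * (total_cost(old) - total_cost(new)) / total_cost(old)


print(total_cost(config_1), total_cost(config_2))  # 80000 30800
print(f"{savings_percent(config_1, config_2):.1f}% saved")  # 61.5% saved
```

Note that this covers purchase price only; the energy savings mentioned above would come on top of it.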
There are three key metrics (aside from price) to consider when determining whether SSDs are the smart option for your computing system. These are storage, IOPS, and workload requirements. Let's look at storage capacity first.
Storage Capacity
It's undeniable that if your applications require large quantities of storage space, SAS and SATA drives look much more appealing than SSDs to the pocketbook. As Chart A demonstrates, for the same capacity in gigabytes, an SSD is much pricier than a SAS drive (and SSDs don't even come close to matching the capacity levels of SATA drives).
So, if I/O performance is not important and you merely require storage capacity, SATA or SAS drives are usually the best options. But what if your applications call for both many gigabytes and high I/O?
You can attain a solution for this situation by designing a hybrid system, a server that uses both SSDs and HDDs. Although this requires more in-depth knowledge of your applications' resource utilization to implement, such a solution can make your servers more efficient and save you money. By assigning some tasks (such as handling swap files and transaction logs) to SSDs and other storage tasks to HDDs, you can reduce the I/O bottleneck and at the same time allow for large quantities of data to be stored on the system.
There is another type of solution with hybrid drives, which works exceptionally well in storage systems where the need for drive read (but not write) performance is extremely high. This solution is based on LSI's CacheCade technology that uses SSDs as RAID controller cache. Read more about such an implementation here.
Once you've considered your storage requirements, the next factor to take into account is performance.
IOPS
IOPS stands for Input/Output Operations Per Second, and it is a measure of drive performance. For HDDs, IOPS is largely determined by how quickly the rotating disk and the read/write head operate. Since SAS drives spin at speeds much higher than SATA drives (15,000 RPM vs. 7,200 RPM, respectively), you can frequently rule out SATA drives as an option if your applications require high IOPS.
SSDs can attain IOPS levels comparable to or better than those of SAS drives for a much lower cost. So, when it comes to IOPS, the typical mantra that "SSDs are too expensive" is actually reversed. SSDs only seem expensive when comparing price per gigabyte (see Chart A); if you compare price per IOPS, SAS drives are the more costly option (see Chart B).
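To see why spindle speed matters, here is a rough back-of-the-envelope estimate in Python. It uses the common approximation that one random I/O costs an average seek plus half a disk rotation. The RPM values come from the paragraph above; the seek times are assumed, illustrative values rather than figures from the presentation, and estimate_hdd_iops is a hypothetical helper name.

```python
def estimate_hdd_iops(rpm, avg_seek_ms):
    """Approximate random IOPS as 1 / (average seek + half a rotation)."""
    rotational_latency_ms = 0.5 * 60_000 / rpm  # time for half a revolution
    return 1000 / (avg_seek_ms + rotational_latency_ms)


# Seek times are ASSUMED for illustration only.
sas_iops = estimate_hdd_iops(rpm=15_000, avg_seek_ms=3.5)
sata_iops = estimate_hdd_iops(rpm=7_200, avg_seek_ms=8.5)

print(f"15,000 RPM SAS: ~{sas_iops:.0f} IOPS")
print(f"7,200 RPM SATA: ~{sata_iops:.0f} IOPS")
```

Real drives will deviate from this because of caching, command queuing, and request size, so treat the result as an order-of-magnitude guide only.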
Chart B also shows FIO in addition to SSD and SAS drives. FIO stands for Fusion-io, a vendor of a unique and high-end type of I/O solution. FIO uses solid-state flash memory connected via a PCIe slot on the motherboard to achieve perhaps the highest levels of IOPS possible in a system (over 1,000,000 IOPS, in some cases). Such a solution is very pricey and is only recommended in extreme cases where the application calls for the highest performance available regardless of the cost (e.g. computational finance).
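The per-gigabyte versus per-IOPS comparison behind Charts A and B can be sketched in Python as follows. The prices, capacities, and IOPS figures are hypothetical placeholders chosen only to show the calculation; they are not the data from the charts, and the Drive class is invented for this example. Substitute real quotes and measured IOPS for your own hardware.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    price_usd: float
    capacity_gb: float
    iops: float

    @property
    def usd_per_gb(self):
        return self.price_usd / self.capacity_gb

    @property
    def usd_per_iops(self):
        return self.price_usd / self.iops


# HYPOTHETICAL numbers for illustration, not taken from Chart A or Chart B.
drives = [
    Drive("SAS", price_usd=300, capacity_gb=300, iops=200),
    Drive("SSD", price_usd=600, capacity_gb=100, iops=5_000),
]

cheapest_per_gb = min(drives, key=lambda d: d.usd_per_gb)
cheapest_per_iops = min(drives, key=lambda d: d.usd_per_iops)

for d in drives:
    print(f"{d.name}: ${d.usd_per_gb:.2f}/GB, ${d.usd_per_iops:.2f}/IOPS")
print("Cheapest per GB:", cheapest_per_gb.name)
print("Cheapest per IOPS:", cheapest_per_iops.name)
```

With these placeholder inputs the ranking flips depending on the metric, which is the point the two charts make.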
But IOPS is still only one part of the I/O equation. You still have to consider the workload requirements of your applications.
Sequential vs. Random I/O
I/O operations can be classified as one of two types: sequential or random. In HDDs, the most time-consuming part of the I/O process is the seek, when the disk head moves to the cylinder that holds the requested data. If the drive needs to access data that is scattered in many different places on the disk (i.e. a random I/O workload), then HDD performance will be significantly slower. On the other hand, if the data to be read is located in the same general area of the disk (and the disk is not terribly fragmented), then HDDs can tackle such sequential I/O workloads rather well.
Applications that use sequential I/O operations include backup, archiving, and streaming video. Two common examples of random I/O are database servers and Microsoft Exchange servers.
Since SSDs have no spinning platters or moving heads, they pay no seek penalty and perform random I/O operations better than hard disk drives. Chart C approximately demonstrates this workload-to-technology alignment.
Of course the devil is in the details, and you should first determine specifically what type of workload your applications will demand. As a general rule of thumb, SATA drives are sufficient for situations which primarily require sequential I/O, SSDs should be used when faster seek time is at a premium (random I/O), and SAS drives are a good solution when you expect a mixed workload.
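The rule of thumb above can be written down as a small Python function. This is only a sketch: the function name recommend_drive and the 20% and 80% cut-offs for "mostly sequential" and "mostly random" are assumptions made for illustration, not thresholds from the presentation.

```python
def recommend_drive(random_fraction,
                    mostly_sequential_below=0.2,
                    mostly_random_above=0.8):
    """Map the share of random I/O (0.0 to 1.0) to a drive type.

    The two thresholds are ASSUMED cut-offs; tune them for your workload.
    """
    if not 0.0 <= random_fraction <= 1.0:
        raise ValueError("random_fraction must be between 0.0 and 1.0")
    if random_fraction < mostly_sequential_below:
        return "SATA"  # primarily sequential I/O
    if random_fraction > mostly_random_above:
        return "SSD"   # seek time is at a premium
    return "SAS"       # mixed workload


print(recommend_drive(0.05))  # SATA
print(recommend_drive(0.50))  # SAS
print(recommend_drive(0.95))  # SSD
```

The random fraction itself has to come from profiling your applications, which is the "devil in the details" part.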
Conclusion: Where SSDs excel
Solid-state drives are the best fit in situations where:
- The size of your data set is small,
- You require high IOPS, and
- Your I/O workload is mostly random operations.
There are other important factors to consider when deciding whether SSDs or HDDs are the right tool for the job. For instance, most types of enterprise-level SSDs have write longevity constraints over the course of their usable life. In addition, if you decide to reduce the number of servers you deploy by utilizing SSDs, you should make sure that scaling down your hardware does not create other bottlenecks in processing power, network I/O, or RAM.
If you have any other advice for those trying to decide between SSDs and HDDs when designing computing systems, please post in the comments below. Thanks for reading!
* This article (and the data mentioned) is based on an excellent presentation delivered by Douglas Bone on April 19, 2011.