Wednesday, July 23, 2008

Finally! A Simple Metric for Solid State Drive Endurance

We've all seen the impressive MTBF (mean time between failures) specifications of SSDs versus HDDs: one to two million hours versus 600 thousand hours in a notebook PC-class application. However, unlike HDDs, where a mechanical failure can render the entire drive unusable, the main failure mechanism in an SSD is its memory cells wearing out. MTBF is a statistical calculation that, unfortunately, does not capture the effect of write endurance.

The write endurance of an SSD is a function of the type of flash memory used (SLC, MLC), storage capacity, frequency of writes, data sizes of the writes and the amount of static data, which relates to the wear-leveling algorithm. System-level endurance of SSDs in the industrial, enterprise and military space has ranged anywhere from one million to five million program/erase cycles. There are three main problems with this.

1. The endurance will depend on how the SSD is used in the application. As a result, SSD vendors can tweak the parameters for their own marketing purposes.

2. Not all SSD vendors use the same formula for calculating the endurance and the formulas can be overly complex.

3. A raw endurance metric is nebulous, and its implications are difficult to grasp, especially for OEMs who've never had to grapple with endurance issues in HDDs. For example: what is the effect on the lifetime of the drive?

SanDisk aims to solve this with the Longterm Data Endurance (LDE) metric. LDE is simply defined as the total amount of data that can be written over the lifespan of the SSD. The metric is based on the Bapco write usage pattern for a typical business user and assumes the data is written evenly over the lifetime of the drive and that data is retained for one year once the LDE specification is reached.

LDE gives OEMs a simple way to compare SSDs and determine, based on an application's usage patterns, which drives are suitable for a particular application. For example, a drive with an 80TBW (terabytes written) LDE can support 20GB of writes per day for 10 years (equivalent to 73TBW). For an application requiring only half that write volume per day (10GB), a 40TBW-rated drive would be sufficient.
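To make the arithmetic concrete, here is a minimal sketch of that calculation in Python; the ratings and daily write volumes are the illustrative figures from the example above, not official specifications:

    def lifetime_years(lde_tbw, daily_writes_gb):
        # Years of life an LDE rating (total terabytes written) sustains
        # at a given daily write volume.
        total_gb = lde_tbw * 1000  # TBW rating expressed in gigabytes
        return total_gb / (daily_writes_gb * 365)

    # An 80TBW drive at 20GB of writes per day lasts roughly 11 years
    print(lifetime_years(80, 20))   # ~10.96

    # Halve the daily writes and a 40TBW drive lasts just as long
    print(lifetime_years(40, 10))   # ~10.96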

The beauty of LDE is that it captures endurance in one single, understandable figure. A common metric is necessary to facilitate SSD adoption moving forward. Now comes the hard part: garnering support from other SSD vendors and OEMs.

I've uploaded a copy of Don Barnetson's presentation on LDE, "Solid State Drives: The MLC Challenge", to my website at http://www.forward-insights.com/.

3 comments:

Anonymous said...

Does the LDE metric assume perfect wear leveling?

Is the LDE metric adjusted for ECC capability?

Gregory Wong said...

It is a system-based metric, which means it includes whatever wear-leveling algorithm and ECC are used to obtain the LDE.

Anonymous said...

I attended Mr. Barnetson's presentation and came out of it with a couple of simple, but telling concepts.

The real complexity in write-endurance life is the disparity between the application's average block size and the underlying Flash block size. In this case, we are talking about the 48% of writes that are 4K in size being mapped onto a Flash erase block that is several megabytes in size. It is the magnitude of this disparity that makes analyzing expected lifespans difficult.
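To illustrate that disparity, here is a rough worst-case sketch; the 4K write size comes from the comment, while the 2MB erase-block size is an assumed figure for illustration only:

    write_size_kb = 4           # typical small application write, per the comment
    erase_block_kb = 2 * 1024   # assumed flash erase-block size of 2MB

    # Worst case: every 4K write forces a full erase-block rewrite, so the
    # flash absorbs erase_block_kb of wear per write_size_kb of user data.
    worst_case_amplification = erase_block_kb / write_size_kb
    print(worst_case_amplification)   # 512x in this example

    # Real controllers coalesce writes and wear-level, so the effective factor
    # falls somewhere between 1x and this bound, which is why expected
    # lifespans are hard to analyze from first principles.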

Mr. Barnetson also stated that while Flash controller designs are in the works that will improve write performance with small blocks, these designs will not improve the wear patterns. My take on this is that improved performance will actually wear out devices faster.

The goal of the spec is to take the Bapco "usage pattern" for a "typical" user and have each drive manufacturer calculate how that pattern will map onto their drive, creating a "total write volume" for the life of the drive. This is a useful number. Hopefully the manufacturers will go further and state the details of how they got their numbers so that other usage patterns can be calculated.

In theory, other "usage patterns" should be created for other applications like MP3 players, mail servers, etc. The likelihood of this happening seems slim, though.

I would also like to point out that other system elements will impact the usage patterns. A "typical Windows user" will have different IO distributions on an XP system with FAT32 than on NTFS.

In general, I think what is happening is that for most drives, the "typical user" will see a total lifespan that is about 1/200th of what the drive achieves if you feed it "perfectly". Perfectly feeding a drive requires 100% linear writes, with only a single thread, writing in exact erase-block sizes and aligned exactly on erase-block boundaries. Only in this case can you get the current manufacturers' "50 GB/day for 20 years" durability.
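Taking those figures at face value, the implied write volumes work out as follows; this is just arithmetic on the commenter's numbers, not a measurement:

    perfect_writes_tb = 50 * 365 * 20 / 1000   # 50 GB/day for 20 years
    print(perfect_writes_tb)                   # ~365 TB under "perfect" feeding

    # At roughly 1/200th of that, a "typical user" would see on the order of
    # a couple of terabytes of host writes over the life of the drive.
    print(perfect_writes_tb / 200)             # ~1.8 TB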

My interest in this discussion has to do with my company's MFT (Managed Flash Technology) mapping layer. By re-organizing the order of blocks on the disk, MFT does actually write to Flash in the "perfect pattern". As would be expected, MFT does not get to a perfect 1:1 ratio, but it does get close. MFT has background re-org tasks that eventually move data around a couple of times, so with MFT the ratio is closer to 3:1 in real-world testing of live servers. Since this is about 75x better than the bare drive, it allows you to use mid-endurance MLC drives in medium- to high-churn server applications. Our rule of thumb is that you can write the capacity of the drive daily and get about 7 years of life. Thus a 1 TB array of MLC drives can take 1 TB of writes daily for 7 years, yielding a 2.5 PB "write lifespan". My company actually shipped its first 1TB MLC server last week.
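For reference, a back-of-the-envelope version of that rule of thumb; the 1 TB/day, 7-year and 3:1 figures are taken from the comment itself:

    host_writes_tb = 1 * 365 * 7      # 1 TB of writes per day for 7 years
    print(host_writes_tb)             # 2555 TB, i.e. roughly 2.5 PB

    # With a ~3:1 amplification ratio, the flash itself absorbs about
    # three times that volume of program/erase traffic.
    print(host_writes_tb * 3)         # ~7665 TB of raw flash writes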

Doug Dumitru
EasyCo LLC
http://easyco.com