Real Words or Buzzwords?: Erasure Coding


This is the 35th article in the “Real Words or Buzzwords?” series about how real words become empty words and stifle technology progress.

By Ray Bernard, PSP, CHS-III


Why RAID drive arrays don’t work anymore for video storage, and why Erasure Coding does.

  • This is another article where much of what I’m saying is already known to storage-savvy IT folks, except perhaps for the historical hard drive price information. This is a very technical subject, so I’ll address it at a high level that’s easily understandable, and provide links to deep-dive technical references for the readers who need or want that level of detail, using a few references that are well-written, easy to follow and complete. (Not all links are technical; the inflation calculator is not.)

    RAID worked very well, practically and financially, when inexpensive hard drives had small capacities and the largest-capacity drives cost far more. That cost disparity is where the RAID acronym originally came from: Redundant Array of Inexpensive Disks. Nowadays we often see “Independent Disks” used to expand the RAID acronym, partly because hard drive prices have dropped so significantly.

    Let’s check the cost and size of hard drives around 1980, compared to today. That’s when IBM introduced the first gigabyte hard drive. It was the size of a refrigerator, weighed about 550 pounds, and cost $40,000. Three years later, Seagate introduced the first 5.25-inch hard disk, with a capacity of 5 megabytes, intended for personal computer use.

    The Seagate drive was $1,500 ($4,603 in today’s dollars per the Consumer Price Index Inflation Calculator), which is $300 per MB in 1980 dollars. Today, a Seagate 10-terabyte 3.5-inch drive is $300 at Best Buy, which is $0.00003 (3 thousandths of a penny) per MB. Nowadays ALL hard drives are amazingly inexpensive compared to earlier generations of disk drives, which is why RAID was renamed Redundant Array of “Independent” Disks.

    RAID was incredibly important in the 1980s and 1990s, as hard drive failures were common. Keeping data safe meant duplicating it across two or three different drives, so in 1983 keeping 5 MB of data triple-safe cost $4,500. Of course, today we can copy 5 MB of data onto a 16 GB USB memory stick that costs $4.99 at Best Buy. What a difference.

    However, today we’re looking far beyond megabytes: enterprise video data storage requirements are now measured in petabytes, not gigabytes or terabytes, and that’s why RAID can’t be our data savior anymore.

    The Problem with RAID

    The main problem with RAID for large-capacity disk drives is rebuild time. For an array of 10-terabyte drives, the time to rebuild after a single disk loss under RAID 5 (which tolerates a 1-disk loss) or RAID 6 (which tolerates a 2-disk loss) is measured in weeks, not hours. During that time, storage write performance is significantly degraded.
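    For a sense of scale, here is a rough, back-of-the-envelope calculation. The transfer rates below are illustrative assumptions, not measured figures: an idle array can stream a rebuild near a drive’s sequential speed, but a controller that must also keep servicing live video writes may devote only a small fraction of its bandwidth to the rebuild, which is how rebuilds stretch from hours into weeks.

```python
# Back-of-the-envelope rebuild time for a single 10 TB drive.
# The rates are illustrative assumptions, not measurements: ~150 MB/s
# approximates an idle array rebuilding sequentially, while a
# controller busy with live video writes may spend far less
# bandwidth on the rebuild.

TB = 10 ** 12  # decimal terabyte, as drive vendors count capacity

def rebuild_hours(capacity_bytes, rate_bytes_per_sec):
    """Hours to rewrite a drive at a given effective rebuild rate."""
    return capacity_bytes / rate_bytes_per_sec / 3600

print(f"Idle array at 150 MB/s: {rebuild_hours(10 * TB, 150e6):.0f} hours")
print(f"Busy array at 10 MB/s: {rebuild_hours(10 * TB, 10e6) / 24:.0f} days")
```

    Even the idle-array best case is most of a day per drive; any realistic recording load multiplies that many times over.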

    That means that during the rebuild period, a VMS’s RAID disk storage system can’t handle all the data that the VMS must write to it. For an excellent presentation of the redundancy and performance aspects of RAID disk arrays, see RAID Performance Considerations by Mirazon, a premier information technology services provider.

    The other problem with RAID is that during a rebuild, a RAID array is highly vulnerable. Lose one more disk under RAID 5, or two more disks under RAID 6, and ALL DATA in the RAID array is irrecoverable.

    JBOD and Entry-Level NVRs

    Thus, most entry-level NVRs use JBOD storage (just a bunch of disks) without RAID. That way, when a single drive fails, only that drive’s recordings are lost, not all the data on the other drives. Since most security video systems retain recordings for only 30 or 45 days, some customers would rather replace drives every two years and accept the risk of losing up to 30 days of some cameras’ video if a drive fails. And because drive capacities keep increasing while costs keep dropping, every two years they can affordably increase their storage.

    Many NVRs are not kept in temperature-controlled environments, and with disk prices what they are today, it’s affordable to swap out the NVR drives every year. That’s about the same cost as a mirrored RAID array (not counting technician service time) but with a low probability of loss and a yearly increase in capacity. One advantage of JBOD storage is that it doesn’t require a high level of IT expertise for configuration or drive replacement.

    For VMS deployments that have many sites with JBOD or RAID video storage, it is a smart thing to use a product like Viakoo Preemptive to monitor the entire video system from cameras to network components to recorders to hard drives for signs of impending failure. Drives, for example, can be replaced proactively, before failure, with no loss of data and only a small amount of recording downtime during the hard drive change.

    Failover Recording

    Lengthy RAID rebuilds prompt some organizations to use standby failover recording servers and let the original recording server sit idle, rebuilding but not recording, in case they need the video data on it. After the data retention period is over, they erase the video data and put the original recording server back in service. Not only does the video system have RAID redundant storage, it has hot redundant failover recorders because the disk rebuild time is so long. Such redundancy has costs: server equipment, software licensing, and electricity.

    RAID was designed for data systems that don’t have the read/write intensity of security video recording. It was also designed during an era of low-capacity disk drives, when rebuild times were measured in minutes or, at worst, hours, rather than weeks or months.

    That, along with the cost of highly-mirrored storage, is why Erasure Coding was developed.

    Erasure Coding

    Erasure coding is what Amazon S3, Google Cloud, and Microsoft Azure use for their storage. Erasure Coding is an approach to efficient fault-tolerant data storage. “Efficient” for this discussion means how much of the disk drive capacity is available for storing data, and how little of it is required for redundancy. With RAID mirroring you have 50% efficiency if your storage array keeps one extra copy of each drive, and 33.33% efficiency if it keeps two extra copies. When dealing with petabytes or exabytes of data storage, the costs for that kind of redundancy are high – including the costs for powering and managing the quantity of hard drives involved.
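    The arithmetic behind those efficiency figures is simple. The sketch below compares replication with two illustrative (k data chunks, n total chunks) erasure-coding layouts; the specific layouts are examples for comparison, not any particular vendor’s configuration.

```python
# Usable-capacity efficiency: N-way replication stores full copies;
# a (k, n) erasure code stores k data chunks plus n - k redundancy
# chunks, so efficiency is k / n.

def replication_efficiency(copies):
    """Fraction of raw capacity usable with `copies` total instances."""
    return 1 / copies

def erasure_efficiency(k, n):
    """Fraction of raw capacity usable with k-of-n erasure coding."""
    return k / n

print(f"2-way mirror : {replication_efficiency(2):.1%}")   # 50.0%
print(f"3-way copies : {replication_efficiency(3):.1%}")   # 33.3%
print(f"EC 8+3       : {erasure_efficiency(8, 11):.1%}")   # 72.7%
print(f"EC 16+2      : {erasure_efficiency(16, 18):.1%}")  # 88.9%
```

    Note that the 16+2 layout reaches the high-80s efficiency the cloud providers depend on while still tolerating two simultaneous disk losses.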

    Maintaining high uptime for the data storage system at the lowest cost is a key objective for cloud service providers. That’s why the big-name cloud storage companies claim data durability rates of 11 nines (99.999999999%) or more, as Amazon’s Chief Technology Officer did in this blog post. According to Backblaze, a cloud storage company, with a storage service providing 11 nines of durability, if you store 1 million objects for 10 million years, you would expect to lose 1 file. If you are not familiar with the use of “nines” to describe system reliability, see my Real Words or Buzzwords article titled Five-Nines.

    Erasure coding is the method used to establish this kind of data durability at reasonable cost. For large drive arrays, erasure coding can provide storage efficiencies of 80% to 90% and higher. This is explained in a 155-page technical report (No. UCB/EECS-2016-155) from the University of California, Berkeley, titled “Erasure Coding for Big-data Systems: Theory and Practice.”

    The paper’s introduction explains the rationale for Erasure Coding in the following two paragraphs.

    A typical approach for introducing redundancy in distributed storage systems has been to replicate the data, that is, to store multiple copies of the data on distinct servers spread across different failure domains [Editor: different physical and/or logical storage locations]. While the simplicity of the replication strategy is appealing, the rapid growth in the amount of data needing to be stored has made storing multiple copies of the data an expensive solution. The volume of data needing to be stored is growing at a rapid rate, surpassing the efficiency rate corresponding to Moore’s law for storage devices. Thus, in spite of the continuing decline in the cost of storage devices, replication is too extravagant a solution for large-scale storage systems.

    Coding theory (and erasure coding specifically) offers an attractive alternative for introducing redundancy by making more efficient use of the storage space in providing fault tolerance. For this reason, large-scale distributed storage systems are increasingly turning towards erasure coding, with traditional Reed-Solomon (RS) codes being the popular choice. For instance, Facebook HDFS, Google Colossus, and several other systems employ RS codes. RS codes make optimal use of storage resources in the system for providing fault tolerance. This property makes RS codes appealing for large-scale, distributed storage systems where storage capacity is a critical resource.

    Erasure Coding Simply Explained

    As Chris Kranz, a Senior Systems Engineer at Hedvig,  explains in his excellent article and video, Data Protection: Erasure Coding or Replication, a helpful way to think of erasure coding in human terms is to compare it to the use of the NATO phonetic alphabet (Alfa, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliett, Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Uniform, Victor, Whiskey, X-ray, Yankee, Zulu).

    When delivered in a high-loss environment (say, a loud gunfight or a high-static radio channel), you can still reconstruct the message. If all you hear is “..HO, RO…, ..FA, SIER.., …FORM, ….MEO, ECH.” you should be able to reconstruct “Echo, Romeo, Alfa, Sierra, Uniform, Romeo, Echo,” which of course spells ERASURE. This works because there is additional information encoded in the message that allows you to reconstruct it despite losing parts of it. It is also more efficient than just shouting “erasure” repeatedly until you confirm receipt. That is how erasure coding works conceptually, except that math is involved in creating the reconstruction data, not language. Erasure coding works with any kind of data, including video stream data.

    Erasure Coding breaks data down into fragments, and mathematical coding generates redundancy data for re-creating lost fragments. The original fragments plus the redundancy data are stored across many disks residing on several different servers. For example, an erasure coding scheme might split each original piece of data into 8 data chunks and create 3 redundancy chunks, for 11 chunks total. These 11 chunks are written to 11 different disks, with the result that any 3 disks could be lost and the original 8 chunks could still be recovered. That holds for all the data stored across the 11 disks.
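    To make the chunking concrete, here is a toy Python sketch of an 8-data-plus-3-redundancy scheme. It uses Lagrange polynomial interpolation over the prime field GF(257) to keep the math short; production systems instead use Reed-Solomon codes over GF(2⁸) operating on large disk blocks, so treat this purely as a conceptual illustration.

```python
P = 257  # a prime; every byte value 0..255 is an element of GF(257)

def _lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points)
    passing through `points` [(xi, yi), ...], with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, k, n):
    """Systematic (k, n) code: chunks 1..k carry the data bytes
    unchanged; chunks k+1..n carry redundancy values of the
    interpolating polynomial. Data is zero-padded to a multiple of k."""
    data = list(data) + [0] * (-len(data) % k)
    chunks = {x: [] for x in range(1, n + 1)}
    for g in range(0, len(data), k):
        pts = [(i + 1, data[g + i]) for i in range(k)]
        for x in range(1, n + 1):
            chunks[x].append(pts[x - 1][1] if x <= k
                             else _lagrange_eval(pts, x))
    return chunks  # chunk x would be written to disk x

def decode(surviving, k):
    """Rebuild the original (padded) data from any k surviving chunks,
    given as a dict {chunk_number: list of values}."""
    xs = sorted(surviving)[:k]
    out = []
    for pos in range(len(surviving[xs[0]])):
        pts = [(x, surviving[x][pos]) for x in xs]
        out.extend(_lagrange_eval(pts, i) for i in range(1, k + 1))
    return bytes(out)

# The 8 + 3 example above: lose any 3 of the 11 "disks" and recover.
chunks = encode(b"ERASURES", k=8, n=11)
del chunks[2], chunks[7], chunks[11]
assert decode(chunks, k=8) == b"ERASURES"
```

    Any 8 of the 11 chunks determine the same degree-7 polynomial, which is why the choice of which 3 disks fail doesn’t matter.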

    Erasure Coding creates a high-performance storage system because data is written in parallel to many disks at once, much faster than data can be written to a smaller set of disks. Therefore, the more servers there are in the storage system, the better its performance is. When the 3 failed disks are replaced, data can be recovered quickly and with a very minor loss in storage performance, because the migration of data over to the three new disks is performed by 8 disks and several disk controllers that are spread across multiple servers. The storage system can continue to operate at, for example, 85% of normal speed, during the short time it takes to migrate data back to the 3 new disks. Various levels of erasure coding exist, each with its own tolerance of how many disks can be lost without any loss of data.

    With RAID, a single RAID controller performs the rebuild work for a disk, which slows down the controller and takes a very long time. Erasure Coding spreads that work across many disks and controllers, so data redundancy is reestablished much more quickly. A more detailed explanation of Erasure Coding can be found in the article titled Dummies Guide to Erasure Coding on Maheswaran (Mahesh) Sathiamoorthy’s blog. Don’t bother reading the Wikipedia article on Erasure Coding unless you need a very severe headache.

    Going Beyond Erasure Coding

    Erasure Coding technology is why, for example, if you have three nodes of Pivot3 video surveillance appliances, you can lose an entire node of appliances plus one disk drive on any node and still recover all the data very quickly – in hours, not weeks or days. Pivot3 also uses “virtual sparing,” which reserves space on each drive. This allows the system to proactively “fail” a hard drive that is showing early warning signs of failure, by quickly copying (not rebuilding) its data to the virtual spare storage space. When the drive is replaced, it becomes part of the virtual spare space, and no rebuilding is necessary. Their approach achieves six nines of reliability, which means about 31.5 seconds of downtime per year. How is that possible with commercial-off-the-shelf (COTS) equipment? It’s possible because COTS equipment now includes the hardware that is used to build cloud computing data centers.

    Cloud Computing Technology

    What is important to understand is that, unlike a RAID disk array, which can be built from a single RAID disk controller and a set of hard drives, high-performance Erasure Coding requires a lot of processing power, and so is achieved with a virtual storage system that is built using cloud computing technology and lots of virtualization. It takes serious computing power and high capacity virtual networking to build a storage system where 80% to 90% or more of the raw storage capacity is available for storing video data, in a manner that is more reliable and fault-tolerant than has previously been technically possible.

    For example, consider an array of Dell dual-CPU servers, each built with two 20-core CPUs: that’s 40 CPU cores per server. To each server add two or four high-power GPU cards for additional processing power, plus 96 GB of RAM. Then add some Non-Volatile Memory Express (NVMe) solid-state drives plugged into the server’s PCI Express bus for read/write disk caching, and use VMware (for example) to virtualize all the computing and networking into several virtualized pools of hardware resources. Three software-defined 20 GbE networks are created: one for managing the system overall, one for the virtual machines’ application data, and one for a two- or three-petabyte (for example) video storage virtual SAN (Storage Area Network) built using 10 TB or 12 TB high-speed hard drives.

    You can see how different that is from a typical set of video servers. With custom Erasure Coding designed specifically for video data, you can have a system that can lose a full storage node plus several drives and still be able to handle all its video at an 85% or better performance level while the replacement drives (or a replacement node) are swapped in.

    Of course, a well-engineered cloud-based VMS can affordably provide much more compute, storage and network capacity than you could easily establish on site, if you need or want it. The point is that Erasure Coding is what makes high-performing ultra-reliable video storage possible.

    What This Means for Large-Scale Video Surveillance

    About three years ago the Dallas City Manager stated to the press, when asked why there was no video recording of the shooting of the Dallas police officer that had just taken place, that it was reasonable to expect that at any one time about 80% of the city’s 400 cameras will be recording, due to the state of technology and budgets. That meant at any one time 80 cameras would not be recording. That comment caused quite a firestorm, as one might expect. We need video recording systems that are 100% operational, and whose storage systems have better than four or five nines of reliability.

    That’s what high performance video management systems, built using cloud technology including Erasure Coding, can provide today at a reasonable cost. This is not the direction technology is going in – it’s where technology has already gone. The security industry just isn’t fully aware of it yet.

    Ray Bernard, PSP CHS-III, is the principal consultant for Ray Bernard Consulting Services (RBCS), a firm that provides security consulting services for public and private facilities (www.go-rbcs.com). In 2018 IFSEC Global listed Ray as #12 in the world’s top 30 Security Thought Leaders. He is the author of the Elsevier book Security Technology Convergence Insights available on Amazon. Mr. Bernard is a Subject Matter Expert Faculty of the Security Executive Council (SEC) and an active member of the ASIS International member councils for Physical Security and IT Security. Follow Ray on Twitter: @RayBernardRBCS.

    © 2018 RBCS. All Rights Reserved.