There’s No Duping the Reign of Data Domain

Currently Unsurpassable, Data Domain's Deduplication Power Resides in Its CPU-Centric Architecture


Last year EMC’s somewhat controversial acquisition of Data Domain, right under the noses of NetApp, raised more than a few eyebrows. Considering the reported price of $2.1 billion and EMC’s already deduplication-packed portfolio, which consisted of the source-based Avamar, the file-level deduplication/compression of its Celerra filer and its Quantum dedupe-integrated VTLs, some heads were left scratching as to what exactly was the big deal with Data Domain’s target-based deduplication solution. Almost a year on, with Data Domain’s DD880 being adopted by an ever-growing customer base, the heads have stopped scratching and are paying close attention to what is probably the most significant advancement in backup technology of the last decade.

With deduplication currently all the rage, overshadowed in hype perhaps only by ‘Cloud Computing’, its benefits are fast becoming an exigency for backup and storage architects. Most backup software produces copious amounts of duplicate data stored in multiple locations; deduplication eliminates those redundancies and hence uses less storage and less bandwidth for backups, shrinking backup windows in the process. Among the source-based and file-level offerings, it is Data Domain’s target-based solution, i.e. the big black box, that is clearly taking the lead and producing the big percentages in terms of data reduction. So what exactly is so amazing about the Data Domain solution when, at first glance at, say, the DD880 model, all one can see is just a big black box? Even installing one of the Data Domain boxes hardly requires much brainpower beyond the assignment of an IP address and a bit of cabling. And as for the GUI, one could easily forget about it; the point of the ‘big black box’ is that you just leave it there to do its thing, and sure enough it does its thing.
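To make the principle concrete, here is a minimal Python sketch of block-level deduplication: split a stream into blocks, fingerprint each block with a hash, and store only the blocks that have not been seen before. It is a toy illustration of the general idea only, not Data Domain’s proprietary implementation, and the 4 KB block size is an arbitrary assumption for the example.

import hashlib

def dedupe(stream: bytes, block_size: int = 4096):
    """Toy fixed-size block deduplication: keep each unique block once and
    record an ordered list of fingerprints from which the stream can be rebuilt."""
    store = {}    # fingerprint -> unique block data
    recipe = []   # ordered fingerprints describing the original stream
    for i in range(0, len(stream), block_size):
        block = stream[i:i + block_size]
        fp = hashlib.sha1(block).hexdigest()
        if fp not in store:      # unseen data: store it
            store[fp] = block
        recipe.append(fp)        # seen before: a reference is enough
    return store, recipe

# A contrived "backup" in which the same 4 KB block repeats 1,000 times
backup = (b"A" * 4096) * 1000
unique, recipe = dedupe(backup)
stored = sum(len(b) for b in unique.values())
print(f"logical {len(backup)} bytes -> stored {stored} bytes "
      f"({len(backup) // stored}x reduction)")

A real appliance adds compression on top of this and has to do all of it inline at line rate, which is exactly where the CPU discussion later in this article comes in.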

And while the big black box sits there in your data center, the figures start to jump out at you: an average backup environment can see a reduction of up to 20 times. For example, a typical environment with a first full backup of 1TB containing only 250GB of unique physical data will immediately see a fourfold reduction. If such an environment takes weekly backups with a logical growth rate of 1.4TB per week but a physical growth of only 58GB per week, the approximate reduction can climb past 20 times within four months:

Reduction =

(First Full + (Weekly Logical Growth x Number of Weeks)) / (Physical Full + (Weekly Physical Growth x Number of Weeks))

e.g. after 25 weeks

Reduction = (1TB + (1.4TB x 25)) / (0.250TB + (0.058TB x 25))

= 36TB / 1.7TB

= roughly 21 times less data physically stored
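For anyone who wants to play with the assumptions, here is the same back-of-the-envelope calculation as a short Python script (all sizes in TB, using the figures quoted above):

# The worked example above as a quick sanity check (all sizes in TB).
first_full_logical   = 1.0      # first full backup as seen by the backup app
first_full_physical  = 0.250    # unique data actually written to disk
weekly_logical_growth  = 1.4
weekly_physical_growth = 0.058

for weeks in (4, 17, 25):
    logical  = first_full_logical  + weekly_logical_growth  * weeks
    physical = first_full_physical + weekly_physical_growth * weeks
    print(f"week {weeks:>2}: {logical:5.1f} TB logical / "
          f"{physical:5.2f} TB stored = {logical / physical:4.1f}x")

At week 17 (roughly four months) the ratio passes 20:1, and at week 25 it lands at about 21:1, matching the figures above.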

So how does Data Domain come up with such impressive results? Upon closer inspection, despite being considered the ‘latest technology’, Data Domain’s target-based deduplication solution has actually been around since 2003; in other words, these guys have been doing this for years. Now in 2010, with the DD880, calling their latest model ‘cutting edge’ would be somewhat misleading when a more suitable term is ‘consistently advancing’. Those consistent advancements come from the magic of the big black box being built on a CPU-centric architecture and hence not reliant upon adding more disk drives. So whenever Intel unveils a new processor, Data Domain does likewise and incorporates it into the big black box. Consequently, the new DD880’s stunning results stem from its quad-socket, quad-core processor system. With such CPU power the DD880 can easily handle aggregate throughput of up to 5.4 TB per hour and single-stream throughput of up to 1.2 TB per hour while supporting up to 71 TB of usable capacity, leaving its competitors in its wake. Having adopted such an architecture, Data Domain has pretty much guaranteed a future of advancing its inline deduplication by taking advantage of every inevitable advance in Intel’s CPUs.
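The reason CPU speed translates so directly into throughput is that inline deduplication must fingerprint every incoming segment before it can decide whether the data is new or a duplicate, and fingerprinting is pure hash computation. The snippet below simply measures single-core SHA-1 fingerprinting speed on whatever machine runs it; the 8 KB segment size is an illustrative assumption and this is in no way a model of the DD880’s pipeline, but it shows why more and faster cores raise the deduplication ceiling.

import hashlib
import time

# Measure how fast one CPU core can fingerprint data with SHA-1.
# The 8 KB segment size is an arbitrary illustrative choice.
segment = b"\x00" * 8192
total_mb = 512
n_segments = (total_mb * 1024 * 1024) // len(segment)

start = time.perf_counter()
for _ in range(n_segments):
    hashlib.sha1(segment).digest()
elapsed = time.perf_counter() - start

print(f"one core fingerprints ~{total_mb / elapsed:.0f} MB/s; "
      f"more (and faster) cores raise the dedupe throughput ceiling")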

Unlike the source-based offerings, Data Domain’s target-based solution is controlled by the storage system rather than the host: it takes the files or volumes from disk and simply dumps them onto the disk-based backup target. The result is a more robust, sounder solution for a high change-rate environment or one with large databases, where RPOs can be met far more easily than with a source-based dedupe solution.

Another conundrum that Data Domain’s solution raises is the future of tape-based backups. The cheap, RAID 6-protected 1 TB / 500 GB 7.2k rpm SATA disks used by the DD880, alongside the amount of data reduced via deduplication, call into question the whole cost advantage of backing up to tape. If there is less data to back up, and hence fewer disks required than tapes, what argument remains for avoiding the more efficient disk-to-disk backup procedure? Eliminating redundant data by a factor of 20:1 brings the economics of disk backup closer than ever to those of tape. Couple that with the extra costs of tape backups that often fail, the tricky recovery procedures of tape-based backups and backup windows that are under ever closer scrutiny, and this could well be the beginning of the end for the tape run guys having to do their regular rounds to the safe.

Furthermore, with compatibility already in place for CIFS, NFS, NDMP and Symantec OpenStorage, word is already out that development work is under way to integrate more closely with EMC’s other juggernauts, VMware and NetWorker. So while deduplication in its many forms saturates the market and brings major cost savings to backup architectures across the globe, it is Data Domain’s CPU-centric, target-based inline solution that has the most promising foundation and future, and currently unsurpassable results. $2.1 billion? Sounds like a bargain.

More Stories By Archie Hendryx

SAN, NAS, Back Up / Recovery & Virtualisation Specialist.
