Cloudera and Cleversafe: A Strategic Combination For Enterprise IT

By Bob Gourley

Cloudera and Cleversafe are totally different companies addressing different challenges. But the two firms have quite a bit in common. Here are key commonalities I’ve observed:

  • Both invest in real engineering and deliver enterprise-grade/quality capabilities
  • Both are proven to work at scale (including very large scale when required)
  • Both are led by CEOs who are highly regarded by their peers and the community, and both CEOs are very likeable people (I’ve met and worked with both Mike Olson and Chris Gladwin).
  • Both have used the services of my firm, Crucial Point (and that is most appreciated by me, by the way!).
  • Both are in the In-Q-Tel portfolio and are known to the national security community because of that.
  • Both firms partner with Carahsoft (which is, by the way, another strategic partner of Crucial Point’s).
  • Both are key thought leaders in the domain of Big Data. Cloudera is known for its open source distribution of Apache Hadoop (CDH4) and its management capabilities over CDH. Cleversafe is known for fielding modern object storage with the lowest cost/TB of any system on the market, plus agile access and impressive I/O.
That said, these two firms really address different areas of enterprise data needs, and have built different capabilities that can be used by enterprises to address separate aspects of Big Data challenges.
Which is part of the reason I was excited to learn of cooperation between these two firms. When firms addressing different parts of a hard challenge collaborate it can mean great things for enterprise missions.
Here are a few thoughts on the nature of a well engineered solution that came from their work together:
  • In July 2012, Cleversafe announced that they are now working with Cloudera’s Distribution Including Apache Hadoop (CDH) on new capabilities that combine the benefits of Cleversafe’s data storage and security with the power of MapReduce. With this well-engineered combination, enterprise data is not stored in HDFS. The benefits of HDFS, such as fault tolerance and speed, are already provided by other Cleversafe functionality. But the combination delivers even greater benefits, including the elimination of single points of failure without the need for HDFS’s complete, multiple replication.
  • So basically you can store data using Cleversafe technology and get all the benefits there, and can run MapReduce jobs over the data leveraging Hadoop without using HDFS.
  • This well engineered solution enables data to be stored in conventional format on nodes where it is expected to be used for computation and enables MapReduce operation. This comes with the many other benefits of Cleversafe, including the ability to protect data without the overhead of massive network traffic and costly backup storage. It also removes challenges with Namenode issues since a Cleversafe cluster’s accesser nodes federate and cover for each other.
  • The bottom line result of this Cleversafe leveraging of Cloudera’s CDH:  Incredible cost benefits, fantastic disaster recovery/continuity of operations features, fast access to data from multiple locations, and an ability to run MapReduce jobs and leverage Hadoop-centric applications without using HDFS.
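To put rough numbers on the replication-versus-dispersal tradeoff described in the bullets above, here is a minimal sketch in Python. It assumes HDFS-style 3-copy replication on one side and a generic "any `threshold` of `width` slices can rebuild the data" dispersal scheme on the other. The 16-slice width and 10-slice threshold are hypothetical values chosen for illustration, not Cleversafe's actual configuration, and the function names are illustrative only.

```python
def raw_tb_replication(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity needed when every byte is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return usable_tb * copies


def raw_tb_dispersal(usable_tb: float, width: int, threshold: int) -> float:
    """Raw capacity needed when data is cut into `width` slices and any
    `threshold` of them are enough to rebuild it (hypothetical parameters)."""
    if not 0 < threshold <= width:
        raise ValueError("need 0 < threshold <= width")
    return usable_tb * width / threshold


def tolerated_losses_replication(copies: int = 3) -> int:
    """Copies that can be lost before the data is gone."""
    return copies - 1


def tolerated_losses_dispersal(width: int, threshold: int) -> int:
    """Slices that can be lost while the data stays recoverable."""
    return width - threshold


if __name__ == "__main__":
    usable = 100  # TB of actual data to protect
    print("3x replication:", raw_tb_replication(usable), "TB raw,",
          tolerated_losses_replication(), "losses tolerated")
    print("16/10 dispersal:", raw_tb_dispersal(usable, 16, 10), "TB raw,",
          tolerated_losses_dispersal(16, 10), "losses tolerated")
```

Under these assumed parameters, the dispersed layout needs roughly half the raw capacity of triple replication while tolerating more simultaneous failures. Real deployments would pick width and threshold to match their own reliability targets.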
I liked the context provided by Andrew Brust at zdnet.com on this topic. He writes that:

Cleversafe swaps out HDFS
Assuming it works as advertised, Cleversafe’s company name is a fair reflection of its Hadoop architecture. While other HDFS alternatives exist for Hadoop (for example, MapR’s Hadoop distro, which can mount HDFS-compatible NFS volumes), Cleversafe’s Slicestor appliance nodes retain HDFS’ distributed nature and maintain fault tolerance too. Cleversafe does this with “information dispersal” slices: spreading the data around different nodes in the cluster, employing Erasure Coding – a scheme that allows reconstruction of data from a subset of storage nodes, and eliminates single points of failure without the overhead of HDFS’ complete replication.

Meanwhile, the data is also stored in conventional format on the nodes where it is expected to be used for computation.  The conventional storage assures fast MapReduce operations, and the striped storage assures fault tolerance, without the need (and network traffic and management overhead) to keep multiple full copies of the data.

Namenode issues disappear as well, since a Cleversafe cluster’s accesser nodes federate and cover for each other, and the meta data is split up along with the data itself.  Although various high availability namenode technologies are appearing in the major Hadoop distributions now, they nonetheless still use a single central namenode at any given time.  Keeping a warm spare around is not the same thing as having meta data/directory services responsibilities shared among a collection of active nodes.

Although Cleversafe clusters are appliance-based, the appliances nonetheless use commodity processors and  storage.  The added value comes from tuning and optimization, and the unique storage software subsystem.  Cleversafe storage runs about $500 per Terabyte, and can be less depending on total storage size.  On the MapReduce side, Cleversafe uses Cloudera’s Distribution Including Apache Hadoop (CDH).
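The idea of rebuilding data from a subset of storage nodes, as described in the excerpt above, can be illustrated with a toy example. The sketch below is not Cleversafe's information dispersal algorithm. It swaps in the simplest possible erasure code, a single XOR parity slice (the same idea as RAID 5), which survives the loss of any one slice. Production systems use stronger codes that tolerate several losses. The function names `disperse` and `reconstruct` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length slices plus one XOR parity slice."""
    if k < 1:
        raise ValueError("k must be at least 1")
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")  # zero-pad the last slice
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def reconstruct(slices: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data when at most one slice (marked None) is lost."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity code tolerates only one lost slice")
    repaired = list(slices)
    if missing:
        survivors = [s for s in slices if s is not None]
        # XOR of all surviving slices equals the lost one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity slice, then strip the padding.
    return b"".join(repaired[:-1])[:original_len]


if __name__ == "__main__":
    payload = b"MapReduce over dispersed storage"
    stored = disperse(payload, k=4)   # 4 data slices + 1 parity slice
    stored[2] = None                  # simulate one failed storage node
    print(reconstruct(stored, len(payload)) == payload)
```

Here five slices hold the data at 1.25x raw overhead, yet no single node failure loses anything, whereas full replication would need 2x to offer the same single-failure protection.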

For more information see this July 2012 press release from Cleversafe:

Cleversafe First to Deliver Breakthrough Capabilities for Combined Storage and Massive Computation

First System to Support Storage and Analysis of Datasets at Previously Unattainable Scale with Unparalleled Reliability and Efficiency

Chicago, July 10, 2012 – Cleversafe Inc., the solution for limitless data storage, today announced plans to build the first Dispersed Compute Storage solution by combining the power of Hadoop MapReduce with Cleversafe’s highly scalable Object-based Dispersed Storage System. This solution will significantly alter the Big Data landscape by decreasing infrastructure costs for separate servers dedicated to analytical processes, reducing required storage capacity, and simultaneously improving data integrity. In addition, the company’s solution will reduce network bottlenecks by bringing together computation and storage at any scale, petabytes to exabytes and beyond.

Traditional storage systems are not designed for large-scale distributed computation and data analysis. Present implementations treat data storage and analysis of that data separately, transferring data from Storage Area Networks (SANs) or Network Attached Storage (NASs) across the network to perform the computations used to gather insight. In this manner the network quickly becomes the bottleneck, making multi-site computation over the WAN particularly challenging. Cleversafe solves this problem by combining Hadoop MapReduce alongside its Dispersed Storage Network (dsNet) system on the same platform and replacing the Hadoop Distributed File System (HDFS) which relies on 3 copies to protect data with Information Dispersal Algorithms thereby significantly improving reliability and allowing analytics at a scale previously unattainable through traditional HDFS configurations.

“For any company, the movement, management and storage of massive data stores for analytical purposes is already unmanageable,” said Chris Gladwin, CEO and President of Cleversafe. “Many companies have had to invest significant resources in both CAPEX and OPEX to manage the challenge of Big Data and to try and capitalize on the opportunity to gather insights from that data,” said Gladwin. “The key to reducing both cost and complexity is to combine computation with dispersed storage,” said Gladwin. “Cleversafe’s solution will provide infinitely scalable, reliable, and cost effective storage for data to support massive computation while enhancing the analysis workflow.”

Hadoop MapReduce, which is already being used broadly throughout the industry, represents only a partial solution to this problem. While it lends itself naturally to enabling computations where the data exists rather than transferring data to computation nodes, it has inherent scalability and reliability limitations. Current HDFS deployments utilize a single server for all metadata operations and 3 copies of the data for protection. Failure of the single metadata node could render stored data inaccessible or result in a permanent loss of data. Maintaining 3 copies of data at massive scale for protection leads to skyrocketing overhead and management costs.

Cleversafe’s dsNet system protects both data and metadata equally and is inherently more reliable. By applying the company’s unique Information Dispersal technology to slice and disperse data, single points of failure are eliminated. As data is distributed evenly across all Slicestor nodes metadata can scale linearly and infinitely as new nodes are added, thus reducing any scalability bottlenecks and increasing performance. Cleversafe’s unique approach delivers the powerful combination of analytics and storage in a geographically distributed single system allowing organizations to efficiently scale their Big Data environments to hundreds of petabytes and even exabytes today.

“There isn’t an industry today that’s untouched by Big Data or a company that wouldn’t benefit from the intrinsic value of that data if they could collect, organize, store and analyze it in a cost-effective manner,” said John Webster, Senior Partner at Evaluator Group. “Cleversafe’s approach to combining dispersed storage and Hadoop for analytics is a groundbreaking step for the industry and for any company to effectively bridge storage and large-scale computation,” said Webster.

No market segment has a more critical need to harness Big Data than the Government sector. Lockheed Martin is partnering with Cleversafe to develop a federal version of the Cleversafe Dispersed Compute Storage solution designed for the unique needs of federal government agencies.

“By combining the power of Hadoop analytics with Cleversafe’s Object-based Dispersed Storage solution, government entities will be able to significantly reduce their total cost of infrastructure as the amount of their mission critical data grows,” said Tom Gordon, CTO & VP of Engineering of Lockheed Martin’s Information Systems and Global Solutions-National. “The Federal community has been out in front of Big Data, well ahead of many other market segments, and needs technology solutions today that are well suited for Exabyte scale storage as well as massive computation,” said Gordon. “Taking Cleversafe’s approach with Hadoop across commodity hardware, these features deliver a new approach to bring the true potential of Big Data analytics into reach.”

Cleversafe’s object-based storage solution is 100 million times more reliable than traditional RAID-based systems and it doesn’t rely on replication to protect information. Its information dispersal capabilities reduce storage costs up to 90 percent while meeting compliance requirements and ensuring protection against data loss, whether it’s latent hardware errors, data corruption or malicious threats. With the combination of limitless scale, highly reliable storage and efficient analytics in the same platform, Cleversafe is solving the most challenging Big Data problems for customers in a very efficient manner.

Tweet This: @Cleversafe to build first storage-based compute solution based on its dsNet solution and Hadoop MapReduce.

About Cleversafe Inc.

Cleversafe has created a breakthrough technology that solves petabyte and beyond big data storage problems. This solution drives up to 90 percent of the storage cost out of the business while enabling secure and reliable global access and collaboration. The world’s largest data repositories rely on Cleversafe. To learn more about Cleversafe and its solutions, please visit www.cleversafe.com or call 312-423-6640.

 

 

 

More Stories By Bob Gourley

Bob Gourley, former CTO of the Defense Intelligence Agency (DIA), is Founder and CTO of Crucial Point LLC, a technology research and advisory firm providing fact based technology reviews in support of venture capital, private equity and emerging technology firms. He has extensive industry experience in intelligence and security and was awarded an intelligence community meritorious achievement award by AFCEA in 2008, and has also been recognized as an Infoworld Top 25 CTO and as one of the most fascinating communicators in Government IT by GovFresh.
