

Cloudera Impala Delivers Superior Performance on Open Hadoop Data Over Proprietary Analytic DBMSs

Emerges as Fastest, Most Functional and Proven Way to Run SQL on Hadoop Data

PALO ALTO, CA -- (Marketwired) -- 01/13/14 -- Cloudera, the leader in Apache Hadoop™ based data management platforms, today released the results of performance benchmark testing for its open source interactive SQL query engine, Cloudera Impala. Impala queries across data in an open Hadoop columnar storage format (Parquet) ran on average 2x faster than identical queries on a commercial analytic database management system (DBMS) over its proprietary storage format.

Cloudera delivers an enterprise data hub -- a next-generation platform for secure, powerful, real-time processing and analysis of data at scale. An enterprise data hub must provide data governance and lineage services, support enterprise-grade backup and disaster recovery and offer a wide range of ways to work with the data that it manages. It must support the tools and interfaces on which existing applications and tools rely. Critical among those is real-time SQL access for analytics.

Impala Delivers

Launched in October 2012 and released for general availability in May 2013, Impala enables high speed, interactive SQL analysis of Hadoop data at petabyte scale. Today Impala has emerged as the fastest, most functional and proven way to run SQL on Hadoop data for open source users and enterprise customers alike. The platform has continued to evolve rapidly with deepening support for the ANSI-SQL standard, certified integrations to leading business intelligence tools, sophisticated workload management and consistently superior performance.

Impala deployments continue to proliferate in the enterprise: to date, the platform has been downloaded by more than 5,000 unique organizations globally, demonstrating its appeal and significance. Cloudera continues to work closely with its enterprise customers and the open source community to refine and advance Impala's enterprise features, like Apache Sentry (incubating), the fine-grained, role-based authorization module released this year. Further establishing Impala's leadership in the industry, Hadoop-based solutions from other vendors integrate the Cloudera-created SQL query engine into their own offerings in response to customer demand.

Analytic DBMS Performance on Open Data in the Enterprise Data Hub

Running a diverse set of analytic queries on identical hardware, Impala has successfully eclipsed the performance of a popular proprietary parallel DBMS. The same benchmarks also showed Impala has maintained or widened its performance advantage against the latest release of Apache Hive (0.12).

Furthermore, it has done so on data in an open Hadoop data format. With these results, customers can exceed the SQL performance they have experienced with proprietary databases while preserving the flexibility they enjoy with the Hadoop stack.

The Proof Is In the Data: Impala Shows BI-Class Speed for Mainstream Workloads

To evaluate Impala's query performance against a popular analytic database (referred to as "DBMS-Y"), Cloudera ran a series of 20 queries based on the industry-standard benchmark TPC-DS. The results showed that:

  • Impala ran consistently faster than DBMS-Y: across 20 queries, Impala ran on average 2x faster than DBMS-Y, outperforming DBMS-Y in 17 of the 20 queries. For some queries, Impala was over 4x faster.
  • Queries over open data beat those over proprietary data: Impala queries ran on open Hadoop data in the Parquet format, while DBMS-Y queries ran on data in its own proprietary format, and Impala was still faster.
  • Impala scales linearly and predictably: In tests, Impala maintained identical response times with increased user concurrency and on larger data sets by simply adding new machines at the same rate as the concurrency and data growth.
  • Furthermore, Impala is still more than an order of magnitude faster than Hive: on identical hardware Impala queries ran an average of 24x faster than those run on Apache Hive 0.12 using ORCfile.

No Sleight of Hand, No Gimmicks

Cloudera is committed to leading the industry as a high-integrity business that provides unbiased information to customers and users. Dozens of users have downloaded Cloudera's 100% open source platform, run their own performance evaluations, and shared them publicly. Cloudera places no confidentiality clauses or other proprietary restrictions on the use of its distribution. In addition, Cloudera has made the queries, configuration, hardware specifications, and data available for the open source community to review and evaluate. Information can be found at http://www.cloudera.com/impalaishellafast/

"Interactive exploratory business intelligence is a mainstay workload of the Enterprise Data Hub," said Mike Olson, founder, chief strategy officer and chairman of the Board at Cloudera. "We are proud of how quickly Impala has evolved and the rate at which it is being adopted. With thousands of users now running Impala in production, its significance is indisputable. One year ago, when we released Impala to open source, we knew that it had the potential to eventually play on the same field as some very mature analytic DBMSs, but the results of these performance benchmark tests exceed our very high expectations. In the coming months, we will unveil new enhancements to the platform that will further advance its performance, ease of use and security, extending Impala's benefits for open source users and our enterprise customers."

Learn More About Cloudera Impala
For a more detailed account of the methodology and results from Cloudera's Impala performance benchmark testing against Hive and a proprietary DBMS, visit the Cloudera blog: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed

For more information about Cloudera Impala and how to download it for free, visit: http://cloudera.com/content/cloudera/en/products/cdh/impala.html

About Cloudera
Cloudera is revolutionizing enterprise data management by offering the first unified Platform for Big Data: The Enterprise Data Hub. Cloudera offers enterprises one place to store, process and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Founded in 2008, Cloudera was the first and is still today the leading provider and supporter of Hadoop for the enterprise. Cloudera also offers software for business critical data challenges including storage, access, management, analysis, security and search. With over 15,000 individuals trained, Cloudera is a leading educator of data professionals, offering the industry's broadest array of Hadoop training and certification programs. Cloudera works with over 800 hardware, software and services partners to meet customers' big data goals. Leading organizations in every industry run Cloudera in production, including finance, telecommunications, retail, internet, utilities, oil and gas, healthcare, biopharmaceuticals, networking and media, plus top public sector organizations globally. www.cloudera.com

Connect with Cloudera
Read our blog: http://www.cloudera.com/blog/
Follow us on Twitter: http://twitter.com/cloudera
Visit us on Facebook: http://www.facebook.com/cloudera

Cloudera, Cloudera Platform for Big Data and CDH are trademarks or registered trademarks of Cloudera in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.


Copyright © 2009 Marketwired. All rights reserved. All the news releases provided by Marketwired are copyrighted. Any forms of copying other than an individual user's personal reference without express written permission is prohibited. Further distribution of these materials is strictly forbidden, including but not limited to, posting, emailing, faxing, archiving in a public database, redistributing via a computer network or in a printed form.
