Cloudera Impala Delivers Superior Performance on Open Hadoop Data Over Proprietary Analytic DBMSs

Emerges as Fastest, Most Functional and Proven Way to Run SQL on Hadoop Data

PALO ALTO, CA -- (Marketwired) -- 01/13/14 -- Cloudera, the leader in Apache Hadoop™-based data management platforms, today released the results of performance benchmark testing for its open source interactive SQL query engine, Cloudera Impala. Impala queries across data in an open Hadoop columnar storage format (Parquet) ran on average 2x faster than identical queries on a commercial analytic database management system (DBMS) running against its own proprietary storage format.

Cloudera delivers an enterprise data hub -- a next-generation platform for secure, powerful, real-time processing and analysis of data at scale. An enterprise data hub must provide data governance and lineage services, support enterprise-grade backup and disaster recovery and offer a wide range of ways to work with the data that it manages. It must support the tools and interfaces on which existing applications and tools rely. Critical among those is real-time SQL access for analytics.

Impala Delivers

Launched in October 2012 and released for general availability in May 2013, Impala enables high speed, interactive SQL analysis of Hadoop data at petabyte scale. Today Impala has emerged as the fastest, most functional and proven way to run SQL on Hadoop data for open source users and enterprise customers alike. The platform has continued to evolve rapidly with deepening support for the ANSI-SQL standard, certified integrations to leading business intelligence tools, sophisticated workload management and consistently superior performance.

Impala deployments continue to proliferate in the enterprise: to date, the platform has been downloaded by more than 5,000 unique organizations globally, demonstrating its appeal and significance. Cloudera continues to work closely with its enterprise customers and the open source community to refine and advance Impala's enterprise features, such as Apache Sentry (incubating), the fine-grained, role-based authorization module released in 2013. Further establishing Impala's leadership in the industry, Hadoop-based solutions from other vendors integrate the Cloudera-created SQL query engine into their own offerings in response to customer demand.

Analytic DBMS Performance on Open Data in the Enterprise Data Hub

Running a diverse set of analytic queries on identical hardware, Impala has successfully eclipsed the performance of a popular proprietary parallel DBMS. The same benchmarks also showed Impala has maintained or widened its performance advantage against the latest release of Apache Hive (0.12).

Furthermore, it has done so on data in an open Hadoop data format. With these results, customers can exceed the SQL performance they experienced with proprietary databases while preserving the flexibility they enjoy with the Hadoop stack.

The Proof Is In the Data: Impala Shows BI-Class Speed for Mainstream Workloads

To evaluate Impala's query performance against a popular analytic database (referred to as "DBMS-Y"), Cloudera ran a series of 20 queries based on the industry-standard benchmark TPC-DS. The results showed that:

  • Impala ran consistently faster than DBMS-Y: across 20 queries, Impala ran on average 2x faster than DBMS-Y, outperforming it in 17 of the 20 queries. For some queries, Impala was over 4x faster.
  • Queries over open data beat those over proprietary data: even though Impala queried open Hadoop data in the Parquet format while DBMS-Y queried data in its own proprietary format, Impala was still faster.
  • Impala scales linearly and predictably: in tests, Impala maintained identical response times under increased user concurrency and on larger data sets when new machines were added at the same rate as the growth in concurrency and data.
  • Impala remains more than an order of magnitude faster than Hive: on identical hardware, Impala queries ran an average of 24x faster than the same queries on Apache Hive 0.12 using the ORCFile format.
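The release reports an "average" speedup over 20 queries but does not say how the per-query results were averaged. The following is a minimal sketch, assuming speedup is defined per query as the DBMS-Y time divided by the Impala time. It shows how a figure like "2x on average" and a count like "faster in 17 of 20" could be derived. The function names and the timings are hypothetical and purely illustrative; they are not the published benchmark data.

```python
import math


def speedups(baseline_secs, candidate_secs):
    """Per-query speedup ratios: baseline time / candidate time (>1 means the candidate was faster)."""
    if len(baseline_secs) != len(candidate_secs):
        raise ValueError("timing lists must have one entry per query")
    return [b / c for b, c in zip(baseline_secs, candidate_secs)]


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # exp(mean(log x)) equals the n-th root of the product of the ratios.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def wins(ratios):
    # Count queries where the candidate was strictly faster than the baseline.
    return sum(1 for r in ratios if r > 1.0)


if __name__ == "__main__":
    # Hypothetical timings in seconds, one entry per query (NOT the benchmark's data).
    dbms_y = [10.0, 8.0, 4.0, 6.0]
    impala = [2.5, 4.0, 4.0, 2.0]
    r = speedups(dbms_y, impala)
    print("per-query speedups:", r)
    print("arithmetic mean: %.2fx" % arithmetic_mean(r))
    print("geometric mean:  %.2fx" % geometric_mean(r))
    print("faster in %d of %d queries" % (wins(r), len(r)))
```

With these made-up timings, the arithmetic and geometric means of the ratios diverge (2.50x vs. about 2.21x). This is why a headline "average" speedup is easiest to interpret alongside the published per-query results.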

No Sleight of Hand, No Gimmicks

Cloudera is committed to leading the industry as a high-integrity business that provides unbiased information to customers and users. Dozens of users have downloaded Cloudera's 100% open source platform, run their own performance evaluations, and shared the results publicly. Cloudera places no confidentiality clauses or other proprietary restrictions on the use of its distribution. In addition, Cloudera has made the queries, configuration, hardware specifications, and data available for the open source community to review and evaluate. Information can be found at http://www.cloudera.com/impalaishellafast/

"Interactive exploratory business intelligence is a mainstay workload of the Enterprise Data Hub," said Mike Olson, founder, chief strategy officer and chairman of the Board at Cloudera. "We are proud of how quickly Impala has evolved and the rate at which it is being adopted. With thousands of users now running Impala in production, its significance is indisputable. One year ago, when we released Impala to open source, we knew that it had the potential to eventually play on the same field as some very mature analytic DBMSs, but the results of these performance benchmark tests exceed our very high expectations. In the coming months, we will unveil new enhancements to the platform that will further advance its performance, ease of use and security, extending Impala's benefits for open source users and our enterprise customers."

Learn More About Cloudera Impala
For a more detailed account of the methodology and results from Cloudera's Impala performance benchmark testing against Hive and a proprietary DBMS, visit the Cloudera blog: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed

For more information about Cloudera Impala and how to download it for free, visit: http://cloudera.com/content/cloudera/en/products/cdh/impala.html

About Cloudera
Cloudera is revolutionizing enterprise data management by offering the first unified Platform for Big Data: The Enterprise Data Hub. Cloudera offers enterprises one place to store, process and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Founded in 2008, Cloudera was the first and is still today the leading provider and supporter of Hadoop for the enterprise. Cloudera also offers software for business critical data challenges including storage, access, management, analysis, security and search. With over 15,000 individuals trained, Cloudera is a leading educator of data professionals, offering the industry's broadest array of Hadoop training and certification programs. Cloudera works with over 800 hardware, software and services partners to meet customers' big data goals. Leading organizations in every industry run Cloudera in production, including finance, telecommunications, retail, internet, utilities, oil and gas, healthcare, biopharmaceuticals, networking and media, plus top public sector organizations globally. www.cloudera.com

Connect with Cloudera
Read our blog: http://www.cloudera.com/blog/
Follow us on Twitter: http://twitter.com/cloudera
Visit us on Facebook: http://www.facebook.com/cloudera

Cloudera, Cloudera Platform for Big Data and CDH are trademarks or registered trademarks of Cloudera in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Copyright © 2009 Marketwired. All rights reserved. All the news releases provided by Marketwired are copyrighted. Any forms of copying other than an individual user's personal reference without express written permission is prohibited. Further distribution of these materials is strictly forbidden, including but not limited to, posting, emailing, faxing, archiving in a public database, redistributing via a computer network or in a printed form.