@CloudExpo: Blog Post

Hadoop – 100x Faster By @GridGain | @CloudExpo [#BigData]

How we did it...

If you know anything about Hadoop architecture, you will understand why the task seemed daunting to us, and it proved to be one of the most challenging engineering feats that we have accomplished so far.

After almost 24 months of development, tens of thousands of lines of Java, Scala and C++ code, multiple design iterations, several releases and dozens of benchmarks, we have a product that can deliver real-time performance to Hadoop with only minimal integration and no ETL required. It is backed by customer deployments that prove our performance claims and validate our architecture.

Here's how we did it.

The Idea - In-Memory Hadoop Accelerator
Hadoop is based on two key technologies: HDFS for storing data, and MapReduce for processing that data in parallel. Everything else in Hadoop itself and in the entire ecosystem coalesces around these two technologies.

Neither HDFS nor MapReduce was necessarily designed with real-time performance in mind. In order to deliver real-time processing without moving data out of Hadoop into an alternative technology, we had to improve the performance of each of these sub-systems directly.

We decided to develop a high performance in-memory file system that provides 100% compatibility with HDFS and an optimized MapReduce implementation that would take advantage of this real-time file system. By doing so, we could offer all of the advantages of our in-memory platform while minimizing the disruption of our customers' existing Hadoop investments.

There are many projects and products that aim to improve Hadoop performance. HDFS2, Apache Tez, Cloudera Impala, Hortonworks Stinger, ScaleOut hServer and Apache Spark, to name but a few, all aim to solve Hadoop performance issues in various ways.

From a technology standpoint, GridGain's In-Memory Hadoop Accelerator has some similarity to the architecture of Spark (optimized MapReduce), ScaleOut and HDFS2 (in-memory caching without ETL) and some features of Apache Tez (in-process execution). However, GridGain's In-Memory Accelerator is the only product for Hadoop available today that combines both a high-performance HDFS-compatible file system and optimized in-memory MapReduce, along with many other features, in one fully integrated product.

In-Memory File System
First, we implemented GridGain's In-Memory File System (GGFS) to accelerate I/O in the Hadoop stack. The original idea was that GGFS alone would be enough to deliver a significant performance increase. However, while we did see significant performance gains using GGFS, when working with our customers we quickly found some not-so-obvious performance limitations in the way Hadoop performs MapReduce. It became clear to us that GGFS alone would not be enough, but it was a critical piece that we needed to build first.

Note that you shouldn't confuse GGFS with much slower alternatives like a RAM disk. GGFS is based on our Memory-First architecture and addresses more than just the seek time of the "device".

From the get-go we designed GGFS to support both Hadoop v1 and YARN-based Hadoop v2. Further, we designed GGFS to work in two modes:

  • Primary (standalone), and
  • Secondary (caching HDFS).

In primary standalone mode GGFS acts as a bona fide Hadoop file system that is plug-and-play compatible with the standard HDFS interface. Our customers use it to deploy a high-performance in-memory Hadoop cluster and use it like any other Hadoop file system, albeit one that trades capacity for maximum performance.

One of the great added benefits of the primary mode is that it does away with the NameNode in the Hadoop deployment. A standard Hadoop deployment requires shared storage for the primary and secondary NameNodes, usually implemented with a complex NFS setup mounted on each NameNode machine. GGFS instead uses GridGain's In-Memory Database under the hood to provide completely automatic scaling and failover, without any need for additional shared storage or a risky Single Point Of Failure (SPOF) architecture.

Furthermore, unlike Hadoop's master-slave NameNode design, which prevents linear runtime scaling when new nodes are added, GGFS is built on a highly scalable, natively distributed partitioned data store that provides linear scalability and auto-discovery of new nodes. Removing the NameNode and all its chattiness from the picture enabled dramatically better performance for IO operations.

GGFS primary mode provides maximum performance for IO operations but will require moving data from disk-based HDFS to in-memory based GGFS (i.e. from one file system to another). While data movement may be appropriate for some use cases, we have a second mode, in which absolutely no ETL is required.

In the second mode, GGFS works as an intelligent secondary in-memory distributed cache over the primary disk-based HDFS file system. In this mode GGFS supports both synchronous and asynchronous read-through and write-through to and from HDFS. Synchronous operation provides strong consistency, while asynchronous operation trades relaxed consistency for better performance, in both cases with complete transparency to the user and the applications running on top of it. Users can manually select which files and/or directories should be stored in GGFS and which mode, synchronous or asynchronous, should be used for each of them.
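To make the synchronous versus asynchronous trade-off concrete, here is a minimal sketch of a cache layered over a slower store. It is not GGFS code or its API: the class `SecondaryModeCache`, the `async_prefixes` parameter and the use of a plain dict to stand in for HDFS are all hypothetical choices made for illustration.

```python
from collections import deque


class SecondaryModeCache:
    """Toy in-memory cache over a slower backing store (a dict stands in for HDFS)."""

    def __init__(self, backing_store, async_prefixes=()):
        self.backing = backing_store                 # hypothetical stand-in for HDFS
        self.memory = {}                             # hypothetical in-memory layer
        self.async_prefixes = tuple(async_prefixes)  # paths written asynchronously
        self.pending = deque()                       # queued writes not yet flushed

    def read(self, path):
        # Read-through: on a miss, load from the backing store and keep a copy.
        if path not in self.memory:
            self.memory[path] = self.backing[path]
        return self.memory[path]

    def write(self, path, data):
        self.memory[path] = data
        if path.startswith(self.async_prefixes):
            # Asynchronous mode: the backing store lags behind until flush().
            self.pending.append((path, data))
        else:
            # Synchronous mode: the backing store is updated before returning.
            self.backing[path] = data

    def flush(self):
        # Drain queued asynchronous writes to the backing store in order.
        while self.pending:
            path, data = self.pending.popleft()
            self.backing[path] = data


hdfs = {}
cache = SecondaryModeCache(hdfs, async_prefixes=("/tmp/",))
cache.write("/data/report", b"r1")    # synchronous path
cache.write("/tmp/scratch", b"s1")    # asynchronous path
visible_before_flush = sorted(hdfs)
cache.flush()
visible_after_flush = sorted(hdfs)
```

Before `flush()` only the synchronously written path is visible in the backing store, which is the relaxed consistency that asynchronous mode accepts in exchange for faster writes.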

Another interesting feature of GGFS is its smart use of block-level or file-level caching and eviction. When working in primary mode, GGFS uses file-level caching to ensure corruption-free storage (the file is either fully in GGFS or not there at all). When in secondary mode, GridGain automatically switches to block-level caching and eviction. What we discovered when working with our customers on real-world Hadoop payloads is that files on HDFS are often not accessed uniformly, i.e. there is significant "locality" in how portions of a file are accessed. Put another way, certain blocks of a file are accessed more frequently than others. That observation led to our block-level caching implementation for the secondary mode, which enables dramatically better memory utilization: GGFS can keep only the most frequently used file blocks in memory rather than entire files, which in Hadoop can easily run to hundreds of gigabytes.

No good caching can work effectively without equally sophisticated eviction management to make sure that memory is optimally utilized, and we've built a very neat one too. Apart from the obvious eviction features, you can configure certain files to never be evicted, preserving them in memory in all cases for maximum performance.
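The idea of block-level caching with eviction and never-evicted files can be sketched as follows. This is an illustration only: it assumes a least-recently-used policy, which the article does not specify, and the names `BlockCache`, `pinned_paths` and `load_block` are hypothetical rather than part of GGFS.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache keyed by (path, block_index), with optional pinned files."""

    def __init__(self, capacity_blocks, pinned_paths=()):
        self.capacity = capacity_blocks
        self.pinned = set(pinned_paths)   # files whose blocks are never evicted
        self.blocks = OrderedDict()       # least recently used entry comes first

    def get(self, path, index, load_block):
        key = (path, index)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        data = load_block(path, index)    # miss: fetch from the slower store
        self.blocks[key] = data
        self._evict()
        return data

    def _evict(self):
        # Drop least recently used blocks, skipping blocks of pinned files.
        while len(self.blocks) > self.capacity:
            victim = next((k for k in self.blocks if k[0] not in self.pinned), None)
            if victim is None:
                break                     # only pinned blocks remain
            del self.blocks[victim]


def load_block(path, index):
    return f"{path}#{index}"              # placeholder for a real block read


cache = BlockCache(capacity_blocks=2, pinned_paths={"/hot/lookup"})
cache.get("/hot/lookup", 0, load_block)
cache.get("/big/file", 0, load_block)
cache.get("/big/file", 7, load_block)     # over capacity: block 0 of /big/file goes
cached_keys = list(cache.blocks)
```

Because entries are individual blocks, a large file only occupies memory for the blocks that are actually being read, while the pinned file stays resident regardless of how recently it was used.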

To ensure seamless and continuous performance during MapReduce file scanning, we've implemented smart data prefetching: data that is expected to be read in the near future is streamed to the MapReduce task ahead of time. By doing so, GGFS ensures that whenever a MapReduce task finishes reading a file block, the next file block is already available in memory. A significant performance boost was achieved here thanks to our proprietary Inter-Process Communication (IPC) implementation, which allows GGFS to achieve throughput of up to 30Gbit/s between two processes.
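A minimal sketch of the prefetching pattern described above, assuming a simple one-block lookahead on a background thread. The function `scan_with_prefetch` and its `read_block` callback are hypothetical, and nothing here reflects GGFS's actual streaming or IPC implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_with_prefetch(read_block, block_count):
    """Yield blocks in order while the next block loads in the background."""
    if block_count == 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_block, 0)
        for index in range(block_count):
            current = pending.result()    # blocks only if the read is unfinished
            if index + 1 < block_count:
                # Start reading the next block before the caller processes this one.
                pending = pool.submit(read_block, index + 1)
            yield current


blocks = [b"alpha", b"beta", b"gamma"]
scanned = list(scan_with_prefetch(lambda i: blocks[i], len(blocks)))
```

While the consumer is busy with the block it has just received, the read of the following block is already in flight, so sequential scans spend less time waiting on I/O.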

The table below shows GGFS vs. HDFS (on Flash-based SSDs) benchmark results for raw IO operations:

Benchmark            GGFS, ms   HDFS, ms   Boost, %
File Scan                  27        667      2470%
File Create                96        961      1001%
File Random Access        413       2931       710%
File Delete               185       1234       667%

The above tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.
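The Boost column in the table above appears to be the HDFS time expressed as a percentage of the GGFS time. That reading is inferred from the numbers rather than stated in the text, and the short check below reproduces the published figures under that assumption (the function name `boost_percent` is ours).

```python
# Benchmark timings in milliseconds, copied from the table: (GGFS, HDFS).
timings_ms = {
    "File Scan": (27, 667),
    "File Create": (96, 961),
    "File Random Access": (413, 2931),
    "File Delete": (185, 1234),
}


def boost_percent(ggfs_ms, hdfs_ms):
    """HDFS time as a percentage of GGFS time, rounded to a whole percent."""
    return round(hdfs_ms / ggfs_ms * 100)


boosts = {name: boost_percent(g, h) for name, (g, h) in timings_ms.items()}
for name, pct in boosts.items():
    print(f"{name}: {pct}%")
```

Read this way, a boost of 2470% for File Scan means the operation took roughly 24.7 times less time on GGFS than on HDFS.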

As you can see from these results the IO performance difference is quite significant. However, HDFS performance is only part of the total Hadoop overhead. Another part is MapReduce overhead and that's what we address with In-Memory MapReduce.

In-Memory MapReduce
Once we had our high performance in-memory file system built and tested, we turned our attention to a MapReduce implementation that would take advantage of in-memory technology.

Hadoop's MapReduce design is one of the weakest points in Hadoop. It is basically an inefficiently designed system when it comes to distributed processing. GridGain's In-Memory MapReduce implementation relies heavily on 7 years of experience developing our widely deployed In-Memory HPC product. It is built on a record-based approach, as opposed to the key-value approach of traditional MapReduce, and it enables a much more streamlined parallel execution path over data stored in the in-memory file system.

Furthermore, In-Memory MapReduce eliminates the standard overhead associated with typical Hadoop job tracker polling and with task tracker process creation, deployment and provisioning. All in all, GridGain's In-Memory MapReduce is a highly optimized HPC-based implementation of the MapReduce concept that enables true low-latency processing of data stored in GGFS.

The diagram below demonstrates the difference between a standard Hadoop MapReduce execution path and GridGain's In-Memory MapReduce execution path:

[Diagram: standard Hadoop MapReduce execution path compared with GridGain's In-Memory MapReduce execution path]

As seen in this diagram, our MapReduce implementation supports a direct execution path from the client to the data node. Moreover, all execution in GridGain happens in-process, with deployment handled automatically and transparently by GridGain.

In-Memory MapReduce also provides integration capability for MapReduce code written in any Hadoop-supported language, not only native Java or Scala. Developers can easily reuse existing C/C++/Python or any other existing MapReduce code with our In-Memory Accelerator for Hadoop to gain a significant performance boost.

Finally, once we removed the task and job tracker polling, the out-of-process execution, and the often unnecessary shuffling and sorting from MapReduce, and coupled it with a high-performance in-memory file system, we started seeing performance increases of anywhere between 10x and 100x on typical MapReduce payloads in our tests.

Below are the results of one of the internal tests that uses both the In-Memory File System and In-Memory MapReduce. This test was specifically designed to show the maximum performance of GridGain's Accelerator vs. a stock Hadoop distribution for I/O-heavy MapReduce jobs:

Nodes   Hadoop, ms   Hadoop + GridGain Accelerator, ms   Boost, %
5          298,000                              11,622     2,564%
10         201,350                               5,537     3,636%
15         158,997                               2,385     6,667%
20         122,008                               1,647     7,407%
30          97,833                               1,174     8,333%
40          82,771                                 780    10,612%

Tests were performed on a cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10GbE network fabric, a stock, unmodified Apache Hadoop 2.x distribution and the GridGain 5.2 release.
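To relate the MapReduce results to the "10x to 100x" range mentioned earlier, the timings can be turned into plain speedup factors. This is a quick calculation on the published numbers only; the published Boost percentages seem to be the same ratio multiplied by 100, possibly with slightly different rounding.

```python
# (nodes, stock Hadoop ms, Hadoop + GridGain Accelerator ms), copied from the table.
results = [
    (5, 298_000, 11_622),
    (10, 201_350, 5_537),
    (15, 158_997, 2_385),
    (20, 122_008, 1_647),
    (30, 97_833, 1_174),
    (40, 82_771, 780),
]

# Speedup factor: how many times shorter the accelerated run is.
speedups = {nodes: hadoop_ms / accel_ms for nodes, hadoop_ms, accel_ms in results}
for nodes, factor in speedups.items():
    print(f"{nodes} nodes: {factor:.1f}x")
```

On these numbers the factor grows with cluster size, from roughly 26x at 5 nodes to roughly 106x at 40 nodes.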

Management and Monitoring
No serious distributed system can be used without comprehensive DevOps support, so In-Memory Accelerator for Hadoop comes with a unified GUI-based management and monitoring tool called GridGain Visor. Over the last 12 months we've added significant support for the Hadoop Accelerator to Visor.

Visor provides deep DevOps capabilities including an operations & telemetry dashboard, database and compute grid management, as well as GGFS management that provides GGFS monitoring and file management between HDFS, local and GGFS file systems.

As part of GridGain Visor, In-Memory Accelerator for Hadoop also comes with a GUI-based file system profiler, which lets you keep track of all operations your GGFS or HDFS file systems perform and identifies potential hot spots.

The GGFS profiler tracks the speed and throughput of reads, writes and various directory operations for all files, and displays these metrics in a convenient view that you can sort by any profiled criterion, e.g. from slowest write to fastest. The profiler also makes suggestions whenever performance could be gained by loading file data into in-memory GGFS.

Conclusion
After almost 2 years of development we have a well-rounded product that can help you accelerate Hadoop MapReduce by up to 100x with minimal integration and effort. It's based on our innovative high-performance in-memory file system and in-memory MapReduce implementation, coupled with one of the best management and monitoring tools.

If you want to be able to say the words "milliseconds" and "Hadoop" in one sentence, you need to take a serious look at GridGain's In-Memory Hadoop Accelerator.

More Stories By Nikita Ivanov

Nikita Ivanov is founder and CEO of GridGain Systems, started in 2007 and funded by RTP Ventures and Almaz Capital. Nikita has led GridGain to develop advanced and distributed in-memory data processing technologies – the top Java in-memory computing platform starting every 10 seconds around the world today.

Nikita has over 20 years of experience in software application development, building HPC and middleware platforms, contributing to the efforts of other startups and notable companies including Adaptec, Visa and BEA Systems. Nikita was one of the pioneers in using Java technology for server side middleware development while working for one of Europe’s largest system integrators in 1996.

He is an active member of the Java middleware community, a contributor to the Java specification, and holds a Master’s degree in Electro Mechanics from Baltic State Technical University, Saint Petersburg, Russia.
