

Hadoop

Combine our natural proclivity to succumb to popular fallacies with the challenge of getting our wetware around just how big Big Data can be, and you have a recipe for disaster. But the good news is that there is hope. The best way to avoid an unseen trap in your path is to know it’s t...
Peter Schlampp, the Vice President of Products and Business Development at Platfora, explains what the Hadoop Big Data reservoir (HDR) is and is not in this webinar that I watched today. Knowing what the HDR is and is not is key to extracting business intelligence insights and analytics. Pl...
Big Data applications such as Hadoop and Hive are becoming more widely adopted and mainstream. A growing number of users will select the cloud, whether private or public, as an efficient and scalable deployment vehicle for these large-scale distributed apps. Hadoop im...
In his session at 12th Cloud Expo | Cloud Expo New York [June 10-13, 2013], Intel's Chris Black will review the background of Apache Hadoop, its application, and methods to accelerate data system clusters with Intel SSD technology. The session will overview the genius of Hadoop and pro...
Analyzing Hadoop jobs and speeding them up is often a tedious and time-consuming effort that requires experts. In his upcoming session at 12th Cloud Expo | Cloud Expo New York [10-13 June, 2013], Michael Kopp will show how proven APM techniques can be used to speed up Hadoop jobs...
Following my initial introduction to Hadoop and overview of its components, I studied the Yahoo Hadoop tutorial and now have a deeper understanding of Hadoop. I would like to share what I learned and help others understand it.
Enterprises can't close their doors just because integration tools won't cope with the volume of information that their systems produce. As each day goes by, their information will become larger and more complicated, and enterprises must constantly struggle to manage the integration o...
When talking with the less-technical people in your enterprise, which may include end users and many others on the leadership team, it always pays to have non-technical expressions to describe new capabilities. Here are some thoughts on Platfora that may be of use in discussions like t...

I saw a conversation on Twitter today asking why we don’t just build proper security into Hadoop instead of using the API gateway approach to Hadoop security that my colleague Blake proposed. The same could be asked about any number …
May 16, 2013 01:30 PM EDT

MapR Technologies, the Hadoop house, Wednesday released M7, a Big Data platform that’s supposed to remove the usual trade-offs involved in deploying a NoSQL database on Hadoop. The widgetry packages Hadoop with HBase, the open source NoSQL database modeled after Google’s BigTable...
“By making this source code easily accessible on GitHub and providing open APIs such as NFS and ODBC, MapR is ensuring that end users have available an open and flexible, enterprise-grade platform for Hadoop,” said Tomer Shiran, director of product management at MapR Technologies, Inc....
"When we started out, I had no idea what Hadoop would become," blogged Hadoop's founder Doug Cutting today on the occasion of Hadoop's seventh birthday, "so I proposed a name for it that didn’t have any connotation."

"The project has grown," Cutting continued, "giving that name m...

Today, there are two main ways to use Hadoop with R and big data:
1. Use the open-source rmr package to write map-reduce tasks in R (running within the Hadoop cluster; great for data distillation!)
2. Import data from Hadoop to a server running Revolution R Enterprise, via HBase...
EMC and its Greenplum unit claim they have tamed some of the bears in Hadoop, the popular but difficult-to-work-with open source Big Data platform. They’ve created a new Apache Hadoop distribution called Pivotal HD that natively integrates Greenplum’s massively parallel processin...
Red Hat, which has staked its fortunes on hybrid computing, sees Big Data as a “killer app for the open hybrid cloud.” It sketched out the direction it’s going to take with Big Data and the cloud the other day when it said it was going to open source its Hadoop plug-in – which is based...
MapR Technologies, the Hadoop outfit, is launching European operations in support of its growing community of customers and partners there. Its new headquarters in London will provide MapR with a base for sales to accelerate the adoption of its high-performance enterprise-grade Ha...
Last month's release of Revolution R Enterprise 6.1 added the capability to fit decision and regression trees on large data sets (using a new parallel external memory algorithm included in the RevoScaleR package). It also introduced the possibility of applying this and the other big-da...
BI Research, Teradata Aster, and Hortonworks are teaming up to provide clear guidance on Big Data architecture and product integration to bring more value to businesses. Attend this discussion to learn more about: MapReduce for the data scientist: the Hadoop/Hive and RDBMS approaches ...
We have entered the “Age of Big Data” according to a recent New York Times article. This comes as no surprise to most organizations already struggling with the onslaught of data coming from an increasing number of sources and at an increasing rate. The 2011 IDC Digital Universe Study r...
The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: An optional vectorized API for efficient R programming when dealin...

Cleversafe provides dispersed storage solutions that offer virtually unlimited scale and cost-effective data storage, protection and access. Apache Hadoop and the CDH4 distribution provide all the software required for implementing MapReduce and the other chores associated with analysis over massiv...

A friend of mine from my IBM days (an expert in data warehousing, BI, etc.) told me about the Hadoop conference he attended in San Jose a few weeks back. When he attended the same conference two years ago in New …
Jul. 9, 2012 06:30 AM EDT
Interesting article at GigaOm (http://bit.ly/OINpfr). I won’t repeat the main points, but basically it says that because Hadoop is disk-, ETL- and batch-based, it isn't a good fit for real-time processing of frequently changing data. The author correctly…

CTOlabs.com, a subsidiary of the technology research, consulting and services firm Crucial Point LLC and a peer site of CTOvision.com, has just published a white paper providing context and use cases on Hadoop For Law Enforcement, an important mission-focused domain ripe for the app...

I spent some time last week with several vendors and users of Hadoop, the formless data repository that is the current favorite of many dot coms and the darling of the data nerds. It was instructive. Moms and Dads, tell … Co...
With BigDataExpo 2012 New York (www.BigDataExpo.net), co-located with 10th Cloud Expo, due to kick off next week for 4 days of high-energy networking and discussions, what better time to remind you in greater detail of the distinguished individuals in our incredible Speaker Faculty for...
The two primary commercial providers that signed on for the proprietary file systems – IBM and EMC (via partnership with MapR) – have retrenched. As we’ve noted previously, the measure of success of an open source stack is the degree to which the target remains intact. That either co...
Every technologist I know has been working to learn more about Hadoop. I bet you are already somewhat familiar with some of the neat use-cases of this technological framework. But let me ask, wouldn’t you like to be able to express what Hadoop is in a clear, succinct way?
Good news for Big Data users: Cloudera recently released the second and final beta for Cloudera’s Distribution Including Apache Hadoop version 4 (CDH4), meaning that the official CDH4 release is coming soon. If you aren’t already using CDH, Cloudera offers the leading open-source...
Cloudera, the leading provider of Apache Hadoop-based data management software, services and training, today announced that it has established a Japanese subsidiary, Cloudera KK, and an office in Japan. Cloudera's formal presence extends availability of its products and support offerin...
It is our goal at Monitis to make the lives of web developers and system administrators easy. We have reviewed the five leading hosted Hadoop-based applications and given a short analysis of each in this post to help you find the solution that best suits your needs.
In 2011, Apache Hadoop received tremendous attention for helping organizations cost-effectively capitalize on their big data. Hadoop is now disrupting the business of analyzing data. In his session at the 10th International Cloud Expo, Eric Baldeschwieler, Co-Founder & CEO of Hortonw...
We have previously provided a Quickstart guide to standing up Rackspace cloud servers (and have one for Amazon servers as well). These are very low-cost ways of building reliable, production-ready capabilities for enterprise use (commercial and government).
Like my colleague Alex Olesker, I too attended Cloudera Day 2012. While there were many panels of interest, perhaps one of the most important was Amr Awadallah’s talk about big data applications to business intelligence. Many CTOVision readers with backgrounds in the intelligenc...
“Zettaset has already established itself as an innovator in the Big Data enterprise space and a visionary in this high-growth market,” said Nick Efstratis, managing director of EPIC Ventures and Zettaset board member, as it was announced this week that industry veteran Jim Vogt (pictu...
As a former enterprise CTO and current technology watcher, I was struck by the incredible brilliance of yesterday’s announcement by Dell and Cloudera. In a move that will help enterprises of all sizes serve a very wide range of missions, those two organizations have announced a n...
Lucene has quite a bit of fame in enterprise circles. It is a free/open source search software library first written in Java (but now ported to many other languages). Although it lacks the discovery capabilities of more advanced information retrieval tools, it is very reliable, easy t...
Distributed file systems (DFS) are a type of file system that provides extra features over conventional file systems: they are used to store and share files across a wide area network and offer easy programmatic access. File systems such as Hadoop's HDFS and many others fall i...
"Ultimately, we believe that advancement in cloud computing technology will be driven by open source initiatives where large communities of engineers can collaborate and develop new code for the new applications and demands posed by the cloud model," says Shelton Shugar, SVP Cloud Comp...