Welcome!

Open Source Cloud Authors: Liz McMillan, Pat Romanski, Yeshim Deniz, Elizabeth White, Zakia Bouachraoui

Related Topics: Open Source Cloud

Open Source Cloud: Blog Feed Post

Background on Lucene, Nutch and Hadoop

Doug Cutting has earned a reputation as an innovative, community-supporting developer

Doug Cutting is the creator of Lucene, Nutch and Hadoop. Doug and these projects are increasingly being mentioned in enterprise environments. So with this post I’ll provide a bit more of context on each.

Doug Cutting has earned a reputation as an innovative, community-supporting developer.  He is on the board of directors of the Apache Software Foundation and is now its Chairman. He is the Architect at Cloudera, the firm famous for delivering an end to end platform built on Hadoop.

Lucene has quite a bit of fame in enterprise circles. It is a free/open source search software library first written in Java (but now ported to many other languages). Although lacking in the discovery capabilities of more advance information retrieval tools, it is very reliable, easy to code to, and has a growing developer community. Lucine is frequently the basic search capability found in large government IT projects. It provides full text indexing and search.

Nutch is a web crawling and HTML parsing capability designed to be highly scalable and feature rich. It can be run on one system or scale to clusters of 100′s of systems. Nutch has a feature enabling “bias” so crawling and fetching of “important” pages can be done first.

Hadoop is a software framework for data-intensive solutions. Most enterprise technologists usually first think of the word “database” when they hear Hadoop.  But it is really more than a filesystem. It provides a new way of storing and interacting with stored data and a software framework designed to support data-intensive distributed applications.  Hadoop was inspired by Google’s MapReduce and Google File System papers.  It was originally written to support the Nutch project, but is now its own top level Apache Software Foundation capability.  Apache provides the kernel of capabilities for Hadoop.  Cloudera provides a Hadoop distribution that provides a full framework of capabilities.

Here is more context on these capabilities from Doug himself:

Read the original blog entry...

More Stories By Bob Gourley

Bob Gourley writes on enterprise IT. He is a founder of Crucial Point and publisher of CTOvision.com

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


IoT & Smart Cities Stories
In this Women in Technology Power Panel at 15th Cloud Expo, moderated by Anne Plese, Senior Consultant, Cloud Product Marketing at Verizon Enterprise, Esmeralda Swartz, CMO at MetraTech; Evelyn de Souza, Data Privacy and Compliance Strategy Leader at Cisco Systems; Seema Jethani, Director of Product Management at Basho Technologies; Victoria Livschitz, CEO of Qubell Inc.; Anne Hungate, Senior Director of Software Quality at DIRECTV, discussed what path they took to find their spot within the tec...
To Really Work for Enterprises, MultiCloud Adoption Requires Far Better and Inclusive Cloud Monitoring and Cost Management … But How? Overwhelmingly, even as enterprises have adopted cloud computing and are expanding to multi-cloud computing, IT leaders remain concerned about how to monitor, manage and control costs across hybrid and multi-cloud deployments. It’s clear that traditional IT monitoring and management approaches, designed after all for on-premises data centers, are falling short in ...
DXWordEXPO New York 2018, colocated with CloudEXPO New York 2018 will be held November 11-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI, Machine Learning and WebRTC to one location.
Discussions of cloud computing have evolved in recent years from a focus on specific types of cloud, to a world of hybrid cloud, and to a world dominated by the APIs that make today's multi-cloud environments and hybrid clouds possible. In this Power Panel at 17th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the importance of customers being able to use the specific technologies they need, through environments and ecosystems that expose their APIs to make true ...
"Space Monkey by Vivent Smart Home is a product that is a distributed cloud-based edge storage network. Vivent Smart Home, our parent company, is a smart home provider that places a lot of hard drives across homes in North America," explained JT Olds, Director of Engineering, and Brandon Crowfeather, Product Manager, at Vivint Smart Home, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
We are seeing a major migration of enterprises applications to the cloud. As cloud and business use of real time applications accelerate, legacy networks are no longer able to architecturally support cloud adoption and deliver the performance and security required by highly distributed enterprises. These outdated solutions have become more costly and complicated to implement, install, manage, and maintain.SD-WAN offers unlimited capabilities for accessing the benefits of the cloud and Internet. ...
In an era of historic innovation fueled by unprecedented access to data and technology, the low cost and risk of entering new markets has leveled the playing field for business. Today, any ambitious innovator can easily introduce a new application or product that can reinvent business models and transform the client experience. In their Day 2 Keynote at 19th Cloud Expo, Mercer Rowe, IBM Vice President of Strategic Alliances, and Raejeanne Skillern, Intel Vice President of Data Center Group and G...
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
DXWorldEXPO LLC announced today that "IoT Now" was named media sponsor of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. IoT Now explores the evolving opportunities and challenges facing CSPs, and it passes on some lessons learned from those who have taken the first steps in next-gen IoT services.
The current age of digital transformation means that IT organizations must adapt their toolset to cover all digital experiences, beyond just the end users’. Today’s businesses can no longer focus solely on the digital interactions they manage with employees or customers; they must now contend with non-traditional factors. Whether it's the power of brand to make or break a company, the need to monitor across all locations 24/7, or the ability to proactively resolve issues, companies must adapt to...