The Future of Hadoop in a Cloud-Based World

April 29 2020

Hadoop once presented the promise of economical storage at massive scale, and streamlined processing of petabytes of data. As WANdisco CEO David Richards explains, though Hadoop took a big hit last year, it will stay with us for a while longer.

We’ve seen tectonic shifts in the big data industry this past year, with some $18 billion worth of acquisitions in the data and analytics space, including Salesforce acquiring Tableau, Google acquiring Looker, and CommVault acquiring Hedvig.

This wave of consolidation unquestionably signals a fundamental change in the outlook for Hadoop. Yet even given the recent roller-coaster ride of Cloudera, MapR, and other Hadoop players, it’s too early to eulogize the platform. While Hadoop’s once-superstar status is certainly diminished, its existence is not in question.

What is Hadoop?

Hadoop is a Java-based open source framework managed by the Apache Software Foundation. It was designed to store and process massive datasets across clusters of commodity hardware using simple programming models. Built to scale from individual servers to thousands of servers, Hadoop relies on software rather than hardware for high availability, meaning the system itself detects and handles failures in the application layer. Hadoop is composed of two primary components: the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN).

HDFS is Hadoop’s main data storage system. It employs a NameNode/DataNode architecture to deliver high-performance access to data in a distributed file system that runs across highly scalable Hadoop clusters. YARN, which was initially named ‘MapReduce 2’ (as the next generation of the wildly popular ‘MapReduce’), schedules jobs and manages resources for all cluster applications. It is also widely used by Hadoop developers to create applications that can work with ultra-large datasets.
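
For readers who want a feel for the NameNode/DataNode split, here is a minimal Python sketch. It is a toy model and not the HDFS API: the class `ToyNameNode`, its methods, the tiny block size, and the round-robin placement are all assumptions made for illustration (real HDFS uses much larger blocks and a rack-aware placement policy). The idea it shows is that the NameNode keeps only metadata about where each block's replicas live, so a read can still succeed from a surviving replica when a DataNode fails.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: stores metadata only, never file data."""
    datanodes: list                 # names of the DataNodes in the cluster
    block_size: int = 4             # illustrative value, not a real HDFS default
    replication: int = 2            # copies per block; keep <= len(datanodes)
    block_map: dict = field(default_factory=dict)  # filename -> replicas per block

    def add_file(self, name, size):
        """Split a file of `size` units into blocks and record replica locations."""
        n_blocks = max(1, -(-size // self.block_size))  # ceiling division
        placements = []
        for i in range(n_blocks):
            # Simple round-robin placement (an assumption of this sketch).
            replicas = [
                self.datanodes[(i + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            placements.append(replicas)
        self.block_map[name] = placements
        return placements

    def locate(self, name, failed=frozenset()):
        """For each block, list the replicas on DataNodes that are still alive."""
        return [
            [dn for dn in replicas if dn not in failed]
            for replicas in self.block_map[name]
        ]


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3"])
    print(nn.add_file("logs.txt", 10))            # three blocks, two replicas each
    print(nn.locate("logs.txt", failed={"dn2"}))  # every block still has a live replica
```

In this sketch, losing one DataNode out of three leaves every block readable because each block was written to two different nodes, which is the software-level fault tolerance described above in miniature.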
