Leverage a Data-First Strategy for Your AWS Cloud Migration

By Tony Velcich, Oct 12, 2021

When migrating from Hadoop to AWS, data needs to be available immediately to realize the business and technological benefits from cloud computing. Here’s how using a data-first approach will de-risk your AWS data migration.

All the cost benefits of moving applications as part of an AWS cloud migration won’t matter if the data isn’t available to deliver business value. A retail executive who can’t access the performance dashboard on Monday morning to see how stores in their territory performed over the weekend will not be pleased with a big bang migration where the data is unavailable during migration — and the fallout will definitely hit the IT department. That’s why companies would benefit from taking a data-first approach to their AWS cloud migration initiatives.

In a recent virtual event, WANdisco Chief Technology Officer Paul Scott-Murphy outlined how a data-first approach can deliver business benefits faster while reducing the risk of the migration itself.

Key takeaways from the session are below:

A data-first approach to migration makes it possible for data scientists to immediately start using cloud-scale analytics platforms in AWS. Data becomes the central element of migrating to the cloud. A data-first approach takes into account both the volume of data that may be sitting in on-premises systems and the fact that datasets change over time and then provides a way to migrate data so that it is immediately available in the cloud.

Migrate data early as part of your AWS cloud migration

Storage is the first piece of cloud migration. In many AWS cloud migrations, the foundation for data migration is the AWS storage service itself, Amazon S3. S3 was the first cloud-scale storage service and is the foundation for data lakes on AWS. It can hold datasets currently stored on-premises in platforms like Hadoop, and it offers several storage classes. Amazon S3 is also undergoing continuous improvement and cost reductions, making it ideal for large-scale storage, even more so than on-premises platforms.
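As a rough illustration of what moving Hadoop datasets into S3 involves, the sketch below maps an HDFS path to an S3 URI while preserving the directory layout. The function name, the bucket name "example-data-lake", and the "hadoop" prefix are assumptions made for this example; they are not taken from the session or from any migration tool.

```python
from urllib.parse import urlparse


def hdfs_to_s3_uri(hdfs_path: str, bucket: str, prefix: str = "") -> str:
    """Map an HDFS path to an S3 URI, keeping the directory structure (hypothetical helper)."""
    # urlparse accepts both "hdfs://namenode:8020/a/b" and a plain "/a/b".
    path = urlparse(hdfs_path).path
    # S3 keys have no leading slash, so drop empty segments.
    segments = [s for s in path.split("/") if s]
    if prefix:
        segments = [s for s in prefix.split("/") if s] + segments
    return f"s3://{bucket}/" + "/".join(segments)


# Bucket and prefix are made-up values for illustration only.
example_uri = hdfs_to_s3_uri(
    "hdfs://namenode:8020/warehouse/sales/dt=2021-10-11/part-0000.parquet",
    bucket="example-data-lake",
    prefix="hadoop",
)
print(example_uri)
```

Keeping the layout unchanged means Hive-style partition directories such as dt=2021-10-11 remain recognizable to tools that read from S3.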

Migrate metadata as part of AWS data migration

Metadata is the next piece of a data-first migration strategy, and it’s essential to use the right tools so that the metadata is accessible when needed. The AWS Glue Data Catalog works as a central metadata repository accessible from services provided by AWS and its partners, which makes it a key part of a cloud migration strategy from platforms like Hadoop.

Previously, companies needed technologies like Apache Hive to hold metadata. In AWS, the Glue Data Catalog stores metadata about data sources, transformations, and targets for transformations. The Glue Data Catalog is fully managed and Hive compatible, so companies can open up metadata previously stored in Hive to a broader range of cloud services.
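To make the Hive-to-Glue relationship concrete, here is a minimal sketch that turns a simple description of a Hive table into the dictionary shape that the Glue API accepts as a table definition. The helper name, the table, and the S3 location are hypothetical, and the sketch assumes a Parquet-backed external table; it only builds the dictionary and does not call AWS.

```python
def hive_table_to_glue_input(name, location, columns, partition_keys=()):
    """Build a Glue TableInput-style dict for a Parquet external table (hypothetical helper)."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        # Partition columns are listed separately from data columns, as in Hive.
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": location,
            # Standard Hive classes for Parquet data.
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


# Table name, columns, and location are made up for illustration.
table_input = hive_table_to_glue_input(
    name="sales",
    location="s3://example-data-lake/hadoop/warehouse/sales/",
    columns=[("store_id", "int"), ("revenue", "double")],
    partition_keys=[("dt", "string")],
)
print(table_input["StorageDescriptor"]["Columns"])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as the TableInput argument of the Glue client's create_table call, alongside a DatabaseName.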

Analyze data using EMR

The third step is compute. Amazon EMR is one of the central AWS services for running analytic workloads in the cloud. It is a cloud big data platform that runs frameworks such as Spark, Hive, and HBase. The advantages of using EMR include its elasticity, security, and flexibility, as well as its industry-leading low total cost of ownership, according to IDC. Using EMR opens up use cases for datasets in the cloud, including machine learning, ETL, clickstream analysis, and other workloads.
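As a sketch of how a Spark job might be handed to EMR once the data is in S3, the snippet below builds a step definition in the shape EMR expects for steps. The helper name, the step name, and the script location are assumptions for illustration; the code only constructs the dictionary and does not contact AWS.

```python
def spark_step(name, script_s3_uri, args=()):
    """Build an EMR step definition that runs a PySpark script (hypothetical helper)."""
    return {
        "Name": name,
        # Keep the cluster running if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step invoke tools such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri, *args],
        },
    }


# Script path and argument are made up for illustration.
step = spark_step(
    "weekend-sales-report",
    "s3://example-data-lake/jobs/weekend_sales.py",
    args=["--date", "2021-10-11"],
)
print(step["HadoopJarStep"]["Args"])
```

With boto3, a list of such steps could be submitted through the EMR client's add_job_flow_steps call for an existing cluster.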

Often, WANdisco customers will leverage storage, metadata, and compute, as well as third-party services like Databricks and Snowflake. This lets them run analytics against large datasets that stretch beyond basic storage use cases. Taking a data-first approach to migration enables many of the analytic platforms available in AWS to work with data that was previously locked up on-premises, quickly and without business disruption.

The difference a data-first approach makes

A data-first migration means that companies use data as the central element for their migrations to AWS or the cloud. But to do this, they need to consider what happens with data in their on-premises environment. For example, data in Hadoop doesn’t remain stationary; it changes constantly as new data is ingested, and the total volume could be hundreds of terabytes or petabytes.
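A back-of-the-envelope calculation shows why data keeps changing during a migration of this size. The figures below (1 PB of data, a fully used 10 Gbit/s link, 5 TB of new data ingested per day) are assumptions chosen for illustration, not numbers from the session.

```python
def transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move data_bytes over a link, assuming it is fully utilized."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


TERABYTE = 10**12
PETABYTE = 10**15

# Assumed scenario: 1 PB over a 10 Gbit/s link.
days = transfer_days(PETABYTE, 10e9)

# Assumed ingest rate: 5 TB of new data per day arriving on-premises meanwhile.
new_data_tb = 5 * TERABYTE * days / TERABYTE

print(f"{days:.1f} days, {new_data_tb:.1f} TB of new data")
```

Under these assumptions the bulk transfer takes a little over nine days, during which tens of terabytes of new data arrive at the source and also need to reach the cloud.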

A data-first approach considers the large volume of data, how the data changes, and that the business may directly depend on the data being available at all times. Data migration cannot disrupt business operations, so there must be a way to make data available in the cloud immediately, including changes that occur on-premises while the migration is under way. This is live data. To support it, companies need a solution that does not interrupt the business: it must be simple to introduce, require no application changes, and scale to the volume of data involved. Supporting AWS data migration and availability at any scale without data loss, without data inconsistencies, and without disrupting data operations is the definition of a data-first migration approach.
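The idea of migrating live data can be modeled in a few lines: copy a snapshot of the source, then replay the changes that happened while the copy was running. This is a toy model using in-memory dictionaries, with hypothetical function names and paths; it is not a description of how any particular product implements live migration.

```python
def apply_change(store: dict, change: tuple) -> None:
    """Apply one ("put" | "delete", path, content) change to a path-to-content store."""
    op, path, content = change
    if op == "put":
        store[path] = content
    elif op == "delete":
        store.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def migrate_with_live_changes(source: dict, changes: list) -> dict:
    """Toy model: bulk-copy a snapshot, then replay changes made during the copy."""
    target = {}
    snapshot = dict(source)      # state when the bulk scan starts
    for change in changes:       # applications keep writing on-premises
        apply_change(source, change)
    target.update(snapshot)      # bulk copy of the snapshot
    for change in changes:       # replay captured changes in order
        apply_change(target, change)
    return target


# Paths and contents are made up for illustration.
on_prem = {"/sales/a": "v1", "/sales/b": "v1"}
during_copy = [
    ("put", "/sales/a", "v2"),
    ("delete", "/sales/b", None),
    ("put", "/sales/c", "v1"),
]
cloud = migrate_with_live_changes(on_prem, during_copy)
print(cloud == on_prem)
```

The applications writing to the source are never paused in this model, which is the property the data-first approach asks for.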

Watch the webcast

A data-first strategy means moving as much of your live data into the cloud as fast as possible to take advantage of cloud-scale storage, analytics, and new capabilities.

Tony Velcich

Tony is an accomplished product management and marketing leader with over 25 years of experience in the software industry. Tony is currently responsible for product marketing at WANdisco, helping to drive go-to-market strategy, content, and activities. Tony has a strong background in data management, having worked at leading database companies including Oracle, Informix, and TimesTen, where he led strategy for areas such as big data analytics for the telecommunications industry, sales force automation, and sales and customer experience analytics.


