Leverage a Data-First Strategy for Your AWS Cloud Migration

By Tony Velcich, Oct 12, 2021

When migrating from Hadoop to AWS, data needs to be available immediately to realize the business and technological benefits from cloud computing. Here’s how using a data-first approach will de-risk your AWS data migration.

All the cost benefits of moving applications as part of an AWS cloud migration won’t matter if the data isn’t available to deliver business value. A retail executive who can’t open the performance dashboard on Monday morning to see how stores in their territory performed over the weekend will not be pleased with a big bang migration that leaves data unavailable while it runs, and the fallout will land on the IT department. That’s why companies benefit from taking a data-first approach to their AWS cloud migration initiatives.

In a recent virtual event, WANdisco Chief Technology Officer Paul Scott-Murphy outlined how a data-first approach can deliver business benefits faster while reducing the risk of the migration itself.

Key takeaways from the session are below:

A data-first approach to migration makes it possible for data scientists to immediately start using cloud-scale analytics platforms in AWS. Data becomes the central element of migrating to the cloud. A data-first approach takes into account both the volume of data that may be sitting in on-premises systems and the fact that datasets change over time and then provides a way to migrate data so that it is immediately available in the cloud.

Migrate data early as part of your AWS cloud migration

Storage is the first piece of a cloud migration. In many AWS cloud migrations, the foundation for data migration is AWS’s own storage service, Amazon S3. S3 was the first cloud-scale storage service and is the foundation for data lakes on AWS. It can take the place of on-premises datasets held in platforms like Hadoop, and it offers a range of storage classes. Amazon S3 also sees continuous improvement and cost reductions, making it well suited to large-scale storage, even more so than on-premises platforms.

Migrate metadata as part of AWS data migration

Metadata is the next piece of a data-first migration strategy, and it’s essential to use the right tools so that the metadata is accessible when needed. The AWS Glue Data Catalog works as a central metadata repository accessible from services provided by AWS and its partners. Using the Glue Data Catalog is essential for a cloud migration strategy from platforms like Hadoop.

Previously, companies would need to use technologies like Apache Hive to hold metadata. In AWS, the Glue Data Catalog stores metadata about data sources, transformations, and the targets of those transformations. The Glue Data Catalog is fully managed and fully Hive compatible, so companies can open up metadata previously stored in Hive to a broader range of cloud services.
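To make the Hive-to-Glue mapping concrete, here is a minimal sketch of how a Hive table definition might be expressed in the shape that the Glue `create_table` API accepts as `TableInput`. The function name, table name, columns, and bucket are hypothetical, and the sketch assumes an external Parquet table; it only builds the dictionary and does not call AWS.

```python
from typing import Dict, List, Tuple

# Hive's standard Parquet classes, reused unchanged in the Glue definition.
PARQUET_INPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
PARQUET_OUTPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"


def hive_table_to_glue_input(
    name: str,
    columns: List[Tuple[str, str]],
    partition_keys: List[Tuple[str, str]],
    s3_location: str,
) -> Dict:
    """Build a TableInput-shaped dict for an external Parquet table (hypothetical helper)."""

    def as_columns(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
        return [{"Name": col, "Type": hive_type} for col, hive_type in pairs]

    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "PartitionKeys": as_columns(partition_keys),
        "StorageDescriptor": {
            "Columns": as_columns(columns),
            "Location": s3_location,  # data now lives in S3 rather than HDFS
            "InputFormat": PARQUET_INPUT,
            "OutputFormat": PARQUET_OUTPUT,
            "SerdeInfo": {"SerializationLibrary": PARQUET_SERDE},
        },
    }


# Illustrative values only; none of these names come from the article.
table_input = hive_table_to_glue_input(
    name="store_sales",
    columns=[("store_id", "int"), ("revenue", "double")],
    partition_keys=[("sale_date", "string")],
    s3_location="s3://example-bucket/warehouse/store_sales/",
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("glue").create_table(DatabaseName="retail", TableInput=table_input)`, where the database name is again an assumption.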

Analyze data using EMR

The third step is compute. Amazon EMR is one of the central AWS services for running analytic workloads in the cloud. It is a cloud big data platform that runs open-source frameworks such as Spark, Hive, and HBase. The advantages of using EMR include its elasticity, security, and flexibility, as well as its industry-leading low total cost of ownership, according to IDC. Using EMR opens up use cases for datasets in the cloud, including machine learning, ETL, clickstream analysis, and other workloads.
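As an illustration of what running an analytic workload on EMR can look like, the sketch below builds a Spark step definition in the shape accepted by the EMR `add_job_flow_steps` API. The helper name, script location, and arguments are hypothetical; the sketch only constructs the step and does not contact AWS.

```python
from typing import Dict, Sequence


def spark_step(name: str, script_s3_uri: str, args: Sequence[str] = ()) -> Dict:
    """Describe a spark-submit step for an existing EMR cluster (hypothetical helper)."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the step fails
        "HadoopJarStep": {
            # command-runner.jar lets EMR run a command such as spark-submit as a step
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri, *args],
        },
    }


# Illustrative values only: a job that reads migrated data from S3.
step = spark_step(
    "weekend-sales-report",
    "s3://example-bucket/jobs/weekend_sales.py",
    ["--date", "2021-10-11"],
)
```

With boto3, a step like this could be submitted as `boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])`, where `cluster_id` identifies a cluster you have already launched.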

Often, WANdisco customers will leverage storage, metadata, and compute, as well as third-party services like Databricks and Snowflake. This lets them run analytics against large datasets that stretch beyond basic storage use cases. Taking a data-first approach to migration lets many of the analytic platforms available in AWS work against data previously locked up on-premises, quickly and without business disruption.

The difference a data-first approach makes

A data-first migration means that companies use data as the central element for their migrations to AWS or the cloud. But to do this, they need to consider what happens with data in their on-premises environment. For example, data in Hadoop doesn’t remain stationary; datasets change constantly as new data is ingested, and they can run to hundreds of terabytes or petabytes.

A data-first approach considers the large volume of data, how the data changes, and that the business may directly depend on the data being available at all times. Data migration cannot disrupt business operations, so there must be a way to make data available in the cloud immediately, along with every change that occurs on-premises while the migration is underway. This is live data, and supporting it requires a solution that works without interrupting the business. That solution needs to be simple to introduce, require no application changes, and scale to the volume of data involved. Supporting AWS data migration and availability at any scale without data loss, without data inconsistencies, and without disrupting data operations is the definition of a data-first migration approach.
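The idea of migrating data that keeps changing can be illustrated with a toy model. The sketch below is not WANdisco’s method; it is a simplified, single-threaded simulation in which in-memory dictionaries stand in for the on-premises store and the cloud store, and all names are hypothetical. It copies an initial scan of files while writes continue on the source, records those writes, and replays them on the target so both sides end up identical without pausing the source.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A change is (path, new_content); new_content of None means the file was deleted.
Change = Tuple[str, Optional[bytes]]


def apply_change(store: Dict[str, bytes], change: Change) -> None:
    """Apply one write or delete to a store."""
    path, content = change
    if content is None:
        store.pop(path, None)
    else:
        store[path] = content


def migrate_live(
    source: Dict[str, bytes],
    target: Dict[str, bytes],
    incoming: Iterable[Change],
) -> None:
    """Copy source to target while the source keeps receiving changes."""
    change_log: List[Change] = []
    arrivals = iter(incoming)

    # Initial scan: copy each file that existed when the scan started.
    for path in list(source):
        if path in source:  # it may have been deleted since the scan began
            target[path] = source[path]
        # Simulate the business writing to the source during the scan.
        change = next(arrivals, None)
        if change is not None:
            apply_change(source, change)
            change_log.append(change)

    # Changes that arrive after the scan are still applied and recorded.
    for change in arrivals:
        apply_change(source, change)
        change_log.append(change)

    # Replay every recorded change on the target, in the order it happened.
    for change in change_log:
        apply_change(target, change)


on_prem = {"/sales/sat.csv": b"v1", "/sales/sun.csv": b"v1"}
cloud: Dict[str, bytes] = {}
writes = [("/sales/sat.csv", b"v2"), ("/sales/mon.csv", b"v1"), ("/sales/sun.csv", None)]
migrate_live(on_prem, cloud, writes)
```

In this model the source is never locked or paused, which is the property the data-first approach depends on. A real system also has to handle ordering across many writers, failures, and petabyte scale, none of which this sketch attempts.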

Watch the webcast

A data-first strategy means moving as much of your live data into the cloud as fast as possible to take advantage of cloud-scale storage, analytics, and new capabilities.


Tony Velcich

Tony is an accomplished product management and marketing leader with over 25 years of experience in the software industry. Tony is currently responsible for product marketing at WANdisco, helping to drive go-to-market strategy, content and activities. Tony has a strong background in data management having worked at leading database companies including Oracle, Informix and TimesTen where he led strategy for areas such as big data analytics for the telecommunications industry, sales force automation, as well as sales and customer experience analytics.
