Automate migration of Hadoop data and Hive metadata to AWS without disruption or downtime

WANdisco strengthens AWS engineering partnership with integration between Data Migrator and AWS Glue Data Catalog


Move to AWS with ease and enable a hybrid AWS environment

WANdisco's partnership with AWS helps you migrate data and metadata to the cloud rapidly and easily, and take advantage of AWS services including Amazon S3, Amazon EMR and AWS Glue Data Catalog. WANdisco is an AWS Advanced Tier ISV partner and one of the first ISVs to achieve AWS Migration Competency in Workload Mobility: Data Migration.

AWS PARTNER NETWORK

Advanced Technology Partner

  • Advanced tier ISV Migration competency
  • Migration Acceleration Program (MAP) for Storage
  • Amazon EMR Migration Program (EMP)

“We found WANdisco’s Data Migrator to be the optimal approach to deliver the best time to value, rather than running a more time-consuming and costly manual migration project internally.”
Wayne Peacock, Chief Data and Analytics Officer at GoDaddy

Data Migrator with AWS

Migrate on-premises HDFS to Amazon S3

WANdisco Data Migrator is a safe and reliable cloud migration solution that provides complete and continuous migration of HDFS data to the AWS cloud.

Data Migrator is fully self-service and requires no WANdisco expertise or services. It is entirely non-intrusive and requires zero changes to applications or to cluster or node configuration and operation. AWS customers looking to rapidly migrate a large-scale on-premises Hadoop data lake to the cloud can use WANdisco for automated data migration and replication with zero business downtime. WANdisco Data Migrator is the only platform that allows on-premises production applications to continue operating while data is being migrated and is under active change.

Migrate Apache Hive to AWS Glue Data Catalog

An important requirement when modernizing legacy analytics workloads for the cloud is to keep the business operating as normal by taking advantage of the metadata stored on-premises. Replicating HDFS data to Amazon S3 with Data Migrator is only the first step. You must also replicate the metadata so that users can discover, understand and query the data.

To give customers a complete migration solution, Data Migrator migrates metadata from Apache Hive directly to the AWS Glue Data Catalog. Data Migrator eliminates complex and error-prone workarounds that require one-off scripts and configuration in the Hive metastore. It also integrates with a wide range of the databases used by the Hive metastore, which keeps the migration simple.

What is AWS Glue Data Catalog?

The AWS Glue Data Catalog is a persistent, Apache Hive-compatible metadata store that can hold information about different types of data assets, regardless of where they are physically stored. The AWS Glue Data Catalog holds table definitions, schemas, partitions, properties and more. It automatically registers and updates partitions to make queries run efficiently. It also maintains a comprehensive schema version history that provides a record of schema evolution.

The AWS Glue Data Catalog is a cloud-native, managed metadata catalog that is flexible, reliable, and usable from a broad range of AWS-native analytics services, third-party tools and open-source engines. AWS maintains and manages the service, so you do not need to spend time scaling as demand grows, responding to outages, ensuring data resilience or updating infrastructure.
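To make the idea of a table definition more concrete, the following is a minimal sketch of the kind of record the catalog holds for a table whose data has moved from HDFS to Amazon S3. It is illustrative only and does not describe how Data Migrator works internally. The helper names, the table name, the NameNode address and the bucket name are all hypothetical. The dictionary keys follow the TableInput structure accepted by the AWS Glue CreateTable API.

```python
from urllib.parse import urlparse


def to_s3_location(hdfs_uri: str, bucket: str) -> str:
    """Map an hdfs:// URI to an s3:// URI, keeping the path unchanged.

    Hypothetical helper: assumes a one-to-one path mapping into one bucket.
    """
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {hdfs_uri!r}")
    return f"s3://{bucket}{parsed.path}"


def build_table_input(name, columns, partition_keys, hdfs_location, bucket):
    """Build a dict shaped like a Glue TableInput from Hive-style metadata.

    columns and partition_keys are lists of (name, hive_type) pairs.
    """
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        # Partition columns are listed separately from data columns.
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            # The table now points at the migrated data in Amazon S3.
            "Location": to_s3_location(hdfs_location, bucket),
        },
    }


# Example values are assumptions chosen for illustration.
table_input = build_table_input(
    name="web_logs",
    columns=[("request_id", "string"), ("status", "int")],
    partition_keys=[("dt", "string")],
    hdfs_location="hdfs://namenode:8020/warehouse/web_logs",
    bucket="example-migration-bucket",
)
print(table_input["StorageDescriptor"]["Location"])
```

A dictionary like this could be passed as the TableInput argument of the boto3 Glue client's create_table call, alongside a DatabaseName. A real table definition would usually also carry input and output formats and SerDe information, which this sketch leaves out.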

 

Migration to Databricks

Data Migrator provides a comprehensive solution for migrating Hadoop data and Hive metadata, including the last-mile conversion to the format required by Delta Lake on Databricks. This lets users manage the complete migration, from HDFS to Databricks, with a single solution. Migrated data is immediately available for advanced Spark-based cloud analytics with Databricks on AWS.

Databricks enables companies to accelerate data-driven innovation with a unified approach to data analytics and AI. Using Data Migrator to automate the migration of Hadoop data and Hive metadata directly to Databricks lets organizations focus their resources on developing new AI innovations rather than on migration complexities, so they can introduce new AI and ML capabilities much more quickly.

 

Free Cloud Data Migration Assessment

Get a complete analysis of your data migration plan, including best practices and guidance to accelerate the migration.
