Data Migrator

Move your data and metadata to the cloud or between clusters, with no downtime and no service disruption

“WANdisco's uniqueness lies in how it packages Hadoop data migration as a fully hands-off service. Moving data under active change is delicate, and organizations don't want to use their best IT people on it. WANdisco's Data Migrator handles everything in the background and doesn't require expertise from the customer. It's as close to a silver bullet as you can find for this type of project.”

Merv Adrian, Gartner Research Vice President of Data and Analytics


What is Data Migrator?

Migrate from Hadoop to cloud without disruption or downtime.

Data Migrator is a fully automated cloud migration solution that migrates HDFS data and Hive metadata to the cloud, even while those data sets are under active change. It is fully self-service, requires no WANdisco expertise, and requires no changes to applications or business operations. Migrations of any scale can begin immediately and run without production system downtime, business disruption, or risk of data loss.


Data Migrator enables administrators to deploy the solution easily and begin migrating data lake content to the cloud immediately. It is entirely non-intrusive and requires no changes to applications or to cluster or node configuration or operation.


Using WANdisco's live data capabilities, Data Migrator migrates data while the source is under active change, without production system downtime or business disruption, and supports complete and continuous data migration.


Data Migrator is able to accommodate data migration at any scale, from terabytes to exabytes, and without any risk of data loss.

Benefits of Data Migrator

Data Migrator enables you to transition to a live data environment that makes your data globally available, accurate, and protected, avoiding the costs of a manual migration and the data silos that emerge when data cannot be kept consistent.

Business Continuity

  • No need for downtime of on-premises production clusters
  • Immediate availability of migrated data
  • High scalability and performance for migration at any scale

Complete and Continuous Migration

  • Data migration with a single pass of source storage
  • Ongoing migration of any subsequent data changes
  • No loss of source data or subsequent changes

Cost Avoidance

  • Minimizes the need for IT resource involvement
  • Automated migration without custom code maintenance
  • Faster time-to-value and adoption of AI and ML

Data Migrator Automates Cloud Migration

Zero Business Disruption, Zero Risk, and Best Time-to-Value

Quick deployment and operation

Data Migrator is installed on an edge node of your Hadoop cluster. Deployment can be performed in minutes without impacting current operations, so users can begin migrations immediately.

Complete and continuous migration

Migrates existing datasets with a single pass through the source storage system, eliminating the overhead of repeated scans, while also supporting continuous migration of any ongoing changes from source to target with zero disruption to current production systems.
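The idea of a single pass combined with ongoing change capture can be illustrated with a minimal Python sketch. This is not Data Migrator's implementation: the in-memory dictionaries standing in for storage, the `migrate_single_pass` function, and the `("put" | "delete", path, content)` change format are all assumptions made for illustration.

```python
from collections import deque


def migrate_single_pass(source, target, pending_changes):
    """Copy every source path once, then replay queued changes in order.

    source, target: dicts mapping path -> content (hypothetical stand-ins
    for the source and target storage systems).
    pending_changes: deque of (op, path, content) tuples recorded while
    the scan was running; op is "put" or "delete".
    Returns the number of change events applied.
    """
    # Single pass: each path is read from the source exactly once.
    for path, content in list(source.items()):
        target[path] = content

    # Continuous phase: replay recorded changes instead of rescanning.
    applied = 0
    while pending_changes:
        op, path, content = pending_changes.popleft()
        if op == "put":
            target[path] = content
        elif op == "delete":
            target.pop(path, None)
        applied += 1
    return applied
```

Because changes are replayed in the order they were recorded, the target converges on the source state without a second scan of the source.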

Multiple source and target systems support

Supports HDFS distributions v2.6 and higher as source systems, and all leading cloud service providers and other select ISVs, such as Databricks and Snowflake, as the target systems. See the Data Migrator documentation and release notes for details.

Browser-based user interface

Users can manage the full data migration (data and metadata) from a single management console in the WANdisco browser-based user interface (UI).

Configurability and control

Migrations can be configured to meet the organization's specific needs, including standard configuration such as defining sources, targets, and data to be migrated, as well as advanced capabilities such as migration prioritization, path mapping, and network bandwidth management controls.
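To make path mapping and bandwidth management concrete, here is a small Python sketch of what such controls compute in general. It does not reflect Data Migrator's configuration syntax or API: the `map_path` and `min_transfer_seconds` helpers, the longest-prefix rule, and the example paths are hypothetical.

```python
def map_path(path, mappings):
    """Rewrite a source path using the longest matching prefix rule.

    mappings: dict of source prefix -> target prefix (hypothetical format).
    Paths that match no rule are returned unchanged.
    """
    best = None
    for src_prefix, dst_prefix in mappings.items():
        prefix = src_prefix.rstrip("/")
        # Match whole path components only, so "/data" does not match "/database".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, dst_prefix.rstrip("/"))
    if best is None:
        return path
    return best[1] + path[len(best[0]):]


def min_transfer_seconds(size_bytes, limit_bytes_per_sec):
    """Lower bound on transfer time under a bandwidth cap."""
    return size_bytes / limit_bytes_per_sec
```

With the mappings `{"/data": "/lake", "/data/sales": "/lake/curated/sales"}`, the more specific rule wins for anything under `/data/sales`, and a bandwidth cap gives a simple lower bound on how long a given volume of data takes to move.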

Migration at any scale

Migrates big datasets at any scale, from terabytes to multi-petabytes, without impacting current production environments. Horizontal scaling capabilities allow users to scale their migration capacity by configuring transfer agents to maximize the productivity of available bandwidth.
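One general way to spread transfer work across several agents is to hand each file to the agent with the least data assigned so far. The Python sketch below illustrates that idea only. How Data Migrator's transfer agents actually schedule work is not described here, and `assign_to_agents` and its inputs are hypothetical.

```python
import heapq


def assign_to_agents(file_sizes, agent_count):
    """Greedy balancing: largest files first, each to the least-loaded agent.

    file_sizes: dict of path -> size in bytes.
    Returns a dict of agent index -> list of assigned paths.
    """
    # Heap entries are (bytes assigned so far, agent index).
    heap = [(0, agent) for agent in range(agent_count)]
    heapq.heapify(heap)
    assignment = {agent: [] for agent in range(agent_count)}

    for path, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, agent = heapq.heappop(heap)
        assignment[agent].append(path)
        heapq.heappush(heap, (load + size, agent))
    return assignment
```

Adding agents shortens the longest per-agent queue, which is the intuition behind scaling migration capacity horizontally.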

Hadoop data and Hive metadata migration

Supports migration of HDFS data and Hive metadata to any public cloud, as well as to other on-premises environments.

Migration verification

Migration verification scans both source and target environments to ensure data fidelity and validate the success of all data migrations. Notifications can report the status of migration verifications and deliver the results by email.
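A verification scan of this kind can be pictured as comparing a fingerprint of every file on both sides. The following Python sketch is an assumption-laden illustration, not the product's verification logic: the `fingerprint` and `verify_migration` names, the use of size plus SHA-256, and the in-memory dictionaries are all hypothetical.

```python
import hashlib


def fingerprint(content: bytes):
    """Return (size, SHA-256 hex digest) for a file's content."""
    return (len(content), hashlib.sha256(content).hexdigest())


def verify_migration(source, target):
    """Compare two path -> bytes mappings and report any differences."""
    report = {"missing": [], "mismatched": [], "extra": []}
    for path in sorted(source):
        if path not in target:
            report["missing"].append(path)      # on source, not on target
        elif fingerprint(source[path]) != fingerprint(target[path]):
            report["mismatched"].append(path)   # content differs
    for path in sorted(target):
        if path not in source:
            report["extra"].append(path)        # on target only
    return report
```

An empty report in all three categories indicates that the two sides agree on every path and its content.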

Programmatic interface

Migrations can also be managed through a comprehensive and intuitive command-line interface or using the self-documenting REST API to integrate the solution with other programs as needed.

Metrics and monitoring

Health and status metrics, estimates for migration completion, email notifications, and real-time usage insights keep you updated on migration jobs and support hands-off operation.

The Data Migrator Approach

Only Data Migrator is able to move data lake content to the cloud immediately, at scale, with no application downtime and no risk of data loss, even when data sets are under active change.

Other approaches to large-scale Hadoop data migration rely on repeated iterations in which source data is copied, but they do not account for changes made during each copy. They require significant up-front planning and impose operational downtime when data must be migrated completely.

