How Spotify Streamlined Large-Scale Dataset Migrations with Background Coding Agents
Introduction
Migrating thousands of datasets across a complex data infrastructure is no small feat. At Spotify, a top priority is ensuring that downstream consumers (the teams and services that rely on these datasets) experience minimal disruption. In this article, we explore how Spotify Engineering used three tools, Honk, Backstage, and Fleet Management, to automate large-scale dataset migrations. With background coding agents handling the repetitive changes, the team reduced manual overhead and downtime.

The Challenge: Migrating Thousands of Datasets
Spotify’s data platform serves hundreds of internal teams, each depending on datasets that feed dashboards, machine learning models, and real-time features. Over time, schema changes, storage optimizations, or platform upgrades require datasets to be migrated. Doing this manually for thousands of datasets is error-prone and time-consuming. Each migration must ensure that downstream consumers—the teams that query or stream the data—experience no breakage or data loss.
Pain Points with Downstream Consumers
Traditional migration approaches often involve:
- Coordinating with each consumer team to update their code or configuration.
- Risk of breaking production pipelines if migrations are not properly tested.
- Significant manual effort to track dependencies and validate correctness.
These pain points led Spotify to develop an automated, background-agent-based approach.

Introducing Honk: Background Coding Agents
Honk is Spotify’s system for running background coding agents—automated processes that perform code changes, data transformations, and configuration updates across repositories. For dataset migrations, Honk acts as the engine that drives the changes needed to adapt downstream consumers to the new dataset schema or location.
How Honk Works
Honk agents are triggered by migration events. They:
- Scan repositories that consume the dataset being migrated.
- Generate the necessary code or configuration changes (e.g., updating column names, changing table references).
- Create pull requests with the changes, ready for review by the owning team.
- Track progress to ensure all affected repositories are updated.
This approach removes the manual burden from data engineers and gives confidence that transformations are consistent.
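
The scan, generate, and pull-request steps can be illustrated with a small sketch. Honk's internals are not described in this article, so everything here is an assumption: `MigrationSpec`, `rewrite_source`, `propose_change`, and `plan_pull_requests` are hypothetical names, repositories are modeled as in-memory dictionaries of file contents, and a whole-word text substitution stands in for whatever transformation a real agent performs. The result is one unified diff per changed file, roughly what a reviewer would see in a pull request.

```python
import difflib
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MigrationSpec:
    """Hypothetical description of one dataset migration."""
    old_table: str
    new_table: str
    column_renames: Dict[str, str] = field(default_factory=dict)  # old -> new


def rewrite_source(source: str, spec: MigrationSpec) -> str:
    """Replace whole-word references to the old table and renamed columns."""
    replacements = {spec.old_table: spec.new_table, **spec.column_renames}
    # Try longer names first so a short name never shadows a longer one.
    names = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(n) for n in names) + r")(?!\w)"
    )
    # A single pass, so a renamed value is never rewritten a second time.
    return pattern.sub(lambda m: replacements[m.group(1)], source)


def propose_change(path: str, source: str, spec: MigrationSpec) -> Optional[str]:
    """Return a unified diff for one file, or None if nothing changes."""
    updated = rewrite_source(source, spec)
    if updated == source:
        return None
    diff = difflib.unified_diff(
        source.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


def plan_pull_requests(
    repos: Dict[str, Dict[str, str]], spec: MigrationSpec
) -> Dict[str, List[str]]:
    """Map each affected repository to the diffs its pull request would carry."""
    plan: Dict[str, List[str]] = {}
    for repo, files in repos.items():
        diffs = []
        for path in sorted(files):
            diff = propose_change(path, files[path], spec)
            if diff is not None:
                diffs.append(diff)
        if diffs:  # repositories with no matching references need no PR
            plan[repo] = diffs
    return plan
```

The keys of the returned plan double as a simple progress list: a repository stays on it until its pull request is merged. Plain text substitution is the weakest part of this sketch, since it cannot tell a column name from an unrelated identifier with the same spelling, which is one reason the proposed changes still go to the owning team for review.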

Leveraging Backstage for Self-Service Visibility
While Honk performs the heavy lifting, Backstage—Spotify’s developer portal—provides the interface for engineers to manage and monitor migrations. Through Backstage, each team can:
- View a dashboard of all datasets they own or consume.
- See the status of ongoing migrations (e.g., how many repositories still need updates).
- Approve or reject changes proposed by Honk agents.
This self-service model empowers teams to stay informed without constant manual coordination. Backstage also serves as the single source of truth for dataset ownership and dependencies, which Honk uses to determine which agents to run.
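
To make the ownership and dependency lookup concrete, here is a minimal sketch. It does not use the Backstage API: `CatalogEntry`, `repos_to_update`, and `migration_status` are hypothetical names, and a plain dictionary stands in for the catalog. It shows the two questions the text describes, namely which repositories consume a dataset and how many of them still need updates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one dataset."""
    dataset: str
    owner: str
    consumers: List[str] = field(default_factory=list)  # consuming repositories


def repos_to_update(catalog: Dict[str, CatalogEntry], dataset: str) -> List[str]:
    """Return the repositories that consume the dataset, in a stable order."""
    entry = catalog.get(dataset)
    if entry is None:
        return []  # unknown dataset: nothing to schedule
    return sorted(set(entry.consumers))


def migration_status(
    catalog: Dict[str, CatalogEntry], dataset: str, updated: Set[str]
) -> Dict[str, object]:
    """Summarize progress for one dataset, as a dashboard row might."""
    consumers = repos_to_update(catalog, dataset)
    pending = [repo for repo in consumers if repo not in updated]
    return {
        "dataset": dataset,
        "owner": catalog[dataset].owner if dataset in catalog else None,
        "total": len(consumers),
        "pending": pending,
        "done": len(pending) == 0,
    }
```

Keeping the consumer list in one place means the agent scheduler and the status view read the same data, so a repository cannot be reported as complete while it is still queued for changes.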

Fleet Management for Agent Orchestration
Running hundreds of concurrent agents across thousands of repositories requires robust orchestration. Spotify uses Fleet Management to schedule, scale, and monitor Honk agents. Key capabilities include:
- Dynamic scaling: Agents are spun up based on migration workload.
- Error handling and retries: Agents automatically retry failed operations.
- Resource isolation: Each agent runs in a sandbox to prevent interference.
Fleet Management ensures that migrations proceed efficiently without overwhelming the underlying infrastructure.
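
The orchestration ideas above, bounded concurrency and automatic retries, can be sketched with Python's `asyncio`. This is an illustration under stated assumptions, not Fleet Management's implementation: `run_with_retries` and `run_fleet` are hypothetical names, the agent is any coroutine that takes a repository name, and sandboxing is not modeled at all. A semaphore caps how many agents run at once, and each failed attempt is retried with exponential backoff before the repository is marked as failed.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple

AgentTask = Callable[[str], Awaitable[str]]  # hypothetical agent signature


async def run_with_retries(
    task: AgentTask, repo: str, max_attempts: int = 3, base_delay: float = 0.5
) -> str:
    """Run one agent, retrying failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await task(repo)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, and so on.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


async def run_fleet(
    task: AgentTask,
    repos: Iterable[str],
    max_concurrency: int = 4,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Dict[str, Tuple[str, str]]:
    """Run the agent over all repositories with a cap on parallelism."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(repo: str) -> Tuple[str, Tuple[str, str]]:
        async with semaphore:  # at most max_concurrency agents inside
            try:
                result = await run_with_retries(
                    task, repo, max_attempts, base_delay
                )
                return repo, ("ok", result)
            except Exception as exc:
                # One failing repository must not abort the rest of the fleet.
                return repo, ("failed", str(exc))

    pairs = await asyncio.gather(*(run_one(repo) for repo in repos))
    return dict(pairs)
```

A call such as `asyncio.run(run_fleet(my_agent, repo_names, max_concurrency=50))` would return one `("ok", result)` or `("failed", reason)` entry per repository, where `my_agent` and `repo_names` are supplied by the caller. Capturing failures per repository, rather than letting the first exception propagate, keeps a single broken repository from stalling the whole migration.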

Results and Benefits
The combination of Honk, Backstage, and Fleet Management has transformed dataset migrations at Spotify:
- Reduced manual effort: Engineers now spend minutes instead of days per migration.
- Faster rollouts: Migrations that once took weeks are completed in hours.
- High reliability: Automated agents minimize human error, and Backstage provides clear auditing.
- Improved developer experience: Teams have full visibility and control without being overwhelmed.
Across thousands of datasets, this system has saved substantial engineering time and helped prevent production incidents.

Conclusion
Migrating datasets at scale is a classic infrastructure challenge. By building background coding agents with Honk, integrating them into Backstage for visibility, and orchestrating with Fleet Management, Spotify Engineering turned a painful manual process into an automated, reliable pipeline. This approach not only speeds up migrations but also builds trust with downstream consumers—a win for platform reliability and developer productivity.
Want to learn more? Explore other posts in the Honk series on the Spotify Engineering blog.