Improve Flyway Static Data Performance for Large Data Sets
Flyway’s Static Data feature currently generates one INSERT statement per row of tracked data. While this approach works well for small datasets and produces clean, readable diffs in version control, it does not scale effectively for large static data sets.
In our environment, restoring a clean database with large static tables (180,000+ rows) fails, we believe, due to the performance overhead of executing hundreds of thousands of individual INSERT statements.
We believe the core issue is the inefficiency of row-by-row inserts. On SQL Server, a proven optimization is to use Table Value Constructors (TVCs) in INSERT statements, which allow multiple rows to be inserted in a single operation and can dramatically improve performance:
https://learn.microsoft.com/en-us/sql/t-sql/queries/table-value-constructor-transact-sql
At the same time, we understand and value Flyway’s current design choice of generating individual INSERT statements, as it enables clean, minimal diffs in Git when rows are added, modified, or removed.
Proposed improvement:
Allow Flyway to continue storing static data as individual row-level statements in version control, but apply TVC-based inserts transparently at execution time.
This would preserve the current Git diff behavior while significantly improving performance during database rebuilds and deployments.
Thanks for your feedback. We're looking forward to seeing how other people vote/comment on this.
In the meantime, with Flyway's migrations approach, you can write/edit your own versioned migration scripts for how the data is deployed. We've even had customers use "Script Migrations" and bash/powershell scripts to bulk load data. This might be an option and there's more information about this at https://documentation.red-gate.com/fd/working-with-data-138347109.html#Workingwithdata-Usingscriptmigrationstomanagedata.
Also, if you are restoring a clean database, you might want to look into our backup as a baseline approach - https://documentation.red-gate.com/fd/baselines-273973441.html#:~:text=the%20only%20option.-,Provisioning%20the%20shadow%20database,-Provisioning%20generally%20requires.