Make SQL Data Easier to Merge (Static Data)
Currently, SQL Data files contain a bunch of insert statements, but the order of the insert statements can be inconsistent. For example, new records might get inserted on the top or the bottom. Also, multiple SET IDENTITYINSERT statements sometimes get created. This makes it difficuly to merge data changes between multiple branches in TFS Source Control. If the Insert statements had a consistent order and only included one SET IDENTITYINSERT diffs and merges would be easier.
Thanks for this suggestion and for the many comments and upvotes. I realize that this is a pain point.
I have a few shorter-term workarounds to summarize as well as some information on the longer roadmap in this update. I know these shorter-term workarounds aren’t perfect (I summarize the pros and cons), but I’m posting them as they may help a few folks.
Workaround 1) When data changes to static data need to be made, use a “relink the table” pattern
One can “cleanly rescript” a static data table in SQL Source Control by:
- Unlinking the static data table
- Relinking the static data table
Pro: This works with the GUI and requires no special knowledge or comfort with TSQL. This may help folks with just a few static data tables.
Con: This requires extra steps and results in extra commits in the history, which I realize can be annoying, particularly if the team doesn’t squash commit history when merging or do similar things.
Workaround 2) Use a post-deployment script to manage static data tables instead of linking them
Another option is to not use the “link static data tables” feature at all, and instead manage the static data table yourself with a custom script.
I’ve added a code sample called “post-deployment script to manage static data for a table with an identity column” into our reference on post-deployment scripts: https://documentation.red-gate.com/soc7/common-tasks/working-with-pre-post-deployment-scripts/static-data
Pro: This gives you complete control over the TSQL.
Con: Your post-deployment script might become quite complex and create its own issue in merge conflicts, depending on how many static data tables you have and whether you are using the post-deployment script for other things.
An alternate approach using SQL Change Automation
If customers have licenses for SQL Change Automation, the SCA Migrations approach does not have this issue and has additional features for flexibility with static data:
- Since migrations only script out the data which has changed, this scripting issue of the entire table doesn’t occur in SCA – it only ever needs to generate commands for what has been modified since the last migration
- Supports column filtered static data tables in the SCA plugin in SSMS
- Supports multiple post-deployment scripts, in case there is a preference to manage static data tables in dedicated post-deployment scripts
- Allows approaches like bulk loading larger static data tables by supporting SQLCMD variables in migration and post-deployment scripts
If there are other use cases where a migrations approach would add value to the team – such as easily being able to run data modification statements as part of deployments, or being able to customize the TSQL for schema changes – then changing from SQL Source Control to SQL Change Automation might make sense.
I’ve chatted with our engineering teams and Redgate does have the desire to better support this scenario longer term. This is not a “quick fix” scenario and will likely require a rethinking of the way static data is managed within the tool.
While the “rethinking” bit of this has begun, we don’t specifically have this on the SQL Source Control roadmap (https://www.red-gate.com/products/sql-development/sql-source-control/roadmap) at this point. We do have improved support for use with Git feature branches as a future candidate on the roadmap, and we do know from research with many customers that static data is quite important to folks. While I can’t make any promises, I think we are likely to be able to offer an improvement in this area.
Peter Colbourne commented
Any movement on this? Been using Data Compare regularly and I'd classify this as a serious bug that can lead to data being overwritten and lost in a multi-dev environment. Even for personal use the re-ordering makes it very hard to maintain your sanity. Have started looking into alternative solutions.
Paolo Abeabe commented
I agree with what that Neil guy was saying.
Neil Pugh commented
Add an option that allows the user to select the column to order the generated INSERT statement by on the link/unlink table view. Maybe I'm being naïve but is there anymore to it than that?
I would say it's less than 11 years of development!
Brian Meyers commented
It is CRAZY that this hasn't been fixed. I have been a long time customer. But I am going to have to start looking at other options.
This makes trying to figure out who made data changes, and when, nearly impossible.
Dan O'Connor commented
I've just had to create tables in a temp database then join them and compare every column in a big WHERE just to do a code-review. The diffs resulting from SQL Source Control data-changes are realistically unusable for validating changes.
Sometimes when we are committing data to our source control, the contents of the file changes not just in places where data have changed, but also data rows that have not been modified are put by SQL Compare in different places. Sometimes there is a lot of reordered rows, making it difficult to review the changes. How can we make SQL Compare not reorder rows to avoid this problem? Thank you! […]
Thank you for your response! I suspected that the effect I observe is
probably considered a solution to a performance issue. I will note
though that from my point of view, you are optimizing the wrong thing:
computer time vs. human time. We would gladly wait few seconds, maybe
even minutes more, in order to have clean review in which human does not
have to spend minutes to look for actual changes.
Wolfgang Reh commented
I agree. This needs to be part of the software. How can you every use a *_Data.sql file with any kind of source control system when the order or the rows is not consistent. In the meantime I have to sort the data files manually which only works if the primary key is in the first column(s)
The lack of improvement in this area makes merging of static data a complete nightmare! A recent merge took 7 days primarily due to data merge issues, when the code part was a few hours!!
I think the compare key should also be highlighted somehow, perhaps in a block comment BEFORE the INSERT statement, so that it is used for matching rows (can create a regex for that in beyond compare)
THis has got to be fixed before I find some alternative tooling to make branch merges take a sensible time.
Redgate, you must be kidding! Please get your priorities straight on this one!
This is not a feature request. This is one solid bug that makes this product pretty much useless!
How can you possibly switch the order of records between left and right side and expect to get any kind of useful comparison result??
Just to make sure you really understand the implications of this: My coworker and I are working in a Shared DB setup and while my SQL Source Control reports the static data table as unchanged, her SQL Source Control does flag it as changed because hers is exporting the records for the exact same table in a different order than mine!
Please treat this as the critical bug that it really is and get it fixed!
I can not believe that you have been kicking this can down the road for 2 years now...
Ed Morris commented
PLEASE DO THIS NOW!!! Merges are a nightmare in the db mostly due to this non-sense of not ordering by key.
Tony Mudd commented
This feature would reduce the chances of conflict and make manually checking the changes much easier.
Today is the 7th anniversary of the creation of this feature request.
Red Gate Product Management, given the number votes for this enhancement, it deserves a response of some kind. Preferably a commitment to get this on the product roadmap. If not, then an explanation as to why it's not simple to implement.
With over 400 votes I can't understand why this hasn't been looked at yet?
Ordering by the primary key would be fairly simple and it would help a lot. We have some tables with thousands of records in that we check in using SQL Source Control and we have a lot of problems without this feature.
Wondering if the reason this issue is being ignored is because of the subjective title "Make SQL Data Easier to Merge". Perhaps it would get more attention if we change it to "Maintain Data Files Sorted by Primary Keys"
Kevin Seitz commented
Six years later, still nothing? Meanwhile, my source control SQL files look like they're being maintained by a 5 year old.
This would be tremendously useful feature to improve source code merge process.
Dean Martens commented
This must be such an easy option to add. It's causing me and the rest of your users a huge headache. Please fix it. It think it has sufficient votes. Add it as an option so it's backwards compatible if you like but our code reviews are a mess because of this default behavior.
Steve Jones commented
When storing the data items in files, it would be good to keep everything in PK order so that changes are easier to spot. Historical order is less important when working with static data.
Even if RedGate don't implement a merge function (which is desperately needed), just sorting the values in primary key order when sending scripts to TFS would be hugely helpful!
But seriously, fix this.
Even the SQL Data Compare GUI as part of SQL Source Control would be a huge step forward. Not sure what the hold up is on RedGate's end.
We can haz open source and implement ourselves?