There are two tables in my project:
Legacy table
col id | col B | col C |
---|---|---|
x | y | z |
a | b | c |
New table (slightly different schema)
col id | col B | another col |
---|---|---|
x | y | bla |
As you can see, the legacy table has some data that the new table does not. This is because the code only recently started populating the new table. As a result, I have a task to move data from the legacy table to the new table where the old record’s id is not present.
My app is a simple Spring Boot Kotlin API, with Postgres DB technology using Flyway.
My question is: Could an extra Flyway script to handle this data migration cause any technical issues? I have seen Flyway SQL scripts used to construct/alter the schema, but haven’t seen them applied to actually move data from one table to another. Is there any reason not to do this?
2
Answers
Using Flyway to move data is considered a suitable practice. In fact, Flyway is well-suited for this purpose, as long as the scripts are carefully crafted and there are no other factors outside the scope of the original question that would prevent the data from being moved.
When compared to complex data operations I have encountered in the past, moving data without any further transformations appears relatively straightforward.
Flyway scripts have broad applicability, including tasks such as data population, data removal, data updates, and schema operations. These operations may involve adding or removing columns, changing data types, or creating and deleting tables. Often, these tasks are interconnected, with actions like adding a new column requiring the population of data in that column, which may involve querying existing data.
While the existing answer talks about the technical feasibility, perhaps an operational perspective will be useful as well.
In my experience, what works best is to ensure that each table data has a single source of change, i.e. one process/entity responsible for any data alterations. Mixing batch/deployment-time processes with manual ‘interventions’ in particular is a bone of contention I try to avoid at all costs.
The practical implication is that DB migrations should only deal with:
And if I were to make the decision whether a Flyway script is suitable in a given scenario, I would ask myself whether the scenario is one of the above.
Now, back to the scenario at hand, you mention that
implying that this is really an operational table, which is being actively modified by the application. This would tell me that (1) the scenario absolutely does not fall under one of the categories I mention, and (2) the possibility of conflicts between the migrated and the current data is so great that I would never, ever in my life trust an automated process alone to handle the migration correctly without any supervision. Especially if that script runs as part of the deployment process, which could be run at someone else’s whim.
In this answer I’m of course making some assumptions about how your organization runs so please take it with a grain of salt. Your mileage may vary.