I’m running some tests of transform rules for the AWS Database Migration Service (DMS). The source database is MongoDB (I’m using AWS DocumentDB) and the target database is PostgreSQL (I’m using AWS RDS). I’m using "document mode" (as opposed to "table mode", the other option for extracting data from Mongo). In "document mode", you end up with one required column per table in the target: _doc
, which consists of a json blob of the whole Mongo document. Optionally, you can add a _id
column, which is a string version of the Mongo ObjectId.
I need to extract information from the _doc
column to place into new columns in the target table (using the add-column
action), but I can’t seem to get access to the _doc
column using the transform rules (which use SQLite functions to do the expression parsing). I can get access to the _id
column, though. This discrepancy is consistent whether I try transforming each of these fields to string
s, or to clob
s. In the example mapping below, there are four transformations: a copy of the value of _id
to a DMS string
, a copy of the value of _id
to a DMS clob
, a copy of the value of _doc
to a DMS string
, and a copy of the value of _doc
to a DMS clob
.
(Please see AWS documentation on transform rules here.)
Here are the contents of the BASIC
collection in the source db:
[ { _id: ObjectId("6479f558e1000ae07668a13b"), FOO: 'bar' } ]
Here is the DMS mapping json:
{
"rules": [
{
"rule-type": "transformation",
"rule-id": "719174426",
"rule-name": "719174426",
"rule-target": "column",
"object-locator": {
"schema-name": "%",
"table-name": "%"
},
"rule-action": "add-column",
"value": "COPY_OF_ID_AS_CLOB",
"old-value": null,
"data-type": {
"type": "clob"
},
"expression": "$_id"
},
{
"rule-type": "transformation",
"rule-id": "719122613",
"rule-name": "719122613",
"rule-target": "column",
"object-locator": {
"schema-name": "%",
"table-name": "%"
},
"rule-action": "add-column",
"value": "COPY_OF_ID_AS_STRING",
"old-value": null,
"data-type": {
"type": "string",
"length": "200",
"scale": ""
},
"expression": "$_id"
},
{
"rule-type": "transformation",
"rule-id": "719057983",
"rule-name": "719057983",
"rule-target": "column",
"object-locator": {
"schema-name": "%",
"table-name": "%"
},
"rule-action": "add-column",
"value": "COPY_OF_DOC_AS_STRING",
"old-value": null,
"data-type": {
"type": "string",
"length": "200",
"scale": ""
},
"expression": "$_doc"
},
{
"rule-type": "transformation",
"rule-id": "649580464",
"rule-name": "649580464",
"rule-target": "column",
"object-locator": {
"schema-name": "%",
"table-name": "%"
},
"rule-action": "add-column",
"value": "COPY_OF_DOC_AS_CLOB",
"old-value": null,
"data-type": {
"type": "clob"
},
"expression": "$_doc"
},
{
"rule-type": "selection",
"rule-id": "641760382",
"rule-name": "641760382",
"object-locator": {
"schema-name": "%",
"table-name": "%"
},
"rule-action": "include",
"filters": []
}
]
}
Here is the schema that DMS generated for the BASIC
table in the target db:
column_name | data_type
-----------------------+-------------------
_id | character varying
_doc | text
COPY_OF_DOC_AS_CLOB | text
COPY_OF_DOC_AS_STRING | character varying
COPY_OF_ID_AS_STRING | character varying
COPY_OF_ID_AS_CLOB | text
(6 rows)
And here is the resulting record in the BASIC
table in the target db:
_id | 6479f558e1000ae07668a13b
_doc | { "_id" : { "$oid" : "6479f558e1000ae07668a13b" }, "FOO" : "bar" }
COPY_OF_DOC_AS_CLOB |
COPY_OF_DOC_AS_STRING |
COPY_OF_ID_AS_STRING | 6479f558e1000ae07668a13b
COPY_OF_ID_AS_CLOB | 6479f558e1000ae07668a13b
I would have expected COPY_OF_DOC_AS_CLOB
and COPY_OF_DOC_AS_STRING
to show the same value as _doc
. Can anyone tell me what I might be doing wrong?
2
Answers
The
expression
field in thesetransformation rules
is not correctly referencing the_doc column
. The expression field should use a validSQLite expression
to reference the source column. In this case, you could try usingjson_extract(_doc, '$')
as the expression to reference the entire _doc column.Modify thetransform rules
After the migration, execute the following SQL statements in the target PostgreSQL database to update the values of the COPY_OF_DOC_AS_STRING and COPY_OF_DOC_AS_CLOB columns:
custom transform rule
Consider this soln given the requirement that you want to
extract
information from the_doc
column indocument mode
without using thejson_extract()
function or apost-migration
step.Template to extract information from the _doc column in document mode:
This rule would extract the
FOO field from the _doc
column and create a new column calledCOPY_OF_DOC_AS_STRING
.In your current setup, it appears that the transformation rules are treating the "_doc" column as a plain string rather than a JSON object. As a result, the transformation rules are unable to access the nested values within the JSON object.
To overcome this issue, you need to use appropriate SQLite functions within the transformation rules to extract the desired values from the "_doc" column. Specifically, you would need to use JSON functions to parse and retrieve the necessary information from the JSON object.
Here’s an example of how you can modify the transformation rule for "COPY_OF_DOC_AS_STRING" to extract the value of "FOO" from the "_doc" column:
In this modified rule, the json_extract function is used to retrieve the value of "FOO" from the "_doc" column, assuming that "FOO" is a field within the JSON object. You can adjust the expression according to the structure of your JSON object to extract the desired values.
By incorporating the appropriate SQLite JSON functions, you should be able to extract the necessary information from the "_doc" column and populate the respective columns in the target PostgreSQL table correctly.