skip to Main Content

I’m running some tests of transform rules for the AWS Database Migration Service (DMS). The source database is MongoDB (I’m using AWS DocumentDB) and the target database is PostgreSQL (I’m using AWS RDS). I’m using "document mode" (as opposed to "table mode", the other option for extracting data from Mongo). In "document mode", you end up with one required column per table in the target: _doc, which consists of a json blob of the whole Mongo document. Optionally, you can add a _id column, which is a string version of the Mongo ObjectId.

I need to extract information from the _doc column to place into new columns in the target table (using the add-column action), but I can’t seem to get access to the _doc column using the transform rules (which use SQLite functions to do the expression parsing). I can get access to the _id column, though. This discrepancy is consistent whether I try transforming each of these fields to strings, or to clobs. In the example mapping below, there are four transformations: a copy of the value of _id to a DMS string, a copy of the value of _id to a DMS clob, a copy of the value of _doc to a DMS string, and a copy of the value of _doc to a DMS clob.

(Please see AWS documentation on transform rules here.)

Here are the contents of the BASIC collection in the source db:

[ { _id: ObjectId("6479f558e1000ae07668a13b"), FOO: 'bar' } ]

Here is the DMS mapping json:

{
  "rules": [
    {
      "rule-type": "transformation",
      "rule-id": "719174426",
      "rule-name": "719174426",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "add-column",
      "value": "COPY_OF_ID_AS_CLOB",
      "old-value": null,
      "data-type": {
        "type": "clob"
      },
      "expression": "$_id"
    },
    {
      "rule-type": "transformation",
      "rule-id": "719122613",
      "rule-name": "719122613",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "add-column",
      "value": "COPY_OF_ID_AS_STRING",
      "old-value": null,
      "data-type": {
        "type": "string",
        "length": "200",
        "scale": ""
      },
      "expression": "$_id"
    },
    {
      "rule-type": "transformation",
      "rule-id": "719057983",
      "rule-name": "719057983",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "add-column",
      "value": "COPY_OF_DOC_AS_STRING",
      "old-value": null,
      "data-type": {
        "type": "string",
        "length": "200",
        "scale": ""
      },
      "expression": "$_doc"
    },
    {
      "rule-type": "transformation",
      "rule-id": "649580464",
      "rule-name": "649580464",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "add-column",
      "value": "COPY_OF_DOC_AS_CLOB",
      "old-value": null,
      "data-type": {
        "type": "clob"
      },
      "expression": "$_doc"
    },
    {
      "rule-type": "selection",
      "rule-id": "641760382",
      "rule-name": "641760382",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "include",
      "filters": []
    }
  ]
}

Here is the schema that DMS generated for the BASIC table in the target db:

      column_name      |     data_type
-----------------------+-------------------
 _id                   | character varying
 _doc                  | text
 COPY_OF_DOC_AS_CLOB   | text
 COPY_OF_DOC_AS_STRING | character varying
 COPY_OF_ID_AS_STRING  | character varying
 COPY_OF_ID_AS_CLOB    | text
(6 rows)

And here is the resulting record in the BASIC table in the target db:

_id                   | 6479f558e1000ae07668a13b
_doc                  | { "_id" : { "$oid" : "6479f558e1000ae07668a13b" }, "FOO" : "bar" }
COPY_OF_DOC_AS_CLOB   |
COPY_OF_DOC_AS_STRING |
COPY_OF_ID_AS_STRING  | 6479f558e1000ae07668a13b
COPY_OF_ID_AS_CLOB    | 6479f558e1000ae07668a13b

I would have expected COPY_OF_DOC_AS_CLOB and COPY_OF_DOC_AS_STRING to show the same value as _doc. Can anyone tell me what I might be doing wrong?

2

Answers


  1. The expression field in these transformation rules is not correctly referencing the _doc column. The expression field should use a valid SQLite expression to reference the source column. In this case, you could try using json_extract(_doc, '$') as the expression to reference the entire _doc column.Modify the transform rules

    {
      "rules": [
        {
          "rule-type": "transformation",
          "rule-id": "719174426",
          "rule-name": "719174426",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "%",
            "table-name": "%"
          },
          "rule-action": "add-column",
          "value": "COPY_OF_ID_AS_CLOB",
          "old-value": null,
          "data-type": {
            "type": "clob"
          },
          "expression": "$_id"
        },
        {
          "rule-type": "transformation",
          "rule-id": "719122613",
          "rule-name": "719122613",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "%",
            "table-name": "%"
          },
          "rule-action": "add-column",
          "value": "COPY_OF_ID_AS_STRING",
          "old-value": null,
          "data-type": {
            "type": "string",
            "length": "200",
            "scale": ""
          },
          "expression": "$_id"
        },
        {
          "rule-type": "transformation",
          "rule-id": "719057983",
          "rule-name": "719057983",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "%",
            "table-name": "%"
          },
          "rule-action": "add-column",
          "value": "COPY_OF_DOC_AS_STRING",
          "old-value": null,
          "data-type": {
            "type": "string",
            "length": "200",
            "scale": ""
          },
          "expression": "SELECT (_doc->>'_doc')::text FROM pg_temp.basic"
        },
        {
          "rule-type": "transformation",
          "rule-id": "649580464",
          "rule-name": "649580464",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "%",
            "table-name": "%"
          },
          "rule-action": "add-column",
          "value": "COPY_OF_DOC_AS_CLOB",
          "old-value": null,
          "data-type": {
            "type": "clob"
          },
          "expression": "SELECT (_doc->>'_doc')::text FROM pg_temp.basic"
        },
        {
          "rule-type": "selection",
          "rule-id": "641760382",
          "rule-name": "641760382",
          "object-locator": {
            "schema-name": "%",
            "table-name": "%"
          },
          "rule-action": "include",
          "filters": []
        }
      ]
    }
    

    After the migration, execute the following SQL statements in the target PostgreSQL database to update the values of the COPY_OF_DOC_AS_STRING and COPY_OF_DOC_AS_CLOB columns:

    UPDATE basic
    SET copy_of_doc_as_string = (_doc->>'_doc')::text,
        copy_of_doc
    

    custom transform rule

    Consider this soln given the requirement that you want to extract information from the _doc column in document mode without using the json_extract() function or a post-migration step.
    Template to extract information from the _doc column in document mode:

    {
      "rule-type": "transformation",
      "rule-id": "719174426",
      "rule-name": "719174426",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "add-column",
      "value": "COPY_OF_DOC_AS_STRING",
      "old-value": null,
      "data-type": {
        "type": "string",
        "length": "200",
        "scale": ""
      },
      "expression": "json_extract(_doc, '$.FOO')"
    }
    

    This rule would extract the FOO field from the _doc column and create a new column called COPY_OF_DOC_AS_STRING.

    Login or Signup to reply.
  2. In your current setup, it appears that the transformation rules are treating the "_doc" column as a plain string rather than a JSON object. As a result, the transformation rules are unable to access the nested values within the JSON object.

    To overcome this issue, you need to use appropriate SQLite functions within the transformation rules to extract the desired values from the "_doc" column. Specifically, you would need to use JSON functions to parse and retrieve the necessary information from the JSON object.

    Here’s an example of how you can modify the transformation rule for "COPY_OF_DOC_AS_STRING" to extract the value of "FOO" from the "_doc" column:

    {
      "rule-type": "transformation",
      "rule-id": "719057983",
      "rule-name": "719057983",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "add-column",
      "value": "COPY_OF_DOC_AS_STRING",
      "old-value": null,
      "data-type": {
        "type": "string",
        "length": "200",
        "scale": ""
      },
      "expression": "json_extract(_doc, '$.FOO')"
    }
    

    In this modified rule, the json_extract function is used to retrieve the value of "FOO" from the "_doc" column, assuming that "FOO" is a field within the JSON object. You can adjust the expression according to the structure of your JSON object to extract the desired values.

    By incorporating the appropriate SQLite JSON functions, you should be able to extract the necessary information from the "_doc" column and populate the respective columns in the target PostgreSQL table correctly.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search