skip to Main Content

How to delete the strings before the first occurrence, and after the last occurrence of the delimiter? The delimiter is an underscore "_".

Sample data is:

[
  {
    "originalValue": "DELETERANDOMVALUE_Keep_This_Words_DELETERANDOMVALUE"
  }
]

The expected result should be:

[
  {
    "cleanedValue": "Keep_This_Words"
  }
]

With the below code, the string before the first occurrence of the delimiter is removed. But I am unable to delete the strings after the last occurrence.

db.collection.aggregate([
  {
    $project: {
      _id: 0,
      cleanedValue: {
        $substr: [
          "$originalValue",
          {
            $add: [
              {
                $indexOfBytes: [
                  "$originalValue",
                  "_"
                ]
              },
              1
            ]
          },
          {
            "$strLenBytes": "$originalValue"
          }
        ]
      }
    }
  }
])

Here is a playground I used.

Thank you for the help.

2

Answers


  1. Try to split the string into an array, remove the first and last element and return the joined string:

    db.collection.aggregate([
      // Phase 1: split the string by '_'
      {
        $project: {
          _id: 0,
          splitValue: {
            $split: [
              "$originalValue",
              "_"
            ]
          }
        }
      },
      // Phase 2: remove the first and last element from the array
      {
        $project: {
          cleanedValue: {
            $slice: [
              "$splitValue",
              1,
              {
                $subtract: [
                  {
                    $size: "$splitValue"
                  },
                  2
                ]
              }
            ]
          }
        }
      },
      // Phase 3: join the remaining elements into a string separated by '_'
      {
        $project: {
          finalValue: {
            $reduce: {
              input: "$cleanedValue",
              initialValue: "",
              in: {
                $concat: [
                  "$$value",
                  // This condition avoids adding a '_' as leading character
                  {
                    "$cond": [
                      {
                        $eq: [
                          "$$value",
                          ""
                        ]
                      },
                      "",
                      "_"
                    ]
                  },
                  "$$this"
                ]
              }
            }
          }
        }
      }
    ])
    

    Link to playground.

    Login or Signup to reply.
  2. You can find first and last unwanted parts using $regexFind and then use $replaceAll to replace them with empty strings. Like this:

    db.collection.aggregate([
      {
        "$addFields": {
          "first": {
            "$regexFind": {
              "input": "$originalValue",
              "regex": "^([^_]*)_"
            }
          },
          "last": {
            "$regexFind": {
              "input": "$originalValue",
              "regex": "_([^_]*)$"
            }
          }
        }
      },
      {
        "$project": {
          originalValue: {
            "$replaceAll": {
              "input": {
                "$replaceAll": {
                  "input": "$originalValue",
                  "find": "$first.match",
                  "replacement": ""
                }
              },
              "find": "$last.match",
              "replacement": ""
            }
          }
        }
      }
    ])
    

    Playground link.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search