
I am using cURL inside bash to fetch JSON data from a remote API, and jq to parse and loop through the JSON array to perform additional operations. The response looks like this:

[
     {
        "services": [
            "HOM"
        ],
        "specialServiceId": "APPOINTMENT",
        "brandedName": "Appointment Home Delivery®",
        "incompatibleSpecialServices": [
            "DATE",
            "EVENING"
        ],
        "inputParameterRules": [
            {
                "name": "PHONE_NUMBER",
                "brandedName": "Appointment Phone Number",
                "required": true,
                "format":"^\d{10,15}$",
                "description": "A valid 10 to 15 digit phone number including area code and optional extension"
            }
        ]
    },
    {
        "services": [
            "NDA_AM_EH",
            "2DA"
        ],
        "specialServiceId": "NO_SIG",
        "brandedName": "No Signature Required",
        "categoryId": "DEL_CON",
        "incompatibleSpecialServices": [
            "SIG",
            "DIRECT_SIG"
        ],
        "inputParameterRules": [
            {
                "name": "SIGNATURE_RELEASE_NUMBER",
                "description": "A valid signature release number"
            }
        ],
        "specialServiceLevel": "ALLPKG"
    },
  {...},
  {...}
] 

As you can see, the format field inside inputParameterRules for PHONE_NUMBER contains a single backslash, which makes this JSON unparseable.

I would like to

  1. Either replace the single backslash with a double backslash before storing this json in a file OR
  2. Delete the format field.

Tried something like this:

 SPECIAL_SERVICES_RESPONSE=$(curl -s -X GET -H "Authorization: Bearer ${BEARER_TOKEN}" "${SPECIAL_SERVICES_URL}" | jq '[.[].inputParameterRules[] | del(.format)]')

But it produces no results.

Thanks

2 Answers


  1. As I mentioned in comments, the best approach would be to fix the API (or get its provider to fix it) so that it delivers well-formed JSON. There is every reason to demand this, for an API that promises JSON responses is buggy if it delivers invalid JSON under any circumstances.

    Alternatively, if you can control the data, then the best workaround would be to reformulate the regex in the API’s database to an equivalent one that does not use backslash (\) characters. This should get you API responses that do not require client-side correction.

    If neither of those is viable, then the best long-term approach would be to dump your unreasonable API provider in favor of a better one, but in the short term, there are client-side workarounds.

    I would like to

    1. Either replace the single backslash with a double backslash before storing this json in a file OR
    2. Delete the format field.

    Tried something like this:

    SPECIAL_SERVICES_RESPONSE=$(curl -s -X GET -H "Authorization: Bearer ${BEARER_TOKEN}" "${SPECIAL_SERVICES_URL}" | jq '[.[].inputParameterRules[] | del(.format)]')
    

    Since the problem to be solved is that the file is not valid JSON, it stands to reason that you cannot solve it via a mechanism that involves parsing the file as JSON.

    You can filter it with sed. The issue here is distinguishing between data that need fixing and data that don’t, especially if there is hope that the API provider will eventually fix the issue on their side.

    Here are some alternatives you could consider:

    • This sed command will dumbly double every backslash in the input:

      sed 's/\\/\\\\/g'
      

      This will fix the specific malformation presented in the question, but it will break any JSON-ly correct backslash usage elsewhere in the file. And if the API provider ever gets their stuff together, it will start breaking the correct JSON they then emit.

    • This one is a bit more targeted. It will double every backslash in any line that matches /"format":/

      sed '/"format":/ s/\\/\\\\/g'
      

      The targeting reduces the likelihood that otherwise valid data elsewhere in the file are corrupted, but it also leaves such data unfixed if they are in fact invalid.

    • This one will add one backslash at the end of any run of an odd number of backslashes that is bounded by non-backslash characters:

      sed -E 's/([^\\](\\\\)*)\\([^\\])/\1\\\\\3/g'
      

      That will fix your current malformation and not break when the API provider fixes your particular issue, but it might break JSON-ly correct backslash usage elsewhere.

    • Of course, you can combine the last two:

      sed -E '/"format":/ s/([^\\](\\\\)*)\\([^\\])/\1\\\\\3/g'
      

      That gives you a fairly narrowly targeted approach that will not necessarily break when the API provider fixes the issue.
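
    As a quick sanity check of the combined approach, the sketch below runs it on two sample lines: the malformed one from the question and one that is already correctly escaped. The function name fix_format_backslashes and the sample strings are illustrative assumptions, not anything the API or sed provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the targeted sed filter.
# On lines containing "format":, it adds one backslash to any odd-length run
# of backslashes that sits between two non-backslash characters.
fix_format_backslashes() {
    sed -E '/"format":/ s/([^\\](\\\\)*)\\([^\\])/\1\\\\\3/g'
}

# Malformed input as in the question: a single backslash before "d".
broken='"format":"^\d{10,15}$",'
# The same line as it would look once the provider escapes the backslash.
already_ok='"format":"^\\d{10,15}$",'

fixed_broken=$(printf '%s\n' "$broken" | fix_format_backslashes)
fixed_ok=$(printf '%s\n' "$already_ok" | fix_format_backslashes)

# Both lines should now carry a doubled backslash; the second is unchanged.
printf '%s\n' "$fixed_broken"
printf '%s\n' "$fixed_ok"
```

    In the pipeline from the question, such a filter would sit between curl and whatever consumes the JSON (jq or a file redirect), so that jq only ever sees the corrected text.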

    The overarching theme here is that if you can’t rely on the API provider to emit correct data in the first place, then the best you can do is heuristically correct whatever they do emit, and that path is strewn with traps and pitfalls. You have to be concerned both about "fixing" data that were correct and valid in the first place and about failing to recognize or fix incorrect data.
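
    The question's second option, deleting the format field, can be attempted with the same kind of line-based filtering. This is only a sketch under two assumptions that the sample response happens to satisfy: each member is on its own line, and "format" is never the last member of its object (otherwise the preceding line would be left with a trailing comma). The helper name drop_format_lines and the sample text are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop every line that contains a "format": member.
# Purely textual; it does not understand JSON structure.
drop_format_lines() {
    sed '/"format":/d'
}

# Shortened sample shaped like the response in the question.
sample='{
    "required": true,
    "format":"^\d{10,15}$",
    "description": "A valid phone number"
}'

cleaned=$(printf '%s\n' "$sample" | drop_format_lines)
printf '%s\n' "$cleaned"
```

    Like the other filters, this is a heuristic: it would also delete any unrelated line that happens to contain the text "format":, and it would stop working if the provider switched to compact, single-line output.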

  2. It turns out that halaxa’s "JSON Machine" (https://github.com/halaxa/json-machine)
    is very lenient in accepting "strings"
    when reading JSON, so one JSON-aware approach would be to use JSON Machine.
    For simplicity, the following uses the "jm" script (https://github.com/pkoppstein/jm)
    that uses JSON Machine under the hood.

    Since your post indicates you’re particularly interested in the "inputParameterRules",
    I’ll focus on that in the following.

    In brief, using jm twice, the following
    few lines produce the output shown below:

    jm  --pointer /-/inputParameterRules input.pseudojson |
        while read line ; do
            echo "$line" | jm -s
        done
    

    Output:

    {"name":"PHONE_NUMBER"}
    {"brandedName":"Appointment Phone Number"}
    {"required":true}
    {"format":"^d{10,15}$"}
    {"description":"A valid 10 to 15 digit phone number including area code and optional extension"}
    {"name":"SIGNATURE_RELEASE_NUMBER"}
    {"description":"A valid signature release number"}
    

    Since this is a stream, you can now use jq to reconstruct the objects as desired, e.g. omitting .format.
