skip to Main Content

I’m trying to replace newline characters in a JSON string, because Postgres’ COPY function won’t work otherwise.

I’ve been trying different combinations of sed, such as:

gsed -z 's/\n/\\n/g'
gsed -E ':a;N;$!ba;s/r{0,1}n/\n/g'
gsed '/}$/{N; s/}n{/},n{/}

But it either replaces all of them or none.

Here’s my sample data:

{"corpusid":75399585,"externalids":{"ACL":null,"DBLP":null,"ArXiv":null,"MAG":"2350229292","CorpusId":"75399585","PubMed":null,"DOI":null,"PubMedCentral":null},"url":"https://www.semanticscholar.org/paper/aec9799d6e12f2000deb8f7dadcc7d7e653f7393","title":"Prevention and Therapy of Osteoporosis","authors":[{"authorId":"40937034","name":"Hu Jun-bo"}],"venue":"","publicationvenueid":null,"year":2003,"referencecount":0,"citationcount":0,"influentialcitationcount":0,"isopenaccess":false,"s2fieldsofstudy":[{"category":"Medicine","source":"s2-fos-model"},{"category":"Medicine","source":"external"}],"publicationtypes":null,"publicationdate":null,"journal":null,"updated":"2022-02-11T07:19:53.877Z"}
{"corpusid":83138554,"openaccessinfo":{"externalids":{"MAG":"883546316","ACL":null,"DOI":null,"PubMedCentral":null,"ArXiv":null},"license":null,"url":null,"status":null},"abstract":"Phytosulfokin-alpha (PSK-alpha) ist ein bisulfatiertes Pentapetid, welches als Wachstumsfaktor die Proliferation von Zellen in Zellkultur niedriger Dichte stimuliert. In der vorliegenden Arbeit wurden physiologische und genetische Ansatze verfolgt, um Funktionen von PSK-alpha in Arabidopsis thaliana aufzuklaren. Promotor-GUS Expressionsanalysen der PSK-Praproproteingene und des PSK-Rezeptorgens AtPSKR1 in Wurzeln und Bluten deuteten auf mogliche Funktionen von PSK-alpha in diesen Geweben hin. nExogen appliziertes PSK-alpha forderte das Wurzelwachstum von Arabidopsiskeimlingen. Das Wurzelwachstum der PSK-Rezeptor T-DNA Insertionsmutante Atpskr1-T war inhibiert. Diese Ergebnisse zeigten, dass PSK-alpha in die Regulation von Wurzelwachstum involviert ist. nCharakteristisch fur die Atpskr1-T Mutante war eine verringerte Samenbildung im Vergleich zu Wildtyppflanzen. Genetische Analysen zeigten, dass sowohl ein paternaler als auch maternale Effekte und asynchrones Wachstum der inneren Blutenorgane zur Ausbildung des Mutantenphanotyps fuhren.","updated":"2022-02-11T01:27:48.987Z"}

As you can see, second JSON string has n within the "abstract" field, it needs to be replaced to \n I assume for the COPY to work.

But I have to retain newline in between JSON strings, as there’s thousands of them, so that COPY reads that file properly.

2

Answers


  1. Chosen as BEST ANSWER

    I have reused the portion with jq by Cyrus and wrapped it into a sh script, as I had many files I needed to convert. Please note it also replaces " with " as it was preventing COPY function from properly importing JSON strings.

    Here's the result:

    #!/bin/bash
    
    input_file=$1
    output_file=$2
    
    while read -r line; do
      if echo "$line" | jq -e 'has("abstract")' > /dev/null; then
        echo "$line" | jq -c 'try .abstract |= (gsub("\n"; "\n") | (gsub("""; """)))' | tee -a $output_file > /dev/null
      else
        echo "$line" | tee -a $output_file > /dev/null
      fi
    done < $input_file
    
    tail -n 1 $input_file | jq -c 'try .abstract |= (gsub("\n"; "\n") | (gsub("""; "&quot;")))' >> $output_file
    

    Usage:

    ./script_name.sh original_file.json converted_file.json


  2. A workaround without sed:

    head -n 1 file > new_file
    tail -n 1 file | jq -c '.abstract |= (gsub("n"; "\n"))' >> new_file
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search