skip to Main Content

I got some strange JSON-like format from Google Cloud OCR. It doesn’t have quoted keys, colons, or commas.

text_annotations {
  description: ","
  bounding_poly {
    vertices {
      x: 485
      y: 237
    }
    vertices {
      x: 492
      y: 237
    }
    vertices {
      x: 492
      y: 266
    }
    vertices {
      x: 485
      y: 266
    }
  }
}

Is there any simple way to parse it, or format as JSON?

I’ve tried adding quotes, colons, and commas by hand, but it’s not the best way.

Got these data using python code:

from google.cloud import vision
import io
client = vision.ImageAnnotatorClient()

feature = vision.Feature(
    type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)


with io.open(path, 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)

response = client.text_detection(image=image)
print(response)

2

Answers


  1. Chosen as BEST ANSWER

    Finally i just did as people proposed in comments, and used API to print data in more useful format.


  2. OK, making a custom parser is probably overkill, but what about this potentially easy, quick&dirty hack: if the linebreaks and indentation etc. are stable, could you do a search + replace on these (via a tool, because then can be automated and also perform multiple replacements) in the form of replacing "n bounding_poly" with ",n "bounding_poly": [", which adds the comma to the previous line, adds the colon to the key, adds quotes to the key, turns its contents into an array (you then too get rid of the "vertices" so they become anonymous objects inside the array as the array’s elements). Please be aware of newline encoding differences (n, rn, r).

    Ideally also recognize if one of these search strings matches zero times, or attempt to read the final result as JSON to check for error messages.

    Not great, but maybe sufficient? Or has the source more variance or is it more dynamic/unstable?

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search