I am new to artificial intelligence and am using the TensorFlow Object Detection API to detect a product in images. Detection already works, but I want to get the Xmin, Xmax, Ymin, and Ymax coordinates of each detected object.

This is the image with the detections drawn on it; in this case, 2 objects were detected.

Image:

The output does contain the coordinates of the objects, but it is not clear which rows matter: there are more than 3 rows of coordinates, and I only want as many sets of coordinates as there are objects in the image.

This is the code that produces the output:

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')

        print(detection_graph.get_tensor_by_name('detection_boxes:0'))

        for image_path in TEST_IMAGE_PATHS:
            boxes = detect_objects(image_path)
            print(boxes)

Output

Tensor("detection_boxes:0", dtype=float32)
[[[0.16593058 0.06630109 0.8009524  0.5019088 ]
  [0.15757088 0.5376015  0.8869156  0.9394863 ]
  [0.5966009  0.88420665 0.6564093  0.9339011 ]
  ...
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]]]

I want to get something like this: only the bounding-box coordinates of the detected objects (assuming these rows are the coordinates of the objects).

[0.16593058 0.06630109 0.8009524  0.5019088 ]
[0.15757088 0.5376015  0.8869156  0.9394863 ]

2 Answers


  1. You should be aware of two things:

    1. These are all the coordinates of all (usually 100) top detections.
    2. These are given in normalized coordinates.

    Therefore, to keep only the real detections, use detection_scores to decide which indices to drop (the detections are sorted by score, so the low-scoring ones come last). To get absolute coordinates, multiply the normalized coordinates by the original image size. The normalized coordinates are given as [ymin, xmin, ymax, xmax], so multiply the first and third values by y_size (the image height) and the second and fourth by x_size (the image width). You can compute x_size and y_size by evaluating the shape of image_tensor.

  2. Code:

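    bboxes = []
    for box in boxes[0]:
        # Each box is normalized [ymin, xmin, ymax, xmax]
        ymin, xmin, ymax, xmax = box
        # [x, y, width, height] in pixels, assuming a 640x480 (width x height) image
        bboxes.append([int(xmin * 640), int(ymin * 480), int((xmax - xmin) * 640), int((ymax - ymin) * 480)])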
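
The filtering and scaling described in the first answer could be sketched roughly as follows. This is only an illustration, not part of the Object Detection API: the helper name `boxes_to_pixels`, the `min_score` threshold of 0.5, the example scores, and the 640x480 image size are all assumptions. The box values are the first three rows of the output shown in the question.

```python
import numpy as np


def boxes_to_pixels(boxes, scores, image_height, image_width, min_score=0.5):
    """Keep detections with score >= min_score and scale them to pixels.

    boxes:  shape (N, 4), normalized [ymin, xmin, ymax, xmax]
    scores: shape (N,)
    Returns an int array of shape (K, 4) laid out as [xmin, ymin, xmax, ymax].
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)

    # Boolean mask selecting the detections that pass the threshold
    keep = scores >= min_score
    ymin, xmin, ymax, xmax = boxes[keep].T

    # x values scale with the image width, y values with the image height
    pixel_boxes = np.stack(
        [xmin * image_width, ymin * image_height,
         xmax * image_width, ymax * image_height],
        axis=1,
    )
    return pixel_boxes.astype(int)  # truncates toward zero


# First three boxes from the question's output
example_boxes = [
    [0.16593058, 0.06630109, 0.8009524, 0.5019088],
    [0.15757088, 0.5376015, 0.8869156, 0.9394863],
    [0.5966009, 0.88420665, 0.6564093, 0.9339011],
]
# Hypothetical scores: the question does not show the real ones
example_scores = [0.98, 0.95, 0.12]

# Assumed image size: 640 wide, 480 high
pixel_boxes = boxes_to_pixels(example_boxes, example_scores,
                              image_height=480, image_width=640)
print(pixel_boxes)
```

With real model output, the arrays returned for detection_boxes and detection_scores carry a leading batch dimension, so you would pass `boxes[0]` and `scores[0]`, and take the height and width from the image you actually fed in.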
    