How can I get my VNCoreMLRequest to detect objects appearing anywhere within the fullscreen view?
I am currently using the Apple sample project for object recognition in breakfast foods: BreakfastFinder. The model and recognition work well, and generally give the correct bounding box (visual) of the objects being detected / found.
The issue arises when changing the orientation of this detection.
In portrait mode, the default orientation for this project, the model identifies objects well across the full bounds of the view. Naturally, given the properties of the SDK objects, rotating the camera without accounting for orientation causes poor performance and poor visual identification.
In landscape mode, the model behaves strangely. The window / area in which the model detects objects is not the full view. Instead, it is (what seems like) the same aspect ratio as the phone itself, but centered and in portrait orientation. I have a screenshot below showing approximately where the model stops detecting objects when in landscape:
The blue box with the red outline is approximately where the detection stops. The behavior is strange, but it consistently fails to find any objects outside this approximate area / near the left or right edges. However, the top and bottom edges near the center detect without any issue.
regionOfInterest
I have adjusted this to the maximum: x: 0, y: 0, width: 1, height: 1. This made no difference.
imageCropAndScaleOption
Changing this is the only setting that allows detection across the full screen; however, performance became noticeably worse, and that is not an acceptable trade-off.
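For context, here is my understanding of what the crop-and-scale options do (a minimal sketch; the function name is only illustrative, and the comments reflect my reading of the Vision behavior rather than the documentation verbatim):

import Vision

// Illustrative only: how each VNImageCropAndScaleOption maps the camera frame
// into the model's (typically square) input.
func configureCropAndScale(_ request: VNCoreMLRequest) {
    // .centerCrop (the default): scales the frame and crops a centered region,
    //   so the left/right edges of a landscape frame never reach the model.
    // .scaleFit: letterboxes the whole frame into the input, preserving aspect ratio.
    // .scaleFill: stretches the whole frame to fill the input, ignoring aspect ratio.
    request.imageCropAndScaleOption = .scaleFill
}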
Is there a scale / size setting somewhere in this process that I have not set properly? Or perhaps a mode I am not using. Any help would be most appreciated. Below is my detection controller:
ViewController.swift
// All unchanged from the download of Apple's sample, except:
session.sessionPreset = .hd1920x1080 // Model image size is smaller.
...
previewLayer.connection?.videoOrientation = .landscapeRight
VisionObjectRecognitionViewController
@discardableResult
func setupVision() -> NSError? {
    // Setup Vision parts
    let error: NSError! = nil

    guard let modelURL = Bundle.main.url(forResource: "ObjectDetector", withExtension: "mlmodelc") else {
        return NSError(domain: "VisionObjectRecognitionViewController", code: -1, userInfo: [NSLocalizedDescriptionKey: "Model file is missing"])
    }
    do {
        let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))
        let objectRecognition = VNCoreMLRequest(model: visionModel, completionHandler: { (request, error) in
            DispatchQueue.main.async(execute: {
                // Perform all the UI updates on the main queue
                if let results = request.results {
                    self.drawVisionRequestResults(results)
                }
            })
        })
        // These are the only properties that impact the detection area
        objectRecognition.regionOfInterest = CGRect(x: 0, y: 0, width: 1, height: 1)
        objectRecognition.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFit
        self.requests = [objectRecognition]
    } catch let error as NSError {
        print("Model loading went wrong: \(error)")
    }
    return error
}
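For completeness, the requests built above are performed once per camera frame by a VNImageRequestHandler in the capture delegate. This is roughly how the sample does it, from memory; exifOrientationFromDeviceOrientation() is the sample's own helper, and details may differ slightly from the download:

// AVCaptureVideoDataOutputSampleBufferDelegate method, sketched from memory.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    // The orientation passed here tells Vision how to interpret the pixel buffer,
    // and therefore where its normalized coordinate space sits relative to the screen.
    let exifOrientation = exifOrientationFromDeviceOrientation()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: exifOrientation,
                                        options: [:])
    do {
        try handler.perform(self.requests)
    } catch {
        print(error)
    }
}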
EDIT:
When running the project locked to portrait only (by selecting only Portrait in Targets -> General) and then rotating the device to landscape, detection occurs perfectly across the entire screen.
Answers
The issue seemed to reside in the rotation of the physical device.
Telling Vision that the device is "not rotated", while passing every other element the current orientation, allowed the detection bounds to remain the full screen (as if in portrait) while the controller was in fact in landscape. A sketch of that idea is below.
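A minimal sketch of what that looks like in the frame handler, assuming the pixel buffer and requests from the sample. The exact CGImagePropertyOrientation constant that works may be .up or whatever value the sample maps to portrait; the point is that it stays fixed while the UI rotates:

import Vision
import CoreVideo

// Sketch: always hand Vision a fixed, "not rotated" orientation, while the
// preview layer and drawing code keep using the real device orientation.
func performDetection(on pixelBuffer: CVPixelBuffer, requests: [VNRequest]) {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: .up, // pretend the device is not rotated
                                        options: [:])
    try? handler.perform(requests)
}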
The bounding boxes are normalized rects obtained from the Core ML bounding-box observations; they have to be converted using the ratio of the screen / image size to generate boxes you can draw over the detected objects (words, in the answerer's case). A rough example of that conversion follows.
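A rough sketch of that conversion, assuming the view shows the full frame without aspect-fill cropping (the helper name and the y-axis flip are mine, not from the answer):

import Vision
import CoreGraphics

// Sketch: scale Vision's normalized bounding box (origin at the lower left,
// 0...1 on both axes) up to view coordinates and flip the y-axis for UIKit.
func viewRect(for observation: VNRecognizedObjectObservation,
              in viewSize: CGSize) -> CGRect {
    let imageRect = VNImageRectForNormalizedRect(observation.boundingBox,
                                                 Int(viewSize.width),
                                                 Int(viewSize.height))
    return CGRect(x: imageRect.minX,
                  y: viewSize.height - imageRect.maxY,
                  width: imageRect.width,
                  height: imageRect.height)
}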