I am using Vision to detect objects, and after getting a [VNRecognizedObjectObservation] array I transform the normalized rects before displaying them:

```swift
// Displayed with SwiftUI, that's why I'm applying the flip transform
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -CGFloat(height))
let rect = VNImageRectForNormalizedRect(normalizedRect, width, height)
    .applying(transform)
```
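For a concrete check of what the flip does (the numbers here are a hypothetical example, not from my app), consider a 400×400 image:

```swift
import Vision

// Hypothetical numbers: a 400×400 image and a normalized
// bounding box of (x: 0.1, y: 0.2, w: 0.3, h: 0.4).
let width = 400, height = 400
let normalizedRect = CGRect(x: 0.1, y: 0.2, width: 0.3, height: 0.4)

// Vision uses a bottom-left origin: this gives (40, 80, 120, 160).
let visionRect = VNImageRectForNormalizedRect(normalizedRect, width, height)

// Flip to SwiftUI's top-left origin: y' = height - y - h = 400 - 80 - 160 = 160.
let transform = CGAffineTransform(scaleX: 1, y: -1)
    .translatedBy(x: 0, y: -CGFloat(height))
let flipped = visionRect.applying(transform)
// flipped is (x: 40, y: 160, width: 120, height: 160)
```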
The width and height come from a SwiftUI GeometryReader:

```swift
Image(...)
    .resizable()
    .scaledToFit()
    .overlay {
        GeometryReader { geometry in
            // ZStack and ForEach(observations, id: \.uuid), then:
            let calculatedRect = calculateRect(boundingBox, geometry)
            Rectangle()
                .frame(width: calculatedRect.width, height: calculatedRect.height)
                .offset(x: calculatedRect.origin.x, y: calculatedRect.origin.y)
        }
    }
```
But the problem is that many boxes are positioned incorrectly (while some are accurate), even on square images.
This is not related to the model: the same images (using the same MLModel) get fairly accurate bounding boxes when I try them in Xcode's Model Preview section.
Sample Image in my App:
Sample Image in Xcode Preview:
Update (Minimal Reproducible Example):
Putting the code below inside ContentView.swift of a macOS SwiftUI project, with YOLOv3Tiny.mlmodel in the project bundle, reproduces the same results.
```swift
import SwiftUI
import Vision
import CoreML

class Detection: ObservableObject {
    let imgURL = URL(string: "https://i.imgur.com/EqsxxTc.jpg")! // Xcode preview generates this: https://i.imgur.com/6IPNQ8b.png
    @Published var objects: [VNRecognizedObjectObservation] = []

    func getModel() -> VNCoreMLModel? {
        if let modelURL = Bundle.main.url(forResource: "YOLOv3Tiny", withExtension: "mlmodelc") {
            if let mlModel = try? MLModel(contentsOf: modelURL, configuration: MLModelConfiguration()) {
                return try? VNCoreMLModel(for: mlModel)
            }
        }
        return nil
    }

    func detect() async {
        guard let model = getModel(), let tiff = NSImage(contentsOf: imgURL)?.tiffRepresentation else {
            // YOLOv3Tiny: https://ml-assets.apple.com/coreml/models/Image/ObjectDetection/YOLOv3Tiny/YOLOv3Tiny.mlmodel
            fatalError("Either YOLOv3Tiny.mlmodel is not in the project bundle, or the image failed to load.")
        }
        let request = VNCoreMLRequest(model: model) { request, error in
            DispatchQueue.main.async {
                self.objects = (request.results as? [VNRecognizedObjectObservation]) ?? []
            }
        }
        try? VNImageRequestHandler(data: tiff).perform([request])
    }

    // Convert Vision's bottom-left-origin normalized rect to view coordinates
    func deNormalize(_ rect: CGRect, _ geometry: GeometryProxy) -> CGRect {
        let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -CGFloat(geometry.size.height))
        return VNImageRectForNormalizedRect(rect, Int(geometry.size.width), Int(geometry.size.height)).applying(transform)
    }
}

struct ContentView: View {
    @StateObject var detection = Detection()

    var body: some View {
        AsyncImage(url: detection.imgURL) { img in
            img.resizable().scaledToFit().overlay {
                GeometryReader { geometry in
                    ZStack {
                        ForEach(detection.objects, id: \.uuid) { object in
                            let rect = detection.deNormalize(object.boundingBox, geometry)
                            Rectangle()
                                .stroke(lineWidth: 2)
                                .foregroundColor(.red)
                                .frame(width: rect.width, height: rect.height)
                                .offset(x: rect.origin.x, y: rect.origin.y)
                        }
                    }
                }
            }
        } placeholder: {
            ProgressView()
        }
        .onAppear {
            Task { await self.detection.detect() }
        }
    }
}
```
Edit: further testing revealed that Vision returns correct positions, and my deNormalize() function also returns correct positions and sizes, so it has to be related to SwiftUI.
2 Answers
Okay, so after a long time of troubleshooting I finally managed to make it work correctly (while still not understanding the reason for the problem).

The problem was this part: I assumed that because many Rectangle()s will overlap, I needed a ZStack() to put them over each other. This turned out to be wrong: apparently, when using .offset() they can overlap without any issue, so removing the ZStack() completely solved the problem.

What I still don't understand is why moving the ZStack() outside the GeometryReader() also solves the problem, and why some boxes were in the correct positions while some were not!

Issue 1
GeometryReader makes everything inside it shrink to its smallest size. Add .border(Color.orange) to the ZStack and you will see something like what I have below. You can use .frame(maxWidth: .infinity, maxHeight: .infinity) to make the ZStack stretch to take all the available space.

Issue 2
position vs offset: with .offset, the view starts at its default (centered) position and is then moved by the specified amount; .position places the view's center at the given coordinates, so it behaves more like setting an origin.

Issue 3

Adjusting for that center-based positioning vs the top-left (0, 0) origin that a CGRect uses.
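Concretely, the center adjustment can be done by placing the view at the rect's midpoint instead of its origin (a sketch, assuming `rect` is the already-flipped rect from the question's deNormalize()):

```swift
// .position places the view's CENTER at the given point, while
// rect.origin is the box's top-left corner, so shift by half the size.
Rectangle()
    .stroke(lineWidth: 2)
    .foregroundColor(.red)
    .frame(width: rect.width, height: rect.height)
    .position(x: rect.midX, y: rect.midY) // center = origin + size / 2
```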
Issue 4

The ZStack needs to be flipped on the X axis.

Below is the full code. I also changed some other things, as you may notice; there are comments in the code.