My goal is to implement a tool similar to photoshop’s magic wand to select connected areas with the same color in a picture. A start point within the desired area is known. My first approach was to look at each individual pixel, if the color matches I remember this position in a set and start to look at the surrounding neighbors. To not do the same task multiple times I keep track of the places I already searched.
The set “objectsAlreadyLookedAt” starts to grow rapidly and profiling revealed that the set.subtract method accounts for 91% of the time spend in this function. Which data structure can I use to increase performance on contain and insertion?
var objectsToLookAt = Set<CustomPoint>() //All objects we still have to process
var objectsAlreadyLookedAt = Set<CustomPoint>() // a list of pixels we already visited
var objectMatch = Set<CustomPoint>() //pixels matched
while(objectsToLookAt.count > 0){ //While we have points left to look at
let pointToLookAt:CustomPoint = objectsToLookAt.popFirst()! //Randomly take one
objectsAlreadyLookedAt.insert(pointToLookAt) //Remember that we already visited this node
offset = pointToLookAt.y * width + pointToLookAt.x //Calculate the index for the data buffer
if(pixelBuffer[offset]==needleColor){
objectMatch.insert(pointToLookAt) //Remember match for later
//Generate 8 surrounding candidates
var neighboorPoints: Set<CustomPoint> = createSurroundingPoints(startPoint: pointToLookAt)
//Remove all points we have already looked at from the set. BOTTLENECK!
neighboorPoints.subtract(objectsAlreadyLookedAt)
objectsToLookAt = objectsToLookAt.union(neighboorPoints)
}
}
…
//Hashable point class
class CustomPoint : Hashable, CustomStringConvertible{
var x:Int
var y:Int
init(x: Int, y: Int) {
self.x = x
self.y = y
}
//In our case coordinates will never exeed 10k
var hashValue: Int {
return y + x*10000
}
static func ==(lhs: CustomPoint, rhs: CustomPoint) -> Bool {
if(lhs.x == rhs.x && lhs.y == rhs.y){
return true
}
return false
}
public var description: String { return "Point: x:(x) y:(y)" }
}
Alternatively
-
If I only had non concave polygons I could divide my search area in 4 different directions and simply let the algorithm walk down, without the fear of repetition. But this is not the case.
-
Would I be better of using a scanline algorithm, looking at every pixel, creating a path along the edges and using this? I don’t see why I should take a look at the entire image if I am only interested in a small part and can grow outwards from a known point. Also this might get complicated rather quickly.
2
Answers
I suggest using Dictionary instead of Set to check whether a certain CustomPoint object is already processed. The key in the dictionary is x*100000 + y (since you said “In our case coordinates will never exeed 10k“).
Total changes
The very algorithm can be greatly speeded up if you will work with two structures – the whole field of pixels AND border set. Pixel can be in 4 states: unchecked, accepted, unaccepted, border.
Of course, you should add checks for being out of the field limits.
And, sure, you know how to loop through neighbours quickly? Feel free to use the algorithm from https://codereview.stackexchange.com/a/8947/11012, the 3rd paragraph.
Lesser changes
If you don’t like the whole idea, at least notice, that substracting elements from border set is totally useless, for every next great cycle of checking you have totally fresh border set. Pixels that belonged to the border the last cycle, don’t belong to it this cycle. Simply create a new set every cycle and use it for the next cycle.