
I am trying to extract the user silhouette and put it on top of my images. I was able to make a mask and cut the user out of the RGB image, but the contour is messy.

The question is how I can make the mask more precise (so it fits the real user). I've tried ERODE-DILATE filters, but they don't do much. Maybe I need some feather filter like in Photoshop.

Here is my code.

import SimpleOpenNI.*;
SimpleOpenNI  context;
PImage mask;
void setup()
{
  size(640*2, 480);
  context = new SimpleOpenNI(this);
  if (context.isInit() == false)
  {        
    exit();
    return;
  }
  context.enableDepth(); 
  context.enableRGB();
  context.enableUser();
  context.alternativeViewPointDepthToImage();     
}

void draw()
{
  frame.setTitle(int(frameRate) + " fps");     
  context.update();
  int[] userMap = context.userMap();  
  background(0, 0, 0);
  mask = loadImage("black640.jpg");  //just a black image  
  int xSize = context.depthWidth();
  int ySize = context.depthHeight();  
  mask.loadPixels();
  for (int y = 0; y < ySize; y++) {      
    for (int x = 0; x < xSize; x++) {        
      int index = x + y*xSize;     
      if (userMap[index]>0) {  
        mask.pixels[index]=color(255, 255, 255);
      }
    }
  }
  mask.updatePixels();
  image(mask, 0, 0);
  mask.filter(DILATE);  
  mask.filter(DILATE);         
  PImage rgb = context.rgbImage();
  rgb.mask(mask);  
  image(rgb, context.depthWidth() + 10, 0);
}

2 Answers


  1. Chosen as BEST ANSWER

    I've tried the built-in erode/dilate/blur filters in Processing, but they are very inefficient: every time I increment blurAmount in img.filter(BLUR, blurAmount), my FPS drops by about 5 frames. So I decided to try OpenCV. It is much better in comparison and the result is satisfactory.

    import SimpleOpenNI.*;
    import processing.video.*;
    import gab.opencv.*;
    SimpleOpenNI  context;
    OpenCV opencv;
    PImage mask;
    int numPixels = 640*480;
    int dilateAmt = 1;
    int erodeAmt = 1;
    int blurAmt = 1;
    Movie mov;
    void setup(){
      opencv = new OpenCV(this, 640, 480);
      size(640*2, 480);
      context = new SimpleOpenNI(this);
      if (context.isInit() == false) {        
        exit();
        return;
      }
      context.enableDepth(); 
      context.enableRGB();
      context.enableUser();
      context.alternativeViewPointDepthToImage();  
      mask = createImage(640, 480, RGB);
      mov = new Movie(this, "wild.mp4");
      mov.play();
      mov.speed(5);
      mov.volume(0);
    }
    void movieEvent(Movie m) {
      m.read();
    }
    void draw() {
      frame.setTitle(int(frameRate) + " fps");     
      context.update();
      int[] userMap = context.userMap();  
      background(0, 0, 0); 
      mask.loadPixels();  
      for (int i = 0; i < numPixels; i++) {
        mask.pixels[i] = userMap[i] > 0 ? color(255) : color(0);
      }
      mask.updatePixels();
      opencv.loadImage(mask);
      opencv.gray(); 
      for (int i = 0; i < erodeAmt; i++) {
        opencv.erode();
      }
      for (int i = 0; i < dilateAmt; i++) {
        opencv.dilate();
      }  
      if (blurAmt>0) {//blur with 0 amount causes error
        opencv.blur(blurAmt);
      }  
      mask = opencv.getSnapshot();  
      image(mask, 0, 0);
      PImage rgb = context.rgbImage();  
      rgb.mask(mask);  
      image(mov, context.depthWidth() + 10, 0);
      image(rgb, context.depthWidth() + 10, 0);
      fill(255);
      text("erodeAmt: " + erodeAmt + "tdilateAmt: " + dilateAmt + "tblurAmt: " + blurAmt, 15, 15);
    }
    void keyPressed() {
      if (key == 'e') erodeAmt--;
      if (key == 'E') erodeAmt++;
      if (key == 'd') dilateAmt--;
      if (key == 'D') dilateAmt++;
      if (key == 'b') blurAmt--;
      if (key == 'B') blurAmt++;
      //constrain values
      if (erodeAmt < 0) erodeAmt = 0;
      if (dilateAmt < 0) dilateAmt = 0;
      if (blurAmt < 0) blurAmt = 0;
    }
    

  2. It’s good you’re aligning the RGB and depth streams.
    There are a few things that could be improved in terms of efficiency:

    No need to reload a black image every single frame (in the draw() loop) since you’re modifying all the pixels anyway:

    mask = loadImage("black640.jpg");  //just a black image
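
    Instead, you can allocate the mask once in setup() and simply overwrite its pixels each frame (this is what the refactored sketch further down does):

      //create the mask image once, outside of draw()
      mask = createImage(640, 480, RGB);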
    

    Also, since you don’t need the x,y coordinates as you loop through the user data, you can use a single for loop which should be a bit faster:

      for(int i = 0 ; i < numPixels ; i++){
        mask.pixels[i] = userMap[i] > 0 ? color(255) : color(0);
      }
    

    instead of:

    for (int y = 0; y < ySize; y++) {      
        for (int x = 0; x < xSize; x++) {        
          int index = x + y*xSize;     
          if (userMap[index]>0) {  
            mask.pixels[index]=color(255, 255, 255);
          }
        }
      }
    

    Another hacky thing you could do is retrieve the userImage() from SimpleOpenNI instead of the userMap() and apply a THRESHOLD filter to it, which in theory should give you the same result as above.

    For example:

    int[] userMap = context.userMap();  
      background(0, 0, 0);
      mask = loadImage("black640.jpg");  //just a black image  
      int xSize = context.depthWidth();
      int ySize = context.depthHeight();  
      mask.loadPixels();
      for (int y = 0; y < ySize; y++) {      
        for (int x = 0; x < xSize; x++) {        
          int index = x + y*xSize;     
          if (userMap[index]>0) {  
            mask.pixels[index]=color(255, 255, 255);
          }
        }
      }
    

    could be:

    mask = context.userImage();
    mask.filter(THRESHOLD);
    

    In terms of filtering, if you want to shrink the silhouette you should ERODE, and blurring should give you a bit of that Photoshop-like feathering.

    Note that some filter() calls take an argument (like BLUR), while others, such as the ERODE/DILATE morphological filters, don't, but you can still roll your own loops to deal with that.
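
    For example, a rough sketch of that idea, reusing the erodeAmt/dilateAmt/blurAmt variables from the sketch below (the values are just starting points to tweak):

      //ERODE/DILATE take no amount argument, so repeat the call instead
      for (int i = 0; i < erodeAmt; i++) mask.filter(ERODE);
      for (int i = 0; i < dilateAmt; i++) mask.filter(DILATE);
      //BLUR does take a radius argument, which gives the feather-like soft edge
      if (blurAmt > 0) mask.filter(BLUR, blurAmt);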

    I also recommend having some sort of easy-to-tweak interface (it can be a fancy slider or a simple keyboard shortcut) when playing with filters.

    Here’s a rough attempt at the refactored sketch with the above comments:

    import SimpleOpenNI.*;
    SimpleOpenNI  context;
    PImage mask;
    int numPixels = 640*480;
    
    int dilateAmt = 1;
    int erodeAmt = 1;
    int blurAmt = 0;
    void setup()
    {
      size(640*2, 480);
      context = new SimpleOpenNI(this);
    
      if (context.isInit() == false)
      {        
        exit();
        return;
      }
      context.enableDepth(); 
      context.enableRGB();
      context.enableUser();
      context.alternativeViewPointDepthToImage();  
      mask = createImage(640,480,RGB);  
    }
    
    void draw()
    {
      frame.setTitle(int(frameRate) + " fps");     
      context.update();
      int[] userMap = context.userMap();  
      background(0, 0, 0);
    
      //you don't need to keep reloading the image every single frame since you're updating all the pixels below anyway
    //  mask = loadImage("black640.jpg");  //just a black image  
    
    //  mask.loadPixels();
    
    //  int xSize = context.depthWidth();
    //  int ySize = context.depthHeight();  
    //  for (int y = 0; y < ySize; y++) {      
    //    for (int x = 0; x < xSize; x++) {        
    //      int index = x + y*xSize;     
    //      if (userMap[index]>0) {  
    //        mask.pixels[index]=color(255, 255, 255);
    //      }
    //    }
    //  }
    
      //a single loop is usually faster than a nested loop and you don't need the x,y coordinates anyway
      mask.loadPixels();
      for(int i = 0 ; i < numPixels ; i++){
        mask.pixels[i] = userMap[i] > 0 ? color(255) : color(0);
      }
      mask.updatePixels();
      //erode
      for(int i = 0 ; i < erodeAmt ; i++) mask.filter(ERODE);
      //dilate 
      for(int i = 0 ; i < dilateAmt; i++) mask.filter(DILATE);
      //blur (skip it when the amount is 0)
      if(blurAmt > 0) mask.filter(BLUR,blurAmt);

      //preview the mask after you process it  
      image(mask, 0, 0);
    
      PImage rgb = context.rgbImage();
      rgb.mask(mask);  
      image(rgb, context.depthWidth() + 10, 0);
    
      //print filter values for debugging purposes
      fill(255);
      text("erodeAmt: " + erodeAmt + "tdilateAmt: " + dilateAmt + "tblurAmt: " + blurAmt,15,15);
    }
    void keyPressed(){
      if(key == 'e') erodeAmt--;
      if(key == 'E') erodeAmt++;
      if(key == 'd') dilateAmt--;
      if(key == 'D') dilateAmt++;
      if(key == 'b') blurAmt--;
      if(key == 'B') blurAmt++;
      //constrain values
      if(erodeAmt < 0) erodeAmt = 0;
      if(dilateAmt < 0) dilateAmt = 0;
      if(blurAmt < 0) blurAmt = 0;
    }
    

    Unfortunately I can't test with an actual sensor right now, so please use the concepts explained, but bear in mind the full sketch code isn't tested.

    The sketch above (if it runs) should allow you to use keys to control the filter parameters (e/E to decrease/increase erosion, d/D for dilation, b/B for blur). Hopefully you'll get satisfactory results.

    When working with SimpleOpenNI in general, I advise recording an .oni file of a person for the most common use case (check out the RecorderPlay example for that). This will save you some time in the long run when testing and will allow you to work remotely with the sensor detached. One thing to bear in mind: the depth resolution is reduced to half on recordings (but using a usingRecording boolean flag should keep things safe).
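
    Here's a minimal sketch of what playing back such a recording could look like. It follows the RecorderPlay example, so treat the openFileRecording() call and the file path as assumptions that may vary between SimpleOpenNI versions:

      import SimpleOpenNI.*;
      SimpleOpenNI context;
      boolean usingRecording = true; //toggle between live sensor and .oni playback

      void setup() {
        size(640*2, 480);
        context = new SimpleOpenNI(this);
        if (usingRecording) {
          //play back a previously recorded .oni file instead of the live sensor
          //"person.oni" is a placeholder path: record your own with the RecorderPlay example
          context.openFileRecording("person.oni");
        }
        context.enableDepth();
        context.enableRGB();
        context.enableUser();
      }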

    The last and probably most important point is about the quality of the end result. Your resulting image can't be much better if the source image isn't easy to work with to begin with. The depth data from the original Kinect sensor isn't great. The Asus sensors feel a wee bit more stable, but the difference is negligible in most cases. If you are going to stick to one of these sensors, make sure you've got a clear background and decent lighting, without too much direct warm light (sunlight, incandescent light bulbs, etc.), since it may interfere with the sensor.

    If you want a more accurate user cut and the above filtering doesn't get the results you're after, consider switching to a better sensor like the KinectV2. The depth quality is much better and the sensor is less susceptible to direct warm light. This may mean you need to use Windows (I see there's a KinectPV2 wrapper available) or openFrameworks (a C++ collection of libraries similar to Processing) with ofxKinectV2.
