Credit to Eric VanBuhler for contributing the code corresponding to overlay_image and mask dilation.
In a previous blog post, I showed how to separate a person from a video stream and alter the background, creating a virtual green screen. In that post, the model that performed best was a coarse-grained semantic segmentation model that resulted in large, blocky segmentation edges. A more fine-grained segmentation model was not able to accurately track the person in the video stream, and using Gaussian smoothing on the more coarse-grained model blurred the entire image. In this tutorial, we’ll cover how to smooth out edges generated by coarse-grained semantic segmentation models without blurring the desired target objects.
To complete the tutorial, you must have:
- An alwaysAI account (it’s free!)
- alwaysAI set up on your machine (also free)
- A text editor such as Sublime Text or an IDE such as PyCharm, both of which offer free versions, or whatever else you prefer to code in
All of the code from this tutorial is available on GitHub.
Let’s get started!
1. First, we’ll build a mask that detects persons in the frame. To build the color mask, we first set every color in the semantic segmentation object to black. Then we set only the entries for the labels we want to identify to white. Copy the following code and paste it underneath the lines that append the model time to the text variable:
# build the color mask, starting with every class color set to black
semantic_segmentation.colors = [(0, 0, 0) for _ in semantic_segmentation.colors]
# iterate over all the desired items to identify, labeling those white
for label in labels_to_mask:
    index = semantic_segmentation.labels.index(label)
    semantic_segmentation.colors[index] = (255, 255, 255)
# build the color mask
mask = semantic_segmentation.build_image_mask(results.class_map)
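To make the idea concrete, here is a minimal NumPy sketch of how a class map and a color table can produce a black-and-white mask. The label list and the tiny class map are made up for illustration, and the lookup shown is only an assumption about the general technique, not a description of how build_image_mask is implemented.

```python
import numpy as np

# Hypothetical labels and targets, standing in for the model's real label list.
labels = ["background", "person", "chair"]
labels_to_mask = ["person"]

# Start with every class black, then set the wanted labels to white.
colors = [(0, 0, 0) for _ in labels]
for label in labels_to_mask:
    colors[labels.index(label)] = (255, 255, 255)

# A made-up 3x4 class map: each entry is the label index predicted for that pixel.
class_map = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
])

# Indexing the color table with the class map gives an H x W x 3 image.
mask = np.array(colors, dtype=np.uint8)[class_map]
```

Only the pixels labeled as a person end up white; the chair pixels stay black because ‘chair’ is not in labels_to_mask.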
2. Next, we enlarge the area corresponding to the detected persons by dilating the mask. We do this because the edges of the detected region tend to fall inside the person’s outline. By enlarging this area, we won’t cut off any of the detected person in the subsequent steps. We use the OpenCV library to dilate the mask with a cross-shaped structuring element. Copy the following code and paste it beneath the previous step’s changes:
# Enlarge the mask
dilatation_size = 15
# Options: cv.MORPH_RECT, cv.MORPH_CROSS, cv.MORPH_ELLIPSE
dilatation_type = cv.MORPH_CROSS
element = cv.getStructuringElement(dilatation_type,(2*dilatation_size + 1, 2*dilatation_size+1),(dilatation_size, dilatation_size))
mask = cv.dilate(mask, element)
NOTE: you can change the dilation size to customize how much room to leave around the detected person. For more details on OpenCV methods, see documentation for cvtColor, getStructuringElement, and dilate.
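If you want to see what a cross-shaped dilation does to a mask, the following NumPy-only sketch grows a single white pixel into a cross. The helper dilate_cross is a hypothetical stand-in written for illustration; the app itself uses cv.getStructuringElement and cv.dilate as shown above.

```python
import numpy as np

def dilate_cross(mask, size):
    """Dilate a 2-D uint8 mask with a cross-shaped element of radius `size`.

    Hypothetical helper for illustration only; pixels outside the image
    are treated as black.
    """
    h, w = mask.shape
    padded = np.pad(mask, size, mode="constant", constant_values=0)
    out = mask.copy()
    for shift in range(-size, size + 1):
        # vertical arm of the cross
        out = np.maximum(out, padded[size + shift:size + shift + h, size:size + w])
        # horizontal arm of the cross
        out = np.maximum(out, padded[size:size + h, size + shift:size + shift + w])
    return out

# A 7x7 mask with one white pixel in the center.
mask = np.zeros((7, 7), dtype=np.uint8)
mask[3, 3] = 255

dilated = dilate_cross(mask, 2)
```

With a radius of 2, the single pixel becomes a cross with arms two pixels long. A larger radius, like the 15 used in the app, leaves more room around the person.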
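3. Then, we apply smoothing to the entire mask, using the ‘blur’ method in the OpenCV library. The blur_level value sets the width and height, in pixels, of the averaging kernel, so larger values give softer edges. Copy the following code and paste it just beneath the code from the previous step: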
# apply smoothing to the mask
mask = cv.blur(mask, (blur_level, blur_level))
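Blurring the mask rather than the frame is what keeps the person sharp: only the transition between white and black gets softened. Here is a rough one-dimensional sketch of that effect using a simple averaging window in NumPy. The kernel width of 3 is a made-up value, and the border handling here (repeating the edge pixel) is an assumption that may differ from OpenCV’s default.

```python
import numpy as np

blur_level = 3  # hypothetical kernel width, chosen only for this example

# One row of a mask with a hard edge between black (0) and white (255).
row = np.array([0, 0, 0, 0, 255, 255, 255, 255], dtype=np.float64)

# Repeat the edge pixels so the output has the same length as the input.
pad = blur_level // 2
padded = np.pad(row, pad, mode="edge")

# Average each pixel with its neighbors.
kernel = np.ones(blur_level) / blur_level
blurred = np.convolve(padded, kernel, mode="valid")
```

The hard jump from 0 to 255 turns into a short ramp of intermediate grays (roughly 85 and 170 here), and those in-between values later act as partial transparency at the person’s outline.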
4. The portion of the code that generates the background is largely the same as it is in the original example app. The only difference we’ll make occurs directly after the end of the ‘else’ statement. Replace the two lines after the ‘else’ block that update ‘background’ and send data to the streamer with the following two lines:
frame = overlay_image(frame, background, mask)
5. Finally, we’ll generate a reciprocal mask for the background before combining the original mask and the reciprocal background mask into a new image. In the previous step, we made a call to a method called ‘overlay_image’. This method does not exist in our code yet, so copy the following code and place it above the declaration of ‘main()’ at the top of app.py:
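# requires `import cv2 as cv` and `import numpy as np` at the top of app.py
def overlay_image(foreground_image, background_image, foreground_mask):
    # invert the foreground mask to get the background mask
    background_mask = cv.cvtColor(255 - cv.cvtColor(foreground_mask, cv.COLOR_BGR2GRAY), cv.COLOR_GRAY2BGR)
    # scale images and masks to the range 0-1 and weight each image by its mask
    masked_fg = (foreground_image * (1 / 255.0)) * (foreground_mask * (1 / 255.0))
    masked_bg = (background_image * (1 / 255.0)) * (background_mask * (1 / 255.0))
    # add the two weighted images and scale the result back to 0-255
    return np.uint8(cv.addWeighted(masked_fg, 255.0, masked_bg, 255.0, 0.0))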
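This function generates a background mask by inverting the original mask, which has detected persons in white (along with any other labels that were added to labels_to_mask) and everything else in black. The foreground and background images and both masks are scaled to the range 0 to 1, and each image is multiplied by its mask. Finally, the two masked images are added together with equal weights and scaled back to the 0 to 255 range to form the final image. Because the blurred mask contains gray values along its edges, pixels there are a mix of the foreground and the background, which is what produces the smoother outline.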
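To see how the mask values steer the blend, here is a small NumPy sketch that applies the same arithmetic to three single-channel pixels. The function name blend and the pixel values are made up for illustration; it mirrors the math of overlay_image under the assumption of a grayscale image, and it rounds the result where overlay_image truncates.

```python
import numpy as np

def blend(foreground, background, mask):
    """Mix two images per pixel, using the mask as the foreground weight.

    Hypothetical helper for illustration; all inputs are uint8 arrays
    of the same shape.
    """
    alpha = mask.astype(np.float64) / 255.0
    mixed = foreground * alpha + background * (1.0 - alpha)
    return np.uint8(np.clip(np.round(mixed), 0, 255))

# Three pixels: fully inside the person, fully outside, and on the blurred edge.
foreground = np.array([[200, 200, 200]], dtype=np.uint8)
background = np.array([[100, 100, 100]], dtype=np.uint8)
mask = np.array([[255, 0, 128]], dtype=np.uint8)

out = blend(foreground, background, mask)
```

A white mask pixel keeps the foreground value, a black one keeps the background value, and a gray one lands in between the two.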
Now, to see your app in action, first build the app by typing into the command line:
aai app deploy
And once it is done building, type the following command to start the app:
aai app start
Open any browser to ‘localhost:5000’ to see your virtual green screen in action, now with smoother edges around the detected individuals.