Real-Time Instance Segmentation With YOLOV8M-seg And Savant Framework

Instance segmentation is an important task in computer vision. For a long time, high-quality instance segmentation could run in real time only on the newest hardware. However, recent advances in the CV field have made real-time instance segmentation practical. The YOLOV8 family, developed and published by Ultralytics, pushed the frontier of computer vision by making object segmentation efficient. It opens doors for a broad range of applications and, more importantly, increases the quality of CV.

More and more researchers claim that bounding boxes represent an unnatural approach to object recognition. They motivate people in the CV industry to move to more advanced models that can perceive objects as they are and even recognize complex actions based on spatiotemporal analysis. In this respect, real-time instance segmentation is essential for novel CV applications.

According to Ultralytics, YOLOV8M-seg can run at 450 FPS on an Nvidia A100. We don’t have an A100 in our lab because we focus more on inference than training, but we can confirm that the model reaches 100 FPS with Nvidia DeepStream on an Nvidia RTX 4000/A4000. With postprocessing, mask scaling, and drawing included, we built a demo that reaches 55 FPS on an i5-6400 and an RTX 4000, which we believe is a fantastic result for segmentation. Moreover, the bottleneck is the CPU (four cores at 100% utilization), not the GPU (50% utilization). The demo is built with the Savant framework.

Take a look at the following videos to find out what the demo looks like:

Another important note is that the demo is written entirely in Python, without any low-level programming. However, we use Numba’s “nogil” option to release Python’s GIL inside the compute-heavy functions, because Savant can efficiently utilize multiple computation threads.
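
To illustrate the “nogil” idea, here is a minimal sketch, not the demo's actual code. It assumes NumPy and Numba are installed; the function names upscale_mask and upscale_all, the nearest-neighbour scaling, and the 0.5 threshold are our own illustrative choices. A function compiled with nogil=True releases the GIL while it runs, so several masks can be processed in parallel from ordinary Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def njit(**_options):
        return lambda fn: fn


@njit(nogil=True)
def upscale_mask(proto, out_h, out_w, threshold):
    """Nearest-neighbour upscale of a low-resolution mask, then binarize it."""
    in_h, in_w = proto.shape
    out = np.zeros((out_h, out_w), dtype=np.uint8)
    for y in range(out_h):
        src_y = y * in_h // out_h
        for x in range(out_w):
            src_x = x * in_w // out_w
            if proto[src_y, src_x] > threshold:
                out[y, x] = 1
    return out


def upscale_all(masks, out_h, out_w, threshold=0.5, workers=4):
    """Scale every mask in its own thread; the compiled code runs without the GIL."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda m: upscale_mask(m, out_h, out_w, threshold), masks))
```

Without nogil=True, the threads would still take turns holding the GIL, and the thread pool would bring little benefit for CPU-bound loops like this one.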

What is Savant

Savant is an open-source, high-level framework for building real-time, streaming, highly efficient multimedia AI applications on the Nvidia stack. It helps develop dynamic, fault-tolerant inference pipelines that utilize the best Nvidia data center and edge technologies.

Savant is built on DeepStream and provides a high-level abstraction layer for building inference pipelines. It is designed to be easy to use, flexible, and scalable. It is an excellent choice for building advanced CV and video analytics applications for cities, retail, manufacturing, and more.

The Demo

The demo is published in the Savant repository on GitHub:

Along with the demo, you can find other samples covering how to use various models and build pipelines with Savant:

Consider visiting our GitHub, and don’t forget to follow us on Twitter (X) to receive updates on Savant. We also have a Discord server, where we help new users get started.